Re: Stateful mapPartitions

2014-12-05 Thread Patrick Wendell
Yeah, the main way to do this would be to keep your own static cache of
connections. This could be an object in Scala or just a static
variable in Java (for instance, a set of connections that you can
borrow from).
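
A minimal sketch of such a borrowable set of connections, in Scala. The
names FakeConnection, ConnectionPool, and PoolJob are hypothetical and
only for illustration; FakeConnection stands in for a real database
client. The pool lives in a singleton object, so there is one per JVM.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a real database connection.
final class FakeConnection(val id: Int) {
  def query(x: Int): Int = x * 2
}

// One pool per JVM (i.e., per executor), shared by all tasks running there.
object ConnectionPool {
  private val idle = new ConcurrentLinkedQueue[FakeConnection]()
  private val created = new AtomicInteger(0)

  // Number of connections opened so far in this JVM.
  def createdCount: Int = created.get()

  // Reuse an idle connection if one exists, otherwise open a new one.
  def borrow(): FakeConnection = {
    val c = idle.poll()
    if (c != null) c else new FakeConnection(created.incrementAndGet())
  }

  // Put a connection back so that a later task can reuse it.
  def giveBack(c: FakeConnection): Unit = {
    idle.offer(c)
    ()
  }

  // Borrow a connection, run f, and always give the connection back.
  def withConnection[A](f: FakeConnection => A): A = {
    val c = borrow()
    try f(c) finally giveBack(c)
  }
}

object PoolJob {
  // Has the shape of a function passed to rdd.mapPartitions.
  def processPartition(iter: Iterator[Int]): Iterator[Int] =
    ConnectionPool.withConnection { conn =>
      // Materialize the results before the connection is given back;
      // a lazy iterator would touch conn after it was returned to the pool.
      iter.map(x => conn.query(x)).toList.iterator
    }
}
```

With an RDD[Int] this would be called as
rdd.mapPartitions(PoolJob.processPartition). Materializing the partition
keeps the borrow/return pairing simple at the cost of holding one
partition's results in memory.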

- Patrick

On Thu, Dec 4, 2014 at 5:26 PM, Tobias Pfeiffer t...@preferred.jp wrote:
 Hi,

 On Fri, Dec 5, 2014 at 3:56 AM, Akshat Aranya aara...@gmail.com wrote:

 Is it possible to have some state across multiple calls to mapPartitions
 on each partition, for instance, if I want to keep a database connection
 open?


 If you're using Scala, you can use a singleton object, which will exist once
 per JVM (i.e., once per executor), like

 object DatabaseConnector {
   lazy val conn = ...
 }

 Please be aware that shutting down the connection is much harder than
 opening it, because you basically have no idea when processing is done for
 an executor, AFAIK.

 Tobias





Stateful mapPartitions

2014-12-04 Thread Akshat Aranya
Is it possible to have some state across multiple calls to mapPartitions on
each partition, for instance, if I want to keep a database connection open?


Re: Stateful mapPartitions

2014-12-04 Thread Akshat Aranya
I want to have a database connection per partition of the RDD, and then
reuse that connection whenever mapPartitions is called, which results in
compute being called on the partition.

On Thu, Dec 4, 2014 at 11:07 AM, Paolo Platter paolo.plat...@agilelab.it
wrote:

  Could you provide some further details?
 What do you need to do with the db connection?

 Paolo

 Sent from my Windows Phone
  --
 From: Akshat Aranya aara...@gmail.com
 Sent: 04/12/2014 18:57
 To: user@spark.apache.org
 Subject: Stateful mapPartitions

  Is it possible to have some state across multiple calls to mapPartitions
 on each partition, for instance, if I want to keep a database connection
 open?



Re: Stateful mapPartitions

2014-12-04 Thread Tobias Pfeiffer
Hi,

On Fri, Dec 5, 2014 at 3:56 AM, Akshat Aranya aara...@gmail.com wrote:

 Is it possible to have some state across multiple calls to mapPartitions
 on each partition, for instance, if I want to keep a database connection
 open?


If you're using Scala, you can use a singleton object, which will exist once
per JVM (i.e., once per executor), like

object DatabaseConnector {
  lazy val conn = ...
}

Please be aware that shutting down the connection is much harder than
opening it, because you basically have no idea when processing is done for
an executor, AFAIK.
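
A minimal sketch of the singleton approach with a best-effort cleanup.
The JVM shutdown hook is an assumption added here, not something
suggested in the thread, and StubConnection and LookupJob are
hypothetical names; StubConnection stands in for a real database client.

```scala
// Hypothetical stand-in for a real database connection.
final class StubConnection {
  @volatile private var opened = true
  def isOpen: Boolean = opened
  def lookup(key: String): Int = key.length
  def close(): Unit = { opened = false }
}

object DatabaseConnector {
  // Initialized on first access, once per JVM (i.e., once per executor).
  lazy val conn: StubConnection = {
    val c = new StubConnection
    // Assumption: closing at JVM exit is good enough. The hook runs on
    // normal shutdown, but not if the process is killed forcibly.
    sys.addShutdownHook(c.close())
    c
  }
}

object LookupJob {
  // Has the shape of a function passed to rdd.mapPartitions; every call
  // in the same JVM sees the same connection.
  def processPartition(iter: Iterator[String]): Iterator[Int] =
    iter.map(key => DatabaseConnector.conn.lookup(key))
}
```

With an RDD[String] this would be called as
rdd.mapPartitions(LookupJob.processPartition). The hook does not tell
you when processing is done; it only ties the close to the end of the
executor process.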

Tobias