Re: Stateful mapPartitions
Yeah, the main way to do this is to keep your own static cache of connections. In Scala that can live in an object; in Java, a static variable works (for instance, a set of connections that tasks borrow from and give back).

- Patrick

On Thu, Dec 4, 2014 at 5:26 PM, Tobias Pfeiffer t...@preferred.jp wrote:
> If you're using Scala, you can use a singleton object, this will exist once per JVM (i.e., once per executor) [...]
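A minimal sketch of the "static cache of connections you borrow from" idea, assuming a hypothetical `Conn` class that stands in for a real database connection (no JDBC or Spark API is used). `ConnectionPool` is an `object`, so each executor JVM gets its own copy, created on first use. `lookupPartition` has the `Iterator[T] => Iterator[U]` shape that `mapPartitions` takes, so it could be passed as `rdd.mapPartitions(PartitionFunctions.lookupPartition)`. All names here are illustrative.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a real database connection (assumption).
final class Conn(val id: Int) {
  def lookup(key: String): String = s"$key@conn$id"
}

// One instance per JVM (i.e. per executor): idle connections that tasks borrow.
object ConnectionPool {
  private val idle = new ConcurrentLinkedQueue[Conn]()
  private val created = new AtomicInteger(0)

  // Reuse an idle connection if one is available, otherwise open a new one.
  def borrow(): Conn = {
    val c = idle.poll()
    if (c != null) c else new Conn(created.incrementAndGet())
  }

  // Hand the connection back so later tasks on this executor can reuse it.
  def giveBack(c: Conn): Unit = idle.offer(c)

  def totalCreated: Int = created.get()
}

object PartitionFunctions {
  // Same shape as the function mapPartitions expects: Iterator[T] => Iterator[U].
  def lookupPartition(rows: Iterator[String]): Iterator[String] = {
    val conn = ConnectionPool.borrow()
    // toList forces every lookup to run before the connection is given back.
    try rows.map(conn.lookup).toList.iterator
    finally ConnectionPool.giveBack(conn)
  }
}
```

The `toList` matters: the iterator returned from `mapPartitions` is consumed lazily, so giving the connection back before the rows are processed would let another task borrow it while it is still in use. Materializing keeps the sketch simple at the cost of holding one partition's results in memory; a real pool would also cap its size and validate connections.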
Stateful mapPartitions
Is it possible to have some state across multiple calls to mapPartitions on each partition, for instance, if I want to keep a database connection open?
Re: Stateful mapPartitions
I want to have a database connection per partition of the RDD, and then reuse that connection whenever mapPartitions is called, which results in compute being called on the partition.

On Thu, Dec 4, 2014 at 11:07 AM, Paolo Platter paolo.plat...@agilelab.it wrote:
> Could you provide some further details? What do you need to do with the DB connection?
>
> Paolo
>
> From: Akshat Aranya aara...@gmail.com
> Sent: 04/12/2014 18:57
> To: user@spark.apache.org
> Subject: Stateful mapPartitions
>
> Is it possible to have some state across multiple calls to mapPartitions on each partition, for instance, if I want to keep a database connection open?
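The per-partition variant Akshat describes can be sketched as a per-JVM registry keyed by partition index, assuming a hypothetical `PartitionConn` class in place of a real connection. `RDD.mapPartitionsWithIndex` passes the partition index to its function, which is why `tagRows` takes `(Int, Iterator[String])`. Two caveats: the registry lives in one executor's JVM, so reuse only happens when a partition is recomputed on the same executor, which Spark does not guarantee; and partition indices are only unique within one RDD, so tasks from different RDDs or concurrent jobs may share a connection, which then has to be safe for concurrent use.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a real database connection (assumption).
final class PartitionConn(val partition: Int, val serial: Int) {
  def tag(row: String): String = s"p$partition/c$serial:$row"
}

// Per-JVM registry: partition index -> connection opened for that index.
object PerPartitionConnections {
  private val conns = mutable.HashMap.empty[Int, PartitionConn]
  private var opened = 0

  // Open a connection the first time an index is seen; reuse it afterwards.
  def forPartition(index: Int): PartitionConn = synchronized {
    conns.getOrElseUpdate(index, { opened += 1; new PartitionConn(index, opened) })
  }

  def openedCount: Int = synchronized(opened)
}

object IndexedPartitionFunctions {
  // Same shape as the function mapPartitionsWithIndex expects.
  def tagRows(index: Int, rows: Iterator[String]): Iterator[String] = {
    val conn = PerPartitionConnections.forPartition(index)
    rows.map(conn.tag)
  }
}
```

Because the connection stays in the registry instead of being returned, the lazy `rows.map` is safe here; closing these connections runs into the same shutdown problem raised elsewhere in this thread.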
Re: Stateful mapPartitions
Hi,

On Fri, Dec 5, 2014 at 3:56 AM, Akshat Aranya aara...@gmail.com wrote:
> Is it possible to have some state across multiple calls to mapPartitions on each partition, for instance, if I want to keep a database connection open?

If you're using Scala, you can use a singleton object; it will exist once per JVM (i.e., once per executor), like

    object DatabaseConnector { lazy val conn = ... }

Please be aware that shutting down the connection is much harder than opening it because, AFAIK, you basically have no way of knowing when an executor is done processing.

Tobias
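Tobias's singleton can be fleshed out as below, with one best-effort answer to the shutdown problem: a JVM shutdown hook registered when the connection is first opened. The hook is a swapped-in technique, not something proposed in the thread: `sys.addShutdownHook` runs when the executor JVM exits normally, but not if the process is killed abruptly, and it still does not tell you when processing is done. `FakeConnection` is a hypothetical stand-in for a real connection.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a real database connection (assumption).
final class FakeConnection(val id: Int) {
  @volatile private var closed = false
  def query(sql: String): String = {
    require(!closed, "connection already closed")
    s"conn$id:$sql"
  }
  def close(): Unit = closed = true
  def isClosed: Boolean = closed
}

// Singleton: one instance per JVM, i.e. per executor.
object DatabaseConnector {
  private val opened = new AtomicInteger(0)

  // lazy val: created on first access, at most once per JVM
  // (Scala guards lazy val initialization with a lock).
  lazy val conn: FakeConnection = {
    val c = new FakeConnection(opened.incrementAndGet())
    // Best-effort cleanup on normal JVM shutdown; does not run on kill -9.
    sys.addShutdownHook(c.close())
    c
  }

  def openedCount: Int = opened.get()
}
```

Tasks inside `mapPartitions` would just use `DatabaseConnector.conn` and never close it themselves, since every task on that executor shares the same instance.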