Re: Spark Function setup and cleanup
Look at mapPartitions. Whereas map turns one value V1 into one value V2, mapPartitions lets you turn one entire Iterator[V1] into one whole Iterator[V2]. The function that does so can perform some initialization at its start, then process all of the values, and clean up at its end. This is how you mimic a Mapper, really. The most literal translation of Hadoop MapReduce I can think of is:

Mapper: mapPartitions to turn many (K1,V1) into many (K2,V2)
(shuffle): groupByKey to turn that into (K2,Iterator[V2])
Reducer: mapPartitions to turn many (K2,Iterator[V2]) into many (K3,V3)

It's not necessarily optimal to do it this way -- especially the groupByKey bit. You have more expressive power here and need not fit it into this paradigm. But yes, you can get the same effect as in MapReduce, mostly from mapPartitions.

On Sat, Jul 26, 2014 at 8:52 AM, Yosi Botzer wrote:
> Thank you, but that doesn't answer my general question.
>
> I might need to enrich my records using different data sources (or DBs).
>
> So the general use case I need to support is to have some kind of Function
> that has init() logic for creating a connection to the DB, queries the DB
> for each record to enrich my input record with data from the DB, and uses
> some kind of close() logic to close the connection.
>
> I have implemented this kind of use case using Map/Reduce and I want to
> know how I can do it with Spark.
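The three-step translation above can be sketched outside Spark in plain Python, using word count: the "Mapper" and "Reducer" are just functions from one iterator to another, and the shuffle is simulated by sorting and grouping. This is only an illustration of the shape of the pipeline, not Spark code.

```python
# Plain-Python sketch of the Mapper / shuffle / Reducer translation above.
# Each stage is an iterator-to-iterator function, like mapPartitions.
from itertools import groupby

def mapper(lines):                      # Iterator[V1] -> Iterator[(K2, V2)]
    for line in lines:
        for word in line.split():
            yield (word, 1)

def group_by_key(pairs):                # simulated shuffle: -> (K2, Iterator[V2])
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (key, (v for _, v in group))

def reducer(groups):                    # Iterator[(K2, Iterator[V2])] -> Iterator[(K3, V3)]
    for key, values in groups:
        yield (key, sum(values))

lines = iter(["a b a", "b a"])
print(dict(reducer(group_by_key(mapper(lines)))))  # -> {'a': 3, 'b': 2}
```

In real Spark the shuffle is what groupByKey does for you; the point here is only that both ends of it are whole-iterator transformations.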
Re: Spark Function setup and cleanup
Thank you, but that doesn't answer my general question.

I might need to enrich my records using different data sources (or DBs).

So the general use case I need to support is to have some kind of Function that has init() logic for creating a connection to the DB, queries the DB for each record to enrich my input record with data from the DB, and uses some kind of close() logic to close the connection.

I have implemented this kind of use case using Map/Reduce and I want to know how I can do it with Spark.

Thanks

On Fri, Jul 25, 2014 at 6:24 AM, Yanbo Liang wrote:
> You can refer to this topic:
> http://www.mapr.com/developercentral/code/loading-hbase-tables-spark
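The init()/query/close() shape described above maps directly onto a per-partition function: setup and cleanup happen once per iterator, not once per record. A minimal plain-Python sketch of that shape (FakeDb and its lookup() are hypothetical stand-ins for a real DB client, not any actual API):

```python
# Sketch of the per-partition setup/cleanup pattern: the function receives a
# whole iterator of records, so the connection is opened and closed once.
class FakeDb:
    """Hypothetical DB client standing in for a real connection."""
    def __init__(self):
        self.data = {1: "alice", 2: "bob"}

    def lookup(self, user_id):
        return self.data.get(user_id, "unknown")

    def close(self):
        pass  # a real client would release the connection here

def enrich_partition(records):
    db = FakeDb()                 # init(): once per partition
    try:
        for user_id in records:   # every record reuses the same connection
            yield (user_id, db.lookup(user_id))
    finally:
        db.close()                # close(): once per partition

print(list(enrich_partition(iter([1, 2, 3]))))  # -> [(1, 'alice'), (2, 'bob'), (3, 'unknown')]
```

A function of exactly this shape is what you would pass to mapPartitions; the try/finally guarantees the cleanup runs even if a record fails.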
Re: Spark Function setup and cleanup
You can refer to this topic:
http://www.mapr.com/developercentral/code/loading-hbase-tables-spark

2014-07-24 22:32 GMT+08:00 Yosi Botzer:
> In my case I want to reach HBase. For every record with a userId I want to
> get some extra information about the user and add it to the result record
> for further processing.
Re: Spark Function setup and cleanup
In my case I want to reach HBase. For every record with a userId I want to get some extra information about the user and add it to the result record for further processing.

On Thu, Jul 24, 2014 at 9:11 AM, Yanbo Liang wrote:
> If you want to connect to a DB in your program, you can use JdbcRDD (
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
> )
Re: Spark Function setup and cleanup
If you want to connect to a DB in your program, you can use JdbcRDD (
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala
)

2014-07-24 18:32 GMT+08:00 Yosi Botzer:
> Hi,
>
> I am using the Java API of Spark.
>
> I wanted to know if there is a way to run some code in a manner that is
> like the setup() and cleanup() methods of Hadoop Map/Reduce.
>
> The reason I need it is that I want to read something from the DB
> according to each record I scan in my Function, and I would like to open
> the DB connection only once (and close it only once).
>
> Thanks
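The core trick behind JdbcRDD is splitting a numeric key range [lowerBound, upperBound] into numPartitions contiguous sub-ranges, each of which is queried with its own connection via a bounded query like "SELECT ... WHERE ? <= id AND id <= ?". That range arithmetic can be sketched outside Spark (the function name here is mine, not part of any API):

```python
# Sketch of JdbcRDD's partitioning idea: split [lower, upper] into N
# contiguous key ranges, one per partition.
def partition_bounds(lower, upper, num_partitions):
    length = upper - lower + 1  # total number of keys, split as evenly as possible
    for i in range(num_partitions):
        start = lower + (i * length) // num_partitions
        end = lower + ((i + 1) * length) // num_partitions - 1
        yield (start, end)

print(list(partition_bounds(1, 10, 3)))  # -> [(1, 3), (4, 6), (7, 10)]
```

Each (start, end) pair would then be bound to the query's two placeholders, so partitions never overlap and together cover the whole range.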