Suppose I use rdd.joinWithCassandra("keySpace", "table1"). Does this do a full table scan? A full scan is something we need to avoid at any cost.
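For reference, in the DataStax spark-cassandra-connector the method is actually spelled `joinWithCassandraTable`, and it does not scan the whole table: the connector turns each partition key in the RDD into a direct SELECT against Cassandra. A minimal sketch, assuming a keyspace "keySpace" with a table "table1" whose partition key is an int column "id" (all names here are illustrative, not from the thread):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object JoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("join-example")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host

    val sc = new SparkContext(conf)

    // Only the keys present in this RDD are queried -- the connector
    // issues one targeted SELECT per partition key, not a table scan.
    val keys = sc.parallelize(Seq(Tuple1(1), Tuple1(2), Tuple1(3)))
    val joined = keys.joinWithCassandraTable("keySpace", "table1")

    joined.collect().foreach(println)
    sc.stop()
  }
}
```

So the cost scales with the number of keys on the left side of the join, not with the size of the Cassandra table.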
On Tue, Sep 22, 2015 at 3:03 PM, Artem Aliev <artem.al...@gmail.com> wrote:

> All that code should look like:
> stream.filter(...).map(x => (key, ...)).joinWithCassandra(...).map(...).saveToCassandra(...)
>
> I'm not sure about exactly 10 messages; Spark Streaming focuses on time, not count.
>
> On Tue, Sep 22, 2015 at 2:14 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>
>> I have a scenario like this:
>>
>> I read a DStream of messages from Kafka. Now if my RDD contains 10 messages, for each message I need to query the Cassandra DB, do some modification and update the records in the DB. If there is no option of passing the SparkContext to workers to read/write the DB, is the only option to use CassandraConnector.withSessionDo? If yes, for writing to a table, should I construct the entire INSERT statement for thousands of fields in the DB? Is this way of writing code optimized?
>>
>> On Tue, Sep 22, 2015 at 1:32 AM, Romi Kuntsman <r...@totango.com> wrote:
>>
>>> Cody, that's a great reference!
>>> As shown there, the best way to connect to an external database from the workers is to create a connection pool on (each) worker.
>>> The driver may pass, via broadcast, the connection string, but not the connection object itself and not the SparkContext.
>>>
>>> On Mon, Sep 21, 2015 at 5:31 PM Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>>> That isn't accurate; I think you're confused about foreach.
>>>>
>>>> Look at
>>>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
>>>>
>>>> On Mon, Sep 21, 2015 at 7:36 AM, Romi Kuntsman <r...@totango.com> wrote:
>>>>
>>>>> foreach is something that runs on the driver, not the workers.
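Fleshing out the pipeline Artem sketches, a version of it might look like the following. This is only a sketch: the keyspace, table, column names, and CSV message layout are all assumptions, and `joinWithCassandraTable` is the connector's actual spelling of the join step:

```scala
import com.datastax.spark.connector._
import org.apache.spark.streaming.dstream.DStream

object StreamPipeline {
  // `stream` is assumed to be a DStream[String] of raw Kafka messages,
  // each a CSV line whose first field is the Cassandra partition key.
  def process(stream: DStream[String]): Unit = {
    stream
      .filter(_.nonEmpty)                    // drop empty messages
      .map(msg => Tuple1(msg.split(",")(0))) // extract the join key
      .foreachRDD { rdd =>
        rdd
          .joinWithCassandraTable("keySpace", "table1") // per-key lookups, no full scan
          .map { case (key, row) =>
            // modify the looked-up row; "counter" is a hypothetical int column
            (key._1, row.getInt("counter") + 1)
          }
          .saveToCassandra("keySpace", "table1", SomeColumns("id", "counter"))
      }
  }
}
```

Note that with `saveToCassandra` there is no need to hand-build INSERT statements for every field: the connector maps tuple or case-class fields to the listed columns.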
>>>>> If you want to perform some function on each record from Cassandra, you need to do cassandraRdd.map(func), which will run distributed on the Spark workers.
>>>>>
>>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>>> http://www.totango.com
>>>>>
>>>>> On Mon, Sep 21, 2015 at 3:29 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>>>
>>>>>> Yes, but I need to read from the Cassandra DB within a Spark transformation, something like:
>>>>>>
>>>>>> dstream.foreachRDD { rdd =>
>>>>>>   rdd.foreach { message =>
>>>>>>     sc.cassandraTable()
>>>>>>     ...
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> Since rdd.foreach gets executed on the workers, how can I make the SparkContext available on the workers?
>>>>>>
>>>>>> Regards,
>>>>>> Padma Ch
>>>>>>
>>>>>> On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> You can use a broadcast variable for passing connection information.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Sep 21, 2015, at 4:27 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>>>>>
>>>>>>> Can I use this SparkContext on executors?
>>>>>>> In my application, I have a scenario of reading from the DB for certain records in an RDD. Hence I need the SparkContext to read from the DB (Cassandra in our case).
>>>>>>>
>>>>>>> If the SparkContext can't be sent to executors, what is the workaround for this?
>>>>>>>
>>>>>>> On Mon, Sep 21, 2015 at 3:06 PM, Petr Novak <oss.mli...@gmail.com> wrote:
>>>>>>>
>>>>>>>> add @transient?
>>>>>>>>
>>>>>>>> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>>
>>>>>>>>> How can I pass a SparkContext as a parameter to a method in an object? Passing the SparkContext is giving me a "Task not serializable" exception.
>>>>>>>>>
>>>>>>>>> How can I achieve this?
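The snippet Priya quotes cannot work as written, because `sc` only exists on the driver and is not serializable to workers. A common workaround, matching Romi's per-worker-connection advice, is to ship the serializable `CassandraConnector` to the workers and open a session per partition. A sketch, with a hypothetical schema ("keySpace.table1" with columns "id" and "counter") and an assumed DStream[String] of message keys:

```scala
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.DStream

object UpdatePerMessage {
  def process(conf: SparkConf, dstream: DStream[String]): Unit = {
    // CassandraConnector itself is serializable; the Session it opens
    // is created lazily on each worker, never shipped from the driver.
    val connector = CassandraConnector(conf)

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { messages =>
        connector.withSessionDo { session =>
          // prepare once per partition, reuse for every message in it
          val stmt = session.prepare(
            "UPDATE keySpace.table1 SET counter = ? WHERE id = ?") // hypothetical schema
          messages.foreach { msg =>
            session.execute(stmt.bind(Int.box(1), msg))
          }
        }
      }
    }
  }
}
```

This is exactly the foreachRDD/foreachPartition design pattern from the streaming guide Cody links: connection setup happens once per partition on the worker, not once per record and not on the driver.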
>>>>>>>>> Thanks,
>>>>>>>>> Padma Ch

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscr...@lists.datastax.com.