Suppose I use rdd.joinWithCassandra("keySpace", "table1"). Does this do a full table scan? A full scan is something we need to avoid at any cost.
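For reference, in the DataStax spark-cassandra-connector the method is actually spelled `joinWithCassandraTable`, and it does not scan the whole table: the connector turns each partition key in the RDD into a direct SELECT against Cassandra. A minimal sketch, assuming a keyspace "keySpace" with a table "table1" whose partition key is an int column "id" (all names here are illustrative, not from the thread):

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object JoinExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("join-example")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host

    val sc = new SparkContext(conf)

    // Only the keys present in this RDD are queried -- the connector
    // issues one targeted SELECT per partition key, not a table scan.
    val keys = sc.parallelize(Seq(Tuple1(1), Tuple1(2), Tuple1(3)))
    val joined = keys.joinWithCassandraTable("keySpace", "table1")

    joined.collect().foreach(println)
    sc.stop()
  }
}
```

So the cost scales with the number of keys on the left side of the join, not with the size of the Cassandra table.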
On Tue, Sep 22, 2015 at 3:03 PM, Artem Aliev <artem.al...@gmail.com> wrote:

> All that code should look like:
> stream.filter(...).map(x => (key, ...)).joinWithCassandra(...).map(...).saveToCassandra(...)
>
> I'm not sure about exactly 10 messages; Spark Streaming focuses on time, not count.
>
> On Tue, Sep 22, 2015 at 2:14 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>
>> I have a scenario like this:
>>
>> I read a DStream of messages from Kafka. Now if my RDD contains 10 messages, for each message I need to query the Cassandra DB, do some modification and update the records in the DB. If there is no option of passing the SparkContext to workers to read/write the DB, is the only option to use CassandraConnector.withSessionDo? If yes, for writing to a table, should I construct the entire INSERT statement for thousands of fields in the DB? Is this way of writing code optimized?
>>
>> On Tue, Sep 22, 2015 at 1:32 AM, Romi Kuntsman <r...@totango.com> wrote:
>>
>>> Cody, that's a great reference!
>>> As shown there, the best way to connect to an external database from the workers is to create a connection pool on (each) worker.
>>> The driver may pass, via broadcast, the connection string, but not the connection object itself and not the SparkContext.
>>>
>>> On Mon, Sep 21, 2015 at 5:31 PM Cody Koeninger <c...@koeninger.org> wrote:
>>>
>>>> That isn't accurate; I think you're confused about foreach.
>>>>
>>>> Look at
>>>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#design-patterns-for-using-foreachrdd
>>>>
>>>> On Mon, Sep 21, 2015 at 7:36 AM, Romi Kuntsman <r...@totango.com> wrote:
>>>>
>>>>> foreach is something that runs on the driver, not the workers.
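Fleshing out the pipeline Artem sketches, a version of it might look like the following. This is only a sketch: the keyspace, table, column names, and CSV message layout are all assumptions, and `joinWithCassandraTable` is the connector's actual spelling of the join step:

```scala
import com.datastax.spark.connector._
import org.apache.spark.streaming.dstream.DStream

object StreamPipeline {
  // `stream` is assumed to be a DStream[String] of raw Kafka messages,
  // each a CSV line whose first field is the Cassandra partition key.
  def process(stream: DStream[String]): Unit = {
    stream
      .filter(_.nonEmpty)                    // drop empty messages
      .map(msg => Tuple1(msg.split(",")(0))) // extract the join key
      .foreachRDD { rdd =>
        rdd
          .joinWithCassandraTable("keySpace", "table1") // per-key lookups, no full scan
          .map { case (key, row) =>
            // modify the looked-up row; "counter" is a hypothetical int column
            (key._1, row.getInt("counter") + 1)
          }
          .saveToCassandra("keySpace", "table1", SomeColumns("id", "counter"))
      }
  }
}
```

Note that with `saveToCassandra` there is no need to hand-build INSERT statements for every field: the connector maps tuple or case-class fields to the listed columns.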
>>>>> If you want to perform some function on each record from Cassandra, you need to do cassandraRdd.map(func), which will run distributed on the Spark workers.
>>>>>
>>>>> *Romi Kuntsman*, *Big Data Engineer*
>>>>> http://www.totango.com
>>>>>
>>>>> On Mon, Sep 21, 2015 at 3:29 PM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>>>
>>>>>> Yes, but I need to read from the Cassandra DB within a Spark transformation, something like:
>>>>>>
>>>>>> dstream.foreachRDD { rdd =>
>>>>>>   rdd.foreach { message =>
>>>>>>     sc.cassandraTable()
>>>>>>     ...
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> Since rdd.foreach gets executed on the workers, how can I make the SparkContext available on the workers?
>>>>>>
>>>>>> Regards,
>>>>>> Padma Ch
>>>>>>
>>>>>> On Mon, Sep 21, 2015 at 5:10 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>>>
>>>>>>> You can use a broadcast variable for passing connection information.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Sep 21, 2015, at 4:27 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>>>>>
>>>>>>> Can I use this SparkContext on executors?
>>>>>>> In my application, I have a scenario of reading from the DB for certain records in an RDD. Hence I need the SparkContext to read from the DB (Cassandra in our case).
>>>>>>>
>>>>>>> If the SparkContext can't be sent to executors, what is the workaround for this?
>>>>>>>
>>>>>>> On Mon, Sep 21, 2015 at 3:06 PM, Petr Novak <oss.mli...@gmail.com> wrote:
>>>>>>>
>>>>>>>> add @transient?
>>>>>>>>
>>>>>>>> On Mon, Sep 21, 2015 at 11:27 AM, Priya Ch <learnings.chitt...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hello All,
>>>>>>>>>
>>>>>>>>> How can I pass a SparkContext as a parameter to a method in an object? Passing the SparkContext is giving me a "Task not serializable" exception.
>>>>>>>>>
>>>>>>>>> How can I achieve this?
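The snippet Priya quotes cannot work as written, because `sc` only exists on the driver and is not serializable to workers. A common workaround, matching Romi's per-worker-connection advice, is to ship the serializable `CassandraConnector` to the workers and open a session per partition. A sketch, with a hypothetical schema ("keySpace.table1" with columns "id" and "counter") and an assumed DStream[String] of message keys:

```scala
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.DStream

object UpdatePerMessage {
  def process(conf: SparkConf, dstream: DStream[String]): Unit = {
    // CassandraConnector itself is serializable; the Session it opens
    // is created lazily on each worker, never shipped from the driver.
    val connector = CassandraConnector(conf)

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { messages =>
        connector.withSessionDo { session =>
          // prepare once per partition, reuse for every message in it
          val stmt = session.prepare(
            "UPDATE keySpace.table1 SET counter = ? WHERE id = ?") // hypothetical schema
          messages.foreach { msg =>
            session.execute(stmt.bind(Int.box(1), msg))
          }
        }
      }
    }
  }
}
```

This is exactly the foreachRDD/foreachPartition design pattern from the streaming guide Cody links: connection setup happens once per partition on the worker, not once per record and not on the driver.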
>>>>>>>>> Thanks,
>>>>>>>>> Padma Ch

To unsubscribe from this group and stop receiving emails from it, send an email to spark-connector-user+unsubscr...@lists.datastax.com.