Can you show some code for how you're doing the reads? Have you successfully read other data from Cassandra before (i.e., do you have a lot of experience with this path and only this particular table is causing issues, or are you still figuring out the right way to do a read)?
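For reference, a minimal sketch of what a 2015-era PySpark read through the spark-cassandra-connector's DataFrame source typically looks like (the host, keyspace "ks", and table/column names "foo"/"bar" are assumptions taken from the thread; this needs a live cluster and the connector package on the classpath, so it is illustrative only):

```python
# Sketch only: assumes spark-submit --packages com.datastax.spark:spark-cassandra-connector...
# and a reachable Cassandra node; names below are assumptions, not known config.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("cassandra-read")
        .set("spark.cassandra.connection.host", "127.0.0.1"))  # assumed host

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(table="foo", keyspace="ks")  # keyspace "ks" is an assumption
      .load())

df.select("bar").show()
# The number of tasks the thread is asking about corresponds to:
print(df.rdd.getNumPartitions())
```

Seeing the actual read code (RDD API vs. DataFrame source) matters, since partitioning behavior differs between the two paths.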
What version of Spark and the Cassandra connector are you using? Also, what do you get for "select count(*) from foo" -- is that just as bad?

On Wed, Jun 17, 2015 at 4:37 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> Hi, can somebody suggest a way to reduce the number of tasks?
>
> 2015-06-15 18:26 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:
>
>> Hi, I'm running Spark SQL against a Cassandra table. I have 3 C* nodes,
>> each of them running a Spark worker.
>> The problem is that Spark runs 869 tasks to read 3 rows: select bar from foo.
>> I've tried these properties:
>>
>> # try to avoid 869 tasks for a dummy "select bar from foo" query
>> spark.cassandra.input.split.size_in_mb=32mb
>> spark.cassandra.input.fetch.size_in_rows=1000
>> spark.cassandra.input.split.size=10000
>>
>> but it doesn't help.
>>
>> Here are the mean metrics for the job:
>> input1 = 8388608.0 TB
>> input2 = -320 B
>> input3 = -400 B
>>
>> I'm confused by the input metrics; there are only 3 rows in the C* table.
>> I definitely don't have 8388608.0 TB of data :)
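For context on why the task count can balloon like this: the connector derives the number of Spark partitions roughly by dividing its estimate of the table's size by `spark.cassandra.input.split.size_in_mb`, so a wildly wrong size estimate (note the 8388608.0 TB figure above) produces a huge partition count regardless of how few rows actually exist. A rough model of that arithmetic, with a hypothetical helper name (this is a sketch of the idea, not the connector's actual code):

```python
import math

def estimated_spark_partitions(table_size_mb: float, split_size_mb: int = 64) -> int:
    """Rough model: one Spark partition (task) per split.size_in_mb of
    ESTIMATED table data. A sketch, not the connector's exact algorithm."""
    return max(1, math.ceil(table_size_mb / split_size_mb))

# With a sane size estimate, a 3-row table needs a single task:
print(estimated_spark_partitions(0.001))     # -> 1

# With a wildly inflated estimate, the task count explodes:
print(estimated_spark_partitions(869 * 64))  # -> 869
```

Note also that `spark.cassandra.input.split.size_in_mb` is documented as a plain integer number of megabytes, so a value with a unit suffix like "32mb" may not be parsed as intended -- worth double-checking against the connector docs for your version.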