Can you show some code for how you're doing the reads? Have you successfully read other data from Cassandra before (i.e., do you have a lot of experience with this path and only this particular table is causing issues, or are you still figuring out the right way to do a read)?
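For reference, a minimal sketch of what a 2015-era PySpark read through the spark-cassandra-connector's DataFrame source typically looks like (the host, keyspace "ks", and table/column names "foo"/"bar" are assumptions taken from the thread; this needs a live cluster and the connector package on the classpath, so it is illustrative only):

```python
# Sketch only: assumes spark-submit --packages com.datastax.spark:spark-cassandra-connector...
# and a reachable Cassandra node; names below are assumptions, not known config.
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf = (SparkConf()
        .setAppName("cassandra-read")
        .set("spark.cassandra.connection.host", "127.0.0.1"))  # assumed host

sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

df = (sqlContext.read
      .format("org.apache.spark.sql.cassandra")
      .options(table="foo", keyspace="ks")  # keyspace "ks" is an assumption
      .load())

df.select("bar").show()
# The number of tasks the thread is asking about corresponds to:
print(df.rdd.getNumPartitions())
```

Seeing the actual read code (RDD API vs. DataFrame source) matters, since partitioning behavior differs between the two paths.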
What version of Spark and the Cassandra connector are you using? Also, what do you get for "select count(*) from foo" -- is that just as bad?

On Wed, Jun 17, 2015 at 4:37 AM, Serega Sheypak <serega.shey...@gmail.com> wrote:
> Hi, can somebody suggest a way to reduce the number of tasks?
>
> 2015-06-15 18:26 GMT+02:00 Serega Sheypak <serega.shey...@gmail.com>:
>
>> Hi, I'm running Spark SQL against a Cassandra table. I have 3 C* nodes,
>> each of them running a Spark worker.
>> The problem is that Spark runs 869 tasks to read 3 rows: select bar from foo.
>> I've tried these properties:
>>
>> # try to avoid 869 tasks for a dummy "select bar from foo" query
>> spark.cassandra.input.split.size_in_mb=32mb
>> spark.cassandra.input.fetch.size_in_rows=1000
>> spark.cassandra.input.split.size=10000
>>
>> but it doesn't help.
>>
>> Here are the mean metrics for the job:
>> input1 = 8388608.0 TB
>> input2 = -320 B
>> input3 = -400 B
>>
>> I'm confused by the input metrics; there are only 3 rows in the C* table.
>> I definitely don't have 8388608.0 TB of data :)
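For context on why the task count can balloon like this: the connector derives the number of Spark partitions roughly by dividing its estimate of the table's size by `spark.cassandra.input.split.size_in_mb`, so a wildly wrong size estimate (note the 8388608.0 TB figure above) produces a huge partition count regardless of how few rows actually exist. A rough model of that arithmetic, with a hypothetical helper name (this is a sketch of the idea, not the connector's actual code):

```python
import math

def estimated_spark_partitions(table_size_mb: float, split_size_mb: int = 64) -> int:
    """Rough model: one Spark partition (task) per split.size_in_mb of
    ESTIMATED table data. A sketch, not the connector's exact algorithm."""
    return max(1, math.ceil(table_size_mb / split_size_mb))

# With a sane size estimate, a 3-row table needs a single task:
print(estimated_spark_partitions(0.001))     # -> 1

# With a wildly inflated estimate, the task count explodes:
print(estimated_spark_partitions(869 * 64))  # -> 869
```

Note also that `spark.cassandra.input.split.size_in_mb` is documented as a plain integer number of megabytes, so a value with a unit suffix like "32mb" may not be parsed as intended -- worth double-checking against the connector docs for your version.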