Folks,

I have a time-series table in which each record has 350 columns.

The primary key is ((date, bucket), objectid, timestamp).
The objective is to read one day's worth of data, which comes to around 12k
Cassandra partitions, each holding around 25 MB of data.
I see only 1 task active during the read operation on a 5-node cluster (8
cores each). Does this mean not enough Spark partitions are getting
created?
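
For reference, this is roughly how the read is set up (a sketch only; the
keyspace/table names and the date literal are placeholders, and I'm assuming
the DataFrame API of the Spark Cassandra Connector):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("daily-timeseries-read").getOrCreate()

// Read via the Spark Cassandra Connector data source; "ks" and "timeseries"
// stand in for the real keyspace and table names.
val day = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "timeseries"))
  .load()
  .filter(col("date") === "2016-01-01")   // one day's worth of data

// Check how many Spark partitions the connector actually created for the scan.
println(s"partitions: ${day.rdd.getNumPartitions}")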
I have also set input.split.size_in_mb to a lower value, e.g. 10.
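
This is how that setting is being passed in (again a sketch; I'm assuming the
full property key is spark.cassandra.input.split.size_in_mb; the connection
host is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// A smaller split size should make the connector cut each node's token ranges
// into more (and smaller) Spark partitions.
val conf = new SparkConf()
  .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder contact point
  .set("spark.cassandra.input.split.size_in_mb", "10")

val spark = SparkSession.builder().config(conf).getOrCreate()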
Any pointers in this regard would be helpful.


Thanks,
