RE: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Prateek .
Hi, the spark connector docs say: (https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md) "Th

Re: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Matthias Niehoff
Hi, the spark connector docs say: (https://github.com/datastax/spark-cassandra-connector/blob/master/doc/FAQ.md) "The number of Spark partitions (tasks) created is directly controlled by the setting spark.cassandra.input.split.size_in_mb. This number reflects the approximate amount of Cassandra
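The relationship the FAQ describes can be sketched numerically. Each Spark task covers roughly spark.cassandra.input.split.size_in_mb of Cassandra data, so the task count is approximately the table's estimated size divided by the split size. The 512 MB table size and 64 MB split size below are made-up illustration values, not figures from this thread:

```java
// Rough sketch of how spark.cassandra.input.split.size_in_mb drives task count.
// The table size (512 MB) and split size (64 MB) are hypothetical example values.
public class SplitSizeSketch {
    static long approxTaskCount(long tableSizeMb, long splitSizeMb) {
        // Ceiling division: every started split becomes one task.
        return (tableSizeMb + splitSizeMb - 1) / splitSizeMb;
    }

    public static void main(String[] args) {
        long tableSizeMb = 512;  // estimated Cassandra data size for the table
        long splitSizeMb = 64;   // spark.cassandra.input.split.size_in_mb
        System.out.println(approxTaskCount(tableSizeMb, splitSizeMb)); // prints 8
    }
}
```

Lowering the split size therefore yields more, smaller tasks; raising it yields fewer, larger ones.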

Re: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Bryan Jeffrey
Prateek, I believe that one task is created per Cassandra partition. How is your data partitioned? Regards, Bryan Jeffrey

On Thu, Mar 10, 2016 at 10:36 AM, Prateek . wrote:
> Hi,
>
> I have a Spark Batch job for reading timeseries data from Cassandra which
> has

Spark job for Reading time series data from Cassandra

2016-03-10 Thread Prateek .
Hi, I have a Spark Batch job for reading timeseries data from Cassandra which has 50,000 rows.

JavaRDD cassandraRowsRDD = javaFunctions.cassandraTable("iotdata", "coordinate")
    .map(new Function() {
        @Override public
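The truncated snippet above can be fleshed out roughly as follows. This is only a sketch, not the original poster's full code: it assumes the Java API of spark-cassandra-connector 1.x, where javaFunctions(sc) comes from CassandraJavaUtil; the row-to-string mapping is a placeholder guess at the elided function body, and only the keyspace/table names ("iotdata", "coordinate") are taken from the question. It needs a running Spark and Cassandra to actually execute.

```java
// Hedged sketch (not the original poster's code): reading iotdata.coordinate
// with the spark-cassandra-connector 1.x Java API.
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import com.datastax.spark.connector.japi.CassandraRow;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class ReadCoordinates {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("read-coordinates")
            // Cassandra contact point; adjust for your cluster.
            .set("spark.cassandra.connection.host", "127.0.0.1")
            // Smaller splits -> more, smaller tasks (see the FAQ quoted above).
            .set("spark.cassandra.input.split.size_in_mb", "64");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> rows = javaFunctions(sc)
            .cassandraTable("iotdata", "coordinate")
            .map(new Function<CassandraRow, String>() {
                @Override
                public String call(CassandraRow row) {
                    return row.toString(); // placeholder transformation
                }
            });

        System.out.println("rows: " + rows.count());
        sc.stop();
    }
}
```

With 50,000 rows the table is likely far smaller than one 64 MB split, so such a read would typically produce only a handful of tasks regardless of row count.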