subject:"Apache Spark or Spark-Cassandra-Connector doesnt look like it is reading multiple partitions in parallel."

Re: Apache Spark or Spark-Cassandra-Connector doesnt look like it is reading multiple partitions in parallel.

2016-11-26 Thread kant kodali

https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/DataFrameReader.html#json(org.apache.spark.rdd.RDD)

You can pass an RDD to spark.read.json (spark here is a SparkSession). Also, it works completely fine with a smaller dataset in a table, but with 1B records it takes forever and more

Re: Apache Spark or Spark-Cassandra-Connector doesnt look like it is reading multiple partitions in parallel.

2016-11-26 Thread Anastasios Zouzias

Hi there,

spark.read.json usually takes a filesystem path (usually HDFS) pointing to a file that contains one JSON object per line. See also http://spark.apache.org/docs/latest/sql-programming-guide.html

Hence, in your case

val df4 = spark.read.json(rdd) // This line takes forever

seems wrong. I
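
For reference, here is a minimal sketch of the two spark.read.json forms discussed in this thread, assuming Spark 2.0.x (the version linked above) and a local SparkSession. The sample records, the field names id and body, and the HDFS path are hypothetical and do not come from the original code. The DataFrameReader documentation notes that, unless a schema is supplied, json goes through the input once to determine the schema, so the sketch also shows passing an explicit schema. Whether that extra pass explains the slowness reported here is not established by the thread.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ReadJsonSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would run on a cluster.
    val spark = SparkSession.builder()
      .appName("read-json-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical stand-in for an RDD of JSON strings read from Cassandra.
    val rdd: RDD[String] = spark.sparkContext.parallelize(Seq(
      """{"id": "a", "body": "hello"}""",
      """{"id": "b", "body": "world"}"""
    ))

    // Form 1: json(RDD[String]). With no schema given, Spark scans the
    // input once to infer one before building the DataFrame.
    val inferred: DataFrame = spark.read.json(rdd)
    inferred.printSchema()

    // Same call with an explicit (assumed) schema, which skips inference.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("body", StringType)
    ))
    val typed: DataFrame = spark.read.schema(schema).json(rdd)
    typed.show()

    // Form 2: json(path), reading a file with one JSON object per line.
    // The path below is hypothetical, so the call is left commented out.
    // val fromPath: DataFrame = spark.read.json("hdfs:///tmp/records.json")

    spark.stop()
  }
}
```

Both forms return a DataFrame; the difference lies in where the JSON lines come from, not in how they are parsed.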

Apache Spark or Spark-Cassandra-Connector doesnt look like it is reading multiple partitions in parallel.

2016-11-26 Thread kant kodali

Apache Spark or Spark-Cassandra-Connector doesnt look like it is reading multiple partitions in parallel. Here is my code using