spark.kryo.classesToRegister

2016-01-27 Thread amit tewari
This is what I have added in my code: rdd.persist(StorageLevel.MEMORY_ONLY_SER()); conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer"); Do I compulsorily need to do anything via spark.kryo.classesToRegister, or is the above code sufficient to achieve a performance gain?
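Registration is not compulsory: with the settings above, Kryo is already in use, but for every object it serializes it must store the full class name, whereas registered classes are written as small numeric IDs. A minimal hedged sketch in Java (MyRecord is a hypothetical application class standing in for whatever the persisted RDD holds):

    import org.apache.spark.SparkConf;

    SparkConf conf = new SparkConf()
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    // Equivalent to listing the class in spark.kryo.classesToRegister.
    conf.registerKryoClasses(new Class<?>[]{ MyRecord.class });

Setting spark.kryo.registrationRequired=true makes Spark fail fast on any unregistered class instead of silently falling back to writing class names, which is a convenient way to check coverage.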

JavaPairRDD.treeAggregate

2015-11-25 Thread amit tewari
Hi, does someone have experience/knowledge of using JavaPairRDD.treeAggregate? Even sample code would be helpful; not many articles etc. are available on the web. Thanks, Amit
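A minimal hedged sketch (jsc is assumed to be an existing JavaSparkContext; the Java treeAggregate overloads exist from Spark 1.3). treeAggregate behaves like aggregate, except that partition results are merged in a multi-level tree rather than all being sent straight to the driver, which helps with many partitions or large accumulators:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function2;
    import scala.Tuple2;
    import java.util.Arrays;

    JavaPairRDD<String, Integer> pairs = jsc.parallelizePairs(Arrays.asList(
            new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("c", 3)));

    // seqOp folds one (key, value) record into a partition-local accumulator.
    Function2<Integer, Tuple2<String, Integer>, Integer> seqOp =
            (acc, kv) -> acc + kv._2();
    // combOp merges two partition accumulators.
    Function2<Integer, Integer, Integer> combOp = (a, b) -> a + b;

    Integer total = pairs.treeAggregate(0, seqOp, combOp, 2);  // depth 2 is the default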

Re: Programmatically create RDDs based on input

2015-11-02 Thread amit tewari
On Sat, Oct 31, 2015 at 11:18 PM, ayan guha <guha.a...@gmail.com> wrote: My Java knowledge is limited, but you may try with a HashMap and put RDDs in it?
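A minimal Java sketch of that suggestion, assuming jsc is an existing JavaSparkContext (the paths are hypothetical):

    import org.apache.spark.api.java.JavaRDD;
    import java.util.HashMap;
    import java.util.Map;

    // Key each RDD by the input path it came from.
    Map<String, JavaRDD<String>> rddsByFile = new HashMap<>();
    for (String path : new String[]{"/file1.txt", "/file2.txt"}) {
        rddsByFile.put(path, jsc.textFile(path));
    }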

Programmatically create RDDs based on input

2015-10-31 Thread amit tewari
Hi, I need the ability to create RDDs programmatically inside my program (e.g. based on a variable number of input files). Can this be done? I need this as I want to run the following statement inside an iteration: JavaRDD<String> rdd1 = jsc.textFile("/file1.txt"); Thanks, Amit
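A minimal sketch of the iteration in question (jsc is assumed to be an existing JavaSparkContext; the file names are hypothetical). Each textFile call only records lineage, so creating RDDs in a loop is cheap:

    import org.apache.spark.api.java.JavaRDD;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Build one RDD per input file and keep them in a list.
    List<JavaRDD<String>> rdds = new ArrayList<>();
    for (String f : Arrays.asList("/file1.txt", "/file2.txt")) {
        rdds.add(jsc.textFile(f));
    }

If a single RDD over all the files is wanted instead, textFile also accepts a comma-separated list of paths, e.g. jsc.textFile("/file1.txt,/file2.txt").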

Re: Programmatically create RDDs based on input

2015-10-31 Thread amit tewari
wrote: Yes, this can be done. A quick Python equivalent:

    # In Driver
    fileList = ["/file1.txt", "/file2.txt"]
    rdds = []
    for f in fileList:
        rdd = jsc.textFile(f)
        rdds.append(rdd)

How to run a Scala script in the Datastax Spark distribution?

2015-06-11 Thread amit tewari
Hi, I am struggling to find how to run a Scala script on Datastax Spark (SPARK_HOME/bin/spark-shell -i test.scala is deprecated). I don't want to use the Scala prompt. Thanks, AT
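One hedged possibility, assuming the DSE launcher forwards its arguments to the bundled spark-shell (an assumption about DSE, not something confirmed in this thread):

    dse spark -i test.scala

If the launcher does not accept -i, feeding the script on stdin is another option, e.g. echo ":load test.scala" | dse spark.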

Re: Spark error value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]

2015-06-09 Thread amit tewari
at 1:54 PM, amit tewari <amittewar...@gmail.com> wrote: Actually, the question was: will keyBy() accept multiple fields (e.g. x(0), x(1)) as the key? On Tue, Jun 9, 2015 at 1:07 PM, amit tewari <amittewar...@gmail.com> wrote: Thanks Akhil, as you suggested, I have to go with keyBy(route) as I need

Re: Spark error value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]

2015-06-09 Thread amit tewari
Actually, the question was: will keyBy() accept multiple fields (e.g. x(0), x(1)) as the key? On Tue, Jun 9, 2015 at 1:07 PM, amit tewari <amittewar...@gmail.com> wrote: Thanks Akhil, as you suggested, I have to go with keyBy(route) as I need the columns intact. But will keyBy() accept multiple
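A hedged Java sketch of the composite-key idea (the thread itself is Scala, where the equivalent is keyBy(x => (x(0), x(1)))). keyBy takes a single function, so multiple fields become one composite key, here a Tuple2 built from the first two columns; jsc, the input path, and the comma delimiter are assumptions:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    // Split each line into columns, then key each row by columns 0 and 1.
    JavaRDD<String[]> rows = jsc.textFile("/routes.txt").map(l -> l.split(","));
    JavaPairRDD<Tuple2<String, String>, String[]> keyed =
            rows.keyBy(x -> new Tuple2<>(x[0], x[1]));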

Re: Spark error value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]

2015-06-09 Thread amit tewari
join basically requires RDD[(K, V)], and in your case it's ((String, String), String, String). You can also look into keyBy if you don't want to concatenate your keys. Thanks, Best Regards. On Tue, Jun 9, 2015 at 10:14 AM, amit tewari <amittewar...@gmail.com> wrote: Hi Dear Spark Users, I am very new to Spark
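A hedged Java sketch of that reshaping (in the thread's Scala, the equivalent would be map(x => ((x(0), x(1)), (x(2), x(3))))). join only becomes available once each element is a (key, value) pair, so the four fields are regrouped; jsc, the paths, and the delimiter are assumptions:

    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import scala.Tuple2;

    JavaRDD<String[]> rowsA = jsc.textFile("/a.txt").map(l -> l.split(","));
    JavaRDD<String[]> rowsB = jsc.textFile("/b.txt").map(l -> l.split(","));

    // Regroup 4 columns into ((k1, k2), (v1, v2)) so the RDD is a pair RDD.
    JavaPairRDD<Tuple2<String, String>, Tuple2<String, String>> a =
            rowsA.mapToPair(x -> new Tuple2<>(new Tuple2<>(x[0], x[1]),
                                              new Tuple2<>(x[2], x[3])));
    JavaPairRDD<Tuple2<String, String>, Tuple2<String, String>> b =
            rowsB.mapToPair(x -> new Tuple2<>(new Tuple2<>(x[0], x[1]),
                                              new Tuple2<>(x[2], x[3])));

    // join now compiles and matches on the composite (String, String) key.
    JavaPairRDD<Tuple2<String, String>,
            Tuple2<Tuple2<String, String>, Tuple2<String, String>>> joined = a.join(b);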

Spark error value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]

2015-06-08 Thread amit tewari
Hi Dear Spark Users, I am very new to Spark/Scala. I am using Datastax (4.7 / Spark 1.2.1) and struggling with the following error/issue. I have already tried options like import org.apache.spark.SparkContext._ or the explicit import org.apache.spark.SparkContext.rddToPairRDDFunctions, but the error is not resolved.

Re: Spark error value join is not a member of org.apache.spark.rdd.RDD[((String, String), String, String)]

2015-06-08 Thread amit tewari
amit tewari <amittewar...@gmail.com> wrote: Hi Dear Spark Users, I am very new to Spark/Scala. I am using Datastax (4.7 / Spark 1.2.1) and struggling with the following error/issue. I have already tried options like import org.apache.spark.SparkContext._ or the explicit import

Re: Hadoop 2.X Spark Client Jar 0.9.0 problem

2014-04-04 Thread Amit Tewari
I believe you have got to set the following: SPARK_HADOOP_VERSION=2.2.0 (or whatever your version is) and SPARK_YARN=true, then type sbt/sbt assembly. If you are using Maven to compile: mvn -Pyarn -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 -DskipTests clean package. Hope this helps. -A On Fri, Apr 4, 2014