Spark: Scala Shell Very Slow (Unresponsive)

2017-02-02 Thread jimitkr
Friends, After I launch spark-shell, the default Scala shell appears but is unresponsive. When I type any command in the shell, nothing appears on my screen; the shell is completely unresponsive. My server has 32 GB of memory and approx. 18 GB is free after launching spark-shell, so it may

Spark master takes more time with local[8] than local[1]

2016-01-24 Thread jimitkr
Hi All, I have a machine with the following configuration: 32 GB RAM, 500 GB HDD, 8 CPUs. Following are the parameters I'm starting my Spark context with: val conf = new SparkConf().setAppName("MasterApp").setMaster("local[1]").set("spark.executor.memory", "20g") I'm reading a 4.3 GB file and
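
A minimal sketch of the comparison described above; the input path, the timing approach, and the object/method names are placeholders, not from the original post. Note that in local mode the executors run inside the driver JVM, so spark.driver.memory sizes the heap rather than spark.executor.memory.

import org.apache.spark.{SparkConf, SparkContext}

object MasterCompare {
  // Time a full read of the file under a given master URL ("local[1]" vs "local[8]").
  def timeCount(master: String, path: String): Long = {
    val conf = new SparkConf()
      .setAppName("MasterApp")
      .setMaster(master)
      .set("spark.executor.memory", "20g")   // kept from the post; not what sizes the heap in local mode
    val sc = new SparkContext(conf)
    val start = System.nanoTime()
    sc.textFile(path).count()                // forces the whole file to be read
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    sc.stop()
    elapsedMs
  }

  def main(args: Array[String]): Unit = {
    val path = "/data/input.txt"             // hypothetical 4.3 GB input
    println(s"local[1]: ${timeCount("local[1]", path)} ms")
    println(s"local[8]: ${timeCount("local[8]", path)} ms")
  }
}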

Calculate sum of values in 2nd element of tuple

2016-01-03 Thread jimitkr
Hi, I've created tuples of type (String, List[Int]) and want to sum the values in the List[Int] part, i.e. the 2nd element in each tuple. Here is my list: val input = sc.parallelize(List(("abc", List(1,2,3,4)), ("def", List(5,6,7,8)))) I want to sum up values in the 2nd element of the tuple so
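
For reference, a minimal sketch of one way to do the per-key and overall sums (any names beyond those in the post are placeholders):

val input = sc.parallelize(List(("abc", List(1, 2, 3, 4)), ("def", List(5, 6, 7, 8))))

// Sum inside each tuple's 2nd element, keeping the key: ("abc",10), ("def",26)
val perKey = input.mapValues(_.sum)
perKey.collect().foreach(println)

// Grand total across all 2nd elements: 36
val total = input.values.flatMap(x => x).reduce(_ + _)
println(total)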

Re: Cannot get repartitioning to work

2016-01-02 Thread jimitkr
Thanks. Repartitioning works now. Thread closed :)

Re: How to save only values via saveAsHadoopFile or saveAsNewAPIHadoopFile

2016-01-01 Thread jimitkr
Doesn't this work? pair.values.saveAsHadoopFile()
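
Note that saveAsHadoopFile and saveAsNewAPIHadoopFile are defined on pair RDDs, so that call only compiles if the values are themselves pairs. A hedged sketch of two common workarounds ("pair" and the output paths are placeholders):

import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// If plain text output is enough, just drop the keys:
pair.values.saveAsTextFile("/out/values-text")

// If a Hadoop OutputFormat is required, re-key with NullWritable;
// TextOutputFormat writes only the value when the key is a NullWritable.
pair.values
  .map(v => (NullWritable.get(), new Text(v.toString)))
  .saveAsNewAPIHadoopFile[TextOutputFormat[NullWritable, Text]]("/out/values-hadoop")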

Cannot get repartitioning to work

2016-01-01 Thread jimitkr
Hi, I'm trying to test some custom parallelism and repartitioning in Spark. First, I reduce my RDD (forcing creation of 10 partitions for the same). I then repartition the data to 20 partitions and print out the number of partitions, but I always get 10. Looks like the repartition command is
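
For what it's worth, a minimal sketch of the usual pitfall here (assumption: the partition count was printed on the original RDD). repartition returns a new RDD and leaves the original untouched, so the count must be read from the returned RDD:

// Stand-in for the reduced RDD with 10 partitions
val reduced = sc.parallelize(1 to 1000000)
  .map(i => (i % 100, i))
  .reduceByKey(_ + _, 10)

val repartitioned = reduced.repartition(20)   // returns a new RDD

println(reduced.partitions.length)            // still 10
println(repartitioned.partitions.length)      // 20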