Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Zoltán Zvara
I personally build with SBT and run Spark on YARN with IntelliJ. You need to connect to remote JVMs with a remote debugger. You also need to do the same if you use Python, because it will launch a JVM on the driver as well. On Wed, Aug 19, 2015 at 2:10 PM canan chen ccn...@gmail.com wrote:
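A minimal sketch of the remote-debugging setup Zoltán describes, assuming the standard JDWP agent; the port, class, and jar names are placeholders. The driver JVM suspends until IntelliJ attaches (executors can be debugged the same way via spark.executor.extraJavaOptions):

    ./bin/spark-submit --master yarn-client \
      --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
      --class com.example.MyApp my-app.jar

In IntelliJ, a "Remote" run configuration pointing at the same host and port then attaches before the driver proceeds.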

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread canan chen
Thanks Ted. I noticed another thread about running Spark programmatically (client mode for standalone and YARN). Would it be much easier to debug Spark if that were possible? Has anyone thought about it? On Wed, Aug 19, 2015 at 5:50 PM, Ted Yu yuzhih...@gmail.com wrote: See this thread:
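A hedged sketch of what "running Spark programmatically" looks like for debugging, assuming the Spark 1.x API; the object name, master URL, and job are illustrative. With a local master the whole job runs in one JVM, so breakpoints are hit directly in the IDE:

    import org.apache.spark.{SparkConf, SparkContext}

    object ProgrammaticDebug {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("programmatic-debug")
          .setMaster("local[2]")   // or "spark://<master-host>:7077" for standalone client mode
        val sc = new SparkContext(conf)
        // With a local master this closure runs in the same JVM, so IDE breakpoints work directly.
        println(sc.parallelize(1 to 100).map(_ * 2).sum())
        sc.stop()
      }
    }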

RE: Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Ratika Prasad
Should this be done on the master node, the slave nodes, or both? From: Madhusudanan Kandasamy [mailto:madhusuda...@in.ibm.com] Sent: Wednesday, August 19, 2015 9:31 PM To: Ratika Prasad rpra...@couponsinc.com Cc: dev@spark.apache.org Subject: Re: Unable to run the spark application in standalone cluster

Re: Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Madhusudanan Kandasamy
Try increasing the Spark worker memory in conf/spark-env.sh: export SPARK_WORKER_MEMORY=2g Thanks, Madhu. Ratika Prasad rprasad@couponsi
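Per the follow-up in this thread, the setting belongs on the worker (slave) nodes; a minimal conf/spark-env.sh sketch, assuming the workers are restarted afterwards (e.g. sbin/stop-all.sh and sbin/start-all.sh from the master):

    # conf/spark-env.sh on each worker node
    export SPARK_WORKER_MEMORY=2g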

Creating RDD with key and Subkey

2015-08-19 Thread Ratika Prasad
Hi, We need an RDD with the following format: JavaPairRDD<String, HashMap<String, List<String>>>, essentially an RDD with a key and sub-key kind of structure. How is that doable in Spark? Thanks R

Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Ratika Prasad
Hi, We have a simple Spark application which runs through fine when run locally on the master node as below: ./bin/spark-submit --class com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation --master local sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar
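For comparison, a hedged sketch of submitting the same application in standalone cluster mode (the master URL is a placeholder, and the jar must be reachable from the worker that ends up running the driver):

    ./bin/spark-submit \
      --class com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation \
      --master spark://<master-host>:7077 \
      --deploy-mode cluster \
      sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar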

RE: Unable to run the spark application in standalone cluster mode

2015-08-19 Thread Madhusudanan Kandasamy
Slave nodes. Thanks, Madhu. Ratika Prasad rprasad@couponsinc.com

Re: Creating RDD with key and Subkey

2015-08-19 Thread Silas Davis
This should be sent to the user mailing list, I think. It depends what you want to do with the RDD, so yes, you could throw around (String, HashMap<String, List<String>>) tuples, or perhaps you'd like to be able to groupByKey or reduceByKey on the key and sub-key as a composite, in which case
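A short Scala sketch of the composite-key alternative Silas mentions, with illustrative data; treating the (key, sub-key) pair itself as the RDD key makes groupByKey and reduceByKey straightforward:

    val records = sc.parallelize(Seq(
      (("food", "fruits"), "apple"),
      (("food", "fruits"), "orange"),
      (("food", "veg"),    "carrot")))

    val grouped = records.groupByKey()                          // RDD[((String, String), Iterable[String])]
    val counts  = records.mapValues(_ => 1).reduceByKey(_ + _)  // one count per (key, sub-key) pair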

Re: Creating RDD with key and Subkey

2015-08-19 Thread Ratika Prasad
We need to create an RDD as below: JavaPairRDD<String, List<HashMap<String, List<String>>>>. The idea is that we need to do lookup() on the key, which will return a list-of-hash-maps structure, and then do a lookup on the sub-key, which is the key in the HashMap returned _ From: Silas
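A hedged Scala sketch of that two-level lookup (the thread uses the Java API, but the shape is the same; the data is illustrative): lookup() on the outer key returns the list of maps, and the sub-key is then resolved against each map:

    val rdd = sc.parallelize(Seq(
      ("food", List(Map("fruits" -> List("apple", "orange"))))))

    val maps   = rdd.lookup("food")                             // Seq[List[Map[String, List[String]]]]
    val fruits = maps.flatten.flatMap(_.get("fruits")).flatten  // values under the sub-key "fruits"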

Re: [spark-csv] how to build with Hadoop 2.6.0?

2015-08-19 Thread Mohit Jaggi
spark-csv should not depend on hadoop On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik g...@il.ibm.com wrote: I would like to build spark-csv with Hadoop 2.6.0. I noticed that when I build it with sbt/sbt ++2.10.4 package it builds it with Hadoop 2.2.0 (at least this is what I saw in the .ivy2

Re: Creating RDD with key and Subkey

2015-08-19 Thread Ranjana Rajendran
Hi Ratika, I tried the following: val l = List("apple", "orange", "banana"); var inner = new scala.collection.mutable.HashMap[String, List[String]]; inner.put("fruits", l); var list = new scala.collection.mutable.HashMap[String, scala.collection.mutable.HashMap[String, List[String]]]; list.put("food", inner)
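Continuing that sketch, the nested map can be turned into a pair RDD and queried by the outer key (assuming an existing SparkContext named sc):

    val rdd = sc.parallelize(list.toSeq)  // RDD[(String, mutable.HashMap[String, List[String]])]
    rdd.lookup("food").foreach(println)   // returns the inner map(s) stored under "food"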

Re: [spark-csv] how to build with Hadoop 2.6.0?

2015-08-19 Thread Gil Vernik
It shouldn't? This one, com.databricks.spark.csv.util.TextFile, has hadoop imports. I figured out that the answer to my question is just to add libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0". But I still wonder where this 2.2.0 default comes from. From: Mohit Jaggi
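A build.sbt sketch of the fix Gil describes, with proper sbt quoting; the Spark version shown is illustrative, and the 2.2.0 seen in .ivy2 most likely comes from the Hadoop version that spark-core pulls in transitively unless it is overridden:

    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.4.1" % "provided",
      "org.apache.hadoop" %  "hadoop-client" % "2.6.0"
    )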

Re: What's the best practice for developing new features for spark ?

2015-08-19 Thread Ted Yu
See this thread: http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+modulesubj=Building+Spark+Building+just+one+module+ On Aug 19, 2015, at 1:44 AM, canan chen ccn...@gmail.com wrote: I want to work on one JIRA, but it is not easy to unit test because it involves different
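That thread is about building and testing a single module instead of the whole distribution; a hedged sketch of the workflow (the module and suite names are illustrative, sbt commands as of the Spark 1.x build):

    build/sbt
    > project core
    > test-only org.apache.spark.ui.UISuite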

What's the best practice for developing new features for spark ?

2015-08-19 Thread canan chen
I want to work on one JIRA, but it is not easy to unit test because it involves different components, especially the UI. Building Spark is pretty slow, and I don't want to rebuild it each time to test my code change. I am wondering how other people do this? Is there any experience you can share? Thanks