I personally build with SBT and run Spark on YARN with IntelliJ. You need
to connect to the remote JVMs with a remote debugger. You need to do the
same if you use Python, because it launches a JVM on the driver as well.
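For reference, a minimal sketch of that remote-debug setup (the class name, jar, and port 5005 are placeholders, not from this thread): pass JDWP agent options to the driver JVM through spark.driver.extraJavaOptions, then attach IntelliJ's Remote debug configuration to that port. suspend=y makes the JVM wait for the debugger before running your code.

./bin/spark-submit \
  --master yarn-client \
  --conf "spark.driver.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class com.example.MyApp my-app.jar

Executors can take the same options via spark.executor.extraJavaOptions, though a fixed port will conflict if several executors land on one node.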
On Wed, Aug 19, 2015 at 2:10 PM canan chen ccn...@gmail.com wrote:
Thanks Ted. I noticed another thread about running Spark programmatically
(client mode for standalone and YARN). Would it be much easier to debug
Spark if that were possible? Has anyone thought about it?
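For what it's worth, a minimal sketch of what that could look like in Spark 1.x (assuming HADOOP_CONF_DIR points at the cluster configuration; the app name and job are placeholders): with a yarn-client master, the driver runs inside the IDE's own JVM, so breakpoints in driver code work without any remote-debugger setup.

import org.apache.spark.{SparkConf, SparkContext}

object DebugMain {
  def main(args: Array[String]): Unit = {
    // yarn-client keeps the driver in this JVM, so an IDE debugger sees it directly
    val conf = new SparkConf().setMaster("yarn-client").setAppName("debug-session")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 10).sum())  // trivial job to exercise the cluster
    sc.stop()
  }
}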
On Wed, Aug 19, 2015 at 5:50 PM, Ted Yu yuzhih...@gmail.com wrote:
See this thread:
Should this be done on the master node, the slave nodes, or both?
From: Madhusudanan Kandasamy [mailto:madhusuda...@in.ibm.com]
Sent: Wednesday, August 19, 2015 9:31 PM
To: Ratika Prasad rpra...@couponsinc.com
Cc: dev@spark.apache.org
Subject: Re: Unable to run the spark application in standalone cluster
Try increasing the Spark worker memory in conf/spark-env.sh:
export SPARK_WORKER_MEMORY=2g
Thanks,
Madhu.
Ratika Prasad rpra...@couponsinc.com
Hi,
We need an RDD with the following format:
JavaPairRDD<String, HashMap<String, List<String>>>, essentially an RDD with a key
and sub-key kind of structure. How is that doable in Spark?
Thanks
R
Hi ,
We have a simple Spark application which runs through when run locally on the
master node, as below:
./bin/spark-submit --class
com.coupons.salestransactionprocessor.SalesTransactionDataPointCreation
--master local
sales-transaction-processor-0.0.1-SNAPSHOT-jar-with-dependencies.jar
Slave nodes..
Ratika Prasad rpra...@couponsinc.com
This should be sent to the user mailing list, I think.
It depends what you want to do with the RDD. Yes, you could throw around
(String, HashMap<String, List<String>>) tuples, or perhaps you'd like to be
able to groupByKey or reduceByKey on the key and sub-key as a composite, in
which case...
We need to create an RDD as below:
JavaPairRDD<String, List<HashMap<String, List<String>>>>
The idea is that we need to do lookup() on the key, which will return a list of
hash-map-like structures, and then do a lookup on the sub-key, which is the key
in the returned HashMap.
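A minimal Scala sketch of the composite-key alternative mentioned above (the sample data is made up): flattening key and sub-key into one tuple-valued key makes reduceByKey and groupByKey work directly, and lookup() works on the composite too.

// assuming an existing SparkContext `sc`
val composite = sc.parallelize(Seq(
  (("food", "fruits"), List("apple", "orange")),
  (("food", "veggies"), List("carrot"))))
val merged = composite.reduceByKey(_ ++ _)        // aggregate values per (key, subKey)
val fruits = merged.lookup(("food", "fruits"))    // Seq(List("apple", "orange"))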
From: Silas
spark-csv should not depend on hadoop
On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik g...@il.ibm.com wrote:
I would like to build spark-csv with Hadoop 2.6.0.
I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds it
with Hadoop 2.2.0 (at least this is what I saw in the .ivy2
Hi Ratika,
I tried the following:
val l = List("apple", "orange", "banana")
var inner = new scala.collection.mutable.HashMap[String, List[String]]
inner.put("fruits", l)
var list = new scala.collection.mutable.HashMap[String,
  scala.collection.mutable.HashMap[String, List[String]]]
list.put("food", inner)
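From there, a short sketch (assuming an existing SparkContext `sc`) turns this into the pair RDD the original question asks for:

val rdd = sc.parallelize(list.toSeq)  // RDD[(String, mutable.HashMap[String, List[String]])]
val maps = rdd.lookup("food")         // Seq of inner maps for that key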
It shouldn't?
This one, com.databricks.spark.csv.util.TextFile, has hadoop imports.
I figured out that the answer to my question is just to add
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"
But I still wonder where this 2.2.0 default comes from.
From: Mohit Jaggi
See this thread:
http://search-hadoop.com/m/q3RTtdZv0d1btRHl/Spark+build+modulesubj=Building+Spark+Building+just+one+module+
On Aug 19, 2015, at 1:44 AM, canan chen ccn...@gmail.com wrote:
I want to work on one JIRA, but it is not easy to unit test because it involves
different components, especially the UI. Building Spark is pretty slow, and I
don't want to rebuild everything each time I test a code change. I am wondering
how other people do this. Is there any experience you can share? Thanks
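A minimal sketch of the usual workflow (the sub-project and suite names below are placeholders): keep an interactive sbt session open and run only the affected sub-project's tests, so sbt recompiles just the files you changed.

sbt/sbt                                      # start one interactive session and keep it open
> project core                               # switch to the sub-project you are changing
> test-only org.apache.spark.ui.UISuite      # run a single suite; incremental recompile only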