Spark hangs on collect (stuck on scheduler delay)

2015-08-16 Thread Sagi r
Hi, I'm building a Spark application in which I load some data from an Elasticsearch cluster (using the latest elasticsearch-hadoop connector) and continue to perform some calculations on the Spark cluster. In one case, I use collect on the RDD as soon as it is created (loaded from ES). However, it
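
For context, a minimal sketch of that pattern using the elasticsearch-spark Scala API (the index name and node address below are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    val conf = new SparkConf()
      .setAppName("es-collect")
      .set("es.nodes", "localhost")            // ES node address (placeholder)
    val sc = new SparkContext(conf)

    // esRDD yields an RDD of (docId, fieldMap) pairs for the given index/type.
    val esRdd = sc.esRDD("myindex/mytype")     // placeholder resource name
    // collect() pulls every document to the driver; on a large index this can
    // look like a hang (long scheduler delay) rather than failing fast.
    val docs = esRdd.collect()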

Re: TestSQLContext compilation error when running SparkPi in IntelliJ?

2015-08-16 Thread canan chen
Thanks Andrew. On Sun, Aug 16, 2015 at 1:53 PM, Andrew Or and...@databricks.com wrote: Hi Canan, TestSQLContext is no longer a singleton but now a class. It was never meant to be a fully public API, but if you wish to use it you can just instantiate a new one: val sqlContext = new

Apache Spark - Parallel Processing of messages from Kafka - Java

2015-08-16 Thread mohanaugust
JavaPairReceiverInputDStream<String, byte[]> messages = KafkaUtils.createStream(...);
JavaPairDStream<String, byte[]> filteredMessages = filterValidMessages(messages);
JavaDStream<String> useCase1 = calculateUseCase1(filteredMessages);
JavaDStream<String> useCase2 = calculateUseCase2(filteredMessages);

Re: Error: Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration

2015-08-16 Thread Rishi Yadav
Try --jars rather than --class to submit the jar. On Fri, Aug 14, 2015 at 6:19 AM, Stephen Boesch java...@gmail.com wrote: The NoClassDefFoundError differs from ClassNotFoundException: it indicates an error while initializing that class, but the class is found in the classpath. Please
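
For reference, --class names the application's main class while --jars ships extra dependency jars such as the HBase ones; a typical invocation (every path and name below is a placeholder) looks like:

    # Spark 1.x syntax; paths are placeholders.
    spark-submit \
      --class com.example.MyHBaseApp \
      --master yarn-cluster \
      --jars /path/to/hbase-common.jar,/path/to/hbase-client.jar \
      /path/to/my-app.jar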

Spark can't fetch the added jar to HTTP server

2015-08-16 Thread t4ng0
Hi, I have been trying to run a standalone application using spark-submit, but somehow Spark started the HTTP server and added the jar file to it, yet it is unable to fetch the jar file. I am running the Spark cluster on localhost. If anyone can help me find what I am missing here, thanks in advance.

Spark on scala 2.11 build fails due to incorrect jline dependency in REPL

2015-08-16 Thread Stephen Boesch
I am building Spark with the following options - most notably **scala-2.11**:

    . dev/switch-to-scala-2.11.sh
    mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop2.6.2 -Pscala-2.11 -DskipTests -Dmaven.javadoc.skip=true clean package

The build goes pretty far but fails in one of the minor modules
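
As an aside, Spark's build pins the Hadoop version with -Dhadoop.version; a bare -Dhadoop2.6.2 defines a property the POM never reads. The intended command was presumably:

    mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.2 -Pscala-2.11 \
      -DskipTests -Dmaven.javadoc.skip=true clean package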

SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-16 Thread xiaohe lan
Hi, I am trying to run SparkPi in IntelliJ and getting a NoClassDefFoundError. Has anyone seen this issue before? Exception in thread main java.lang.NoClassDefFoundError: scala/collection/Seq at org.apache.spark.examples.SparkPi.main(SparkPi.scala) at

Re: Difference between Sort based and Hash based shuffle

2015-08-16 Thread Muhammad Haseeb Javed
I did check it out, and although I got a general understanding of the various classes used to implement sort and hash shuffles, these slides lack details as to how they are implemented and why sort generally has better performance than hash. On Sun, Aug 16, 2015 at 4:31 AM, Ravi Kiran

Spark executor lost because of timeout even after setting quite a long timeout value of 1000 seconds

2015-08-16 Thread unk1102
Hi, I have written a Spark job which seems to work fine for almost an hour, after which executors start getting lost because of timeouts. I see the following log statement: 15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 1051638 ms exceeds
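
For reference, the heartbeat cutoff that HeartbeatReceiver enforces is tied to spark.network.timeout, and executors report every spark.executor.heartbeatInterval; a sketch of raising both (values illustrative - very long timeouts usually mask GC pauses rather than cure them):

    import org.apache.spark.SparkConf

    // Illustrative values only; also check executor GC logs, since multi-minute
    // GC pauses are a common reason heartbeats stop arriving.
    val conf = new SparkConf()
      .set("spark.network.timeout", "1000s")
      .set("spark.executor.heartbeatInterval", "60s")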

Re: Executors on multiple nodes

2015-08-16 Thread Sandy Ryza
Hi Mohit, It depends on whether dynamic allocation is turned on. If not, the number of executors is specified by the user with the --num-executors option. If dynamic allocation is turned on, refer to the doc for details:
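
A sketch of the two modes (property names as in the Spark 1.x docs; values illustrative). Note that dynamic allocation also requires the external shuffle service on each node:

    import org.apache.spark.SparkConf

    // Static: fixed executor count, e.g. spark-submit --num-executors 4.
    // Dynamic: executor count scales between the bounds below with load.
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.dynamicAllocation.minExecutors", "2")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      .set("spark.shuffle.service.enabled", "true")  // prerequisite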

Re: SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-16 Thread Jeff Zhang
Check the example module's dependencies (right-click examples and click Open Module Settings); by default scala-library is scoped as provided, and you need to change it to compile to run SparkPi in IntelliJ. As I remember, you also need to change the guava and jetty related libraries to compile too. On Mon, Aug 17, 2015
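
The pom-level equivalent of that IDE change is switching the dependency scope from provided to compile, roughly (coordinates abbreviated; version inherited from the parent):

    <!-- Rough sketch: in the examples module, give scala-library compile scope
         so the IDE puts it on the runtime classpath. -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <scope>compile</scope>
    </dependency>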

Re: Spark Master HA on YARN

2015-08-16 Thread Jeff Zhang
To make it clear, Spark Standalone is similar to YARN as a simple cluster management system:

Spark Master --- YARN ResourceManager
Spark Worker --- YARN NodeManager

On Mon, Aug 17, 2015 at 4:59 AM, Ruslan Dautkhanov dautkha...@gmail.com wrote: There is no Spark master in YARN mode.

Re: Spark can't fetch application jar after adding it to HTTP server

2015-08-16 Thread Rishi Yadav
Can you tell us more about your environment? I understand you are running it on a single machine, but is a firewall enabled? On Sun, Aug 16, 2015 at 5:47 AM, t4ng0 manvendra.tom...@gmail.com wrote: Hi, I am new to Spark and trying to run a standalone application using spark-submit. Whatever I could

Re: Spark Master HA on YARN

2015-08-16 Thread Ruslan Dautkhanov
There is no Spark master in YARN mode; that is standalone-mode terminology. In YARN cluster mode, Spark's ApplicationMaster (the Spark driver runs in it) will be restarted automatically by the RM up to yarn.resourcemanager.am.max-retries times (default is 2). -- Ruslan Dautkhanov On Fri, Jul 17, 2015 at

Example code to spawn multiple threads in driver program

2015-08-16 Thread unk1102
Hi, I have a Spark driver program with one loop which iterates around 2000 times, and on each iteration it executes a job in YARN. Since the loop does the work serially, I want to introduce parallelism. If I create 2000 tasks/runnables/callables in my Spark driver program, will they get executed
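
Spark's scheduler is thread-safe, so jobs submitted from separate driver threads can run concurrently (set spark.scheduler.mode=FAIR if they should share executors fairly). A minimal sketch with a bounded pool instead of 2000 raw threads; the per-job work is a placeholder:

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("parallel-jobs"))
    // 2000 raw threads would mostly queue anyway; cap driver-side concurrency.
    implicit val ec = ExecutionContext.fromExecutorService(
      Executors.newFixedThreadPool(8))

    val jobs = (1 to 2000).map { i =>
      Future {
        // Each action here triggers an independent job on the shared context.
        sc.parallelize(1 to 1000).map(_ * i).sum()
      }
    }
    val results = jobs.map(f => Await.result(f, Duration.Inf))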

Understanding the two jobs run with Spark SQL join

2015-08-16 Thread Todd
Hi, I have a basic Spark SQL join run in local mode. I checked the UI and see that two jobs are run. Their DAG graphs are pasted at the end. I have several questions here: 1. It looks like Job0 and Job1 have the same DAG stages, but stage 3 and stage 4 are skipped. I would ask

Re: Apache Spark - Parallel Processing of messages from Kafka - Java

2015-08-16 Thread Hemant Bhanawat
In Spark, every action (foreach, collect, etc.) gets converted into a Spark job, and jobs are executed sequentially. You may want to refactor your code in calculateUseCase? to run only transformations (map, flatMap) and call a single action at the end. On Sun, Aug 16, 2015 at 3:19 PM, mohanaugust
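
A Scala sketch of that refactor (the helper bodies and Kafka connection details are stand-ins for the original post's code): keep each use case a transformation and finish with one action, so each batch runs as a single job instead of several sequential ones:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Stand-ins for the poster's helpers.
    def isValid(m: (String, String)): Boolean = m._2.nonEmpty
    def useCase1(m: (String, String)): String = "uc1:" + m._2
    def useCase2(m: (String, String)): String = "uc2:" + m._2

    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-single-action"), Seconds(5))
    // ZooKeeper quorum, group id, and topic map are placeholders.
    val messages = KafkaUtils.createStream(
      ssc, "zkhost:2181", "mygroup", Map("mytopic" -> 1))

    val filtered = messages.filter(isValid _)
    // Transformations only above; the one action below = one job per batch.
    filtered.map(useCase1 _).union(filtered.map(useCase2 _))
      .foreachRDD(rdd => rdd.foreach(println))

    ssc.start()
    ssc.awaitTermination()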