Spark hangs on collect (stuck on scheduler delay)
Hi, I'm building a Spark application in which I load some data from an Elasticsearch cluster (using the latest elasticsearch-hadoop connector) and then perform some calculations on the Spark cluster. In one case, I call collect on the RDD as soon as it is created (loaded from ES). However, it sometimes hangs on one (and sometimes more) node and doesn't continue. In the web UI, I can see that one node is stuck on scheduler delay and prevents the job from continuing (while the others have finished). Do you have any idea what is going on here? The data being loaded is fairly small, and only gets mapped once to domain objects before being collected. Thank you
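For reference, a minimal sketch of the pattern described, assuming the elasticsearch-spark native integration (org.elasticsearch.spark) is on the classpath; the index/type name, the "value" field, the es.nodes address, and the DomainObject case class are purely illustrative:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._   // adds esRDD to SparkContext

    case class DomainObject(id: String, value: String)   // hypothetical domain type

    val sc = new SparkContext(
      new SparkConf().setAppName("es-collect").set("es.nodes", "localhost:9200"))

    // esRDD yields (document id, source map) pairs read from Elasticsearch
    val raw = sc.esRDD("myindex/mytype")   // hypothetical index/type
    val domain = raw.map { case (id, source) =>
      DomainObject(id, source.getOrElse("value", "").toString)
    }
    val all = domain.collect()   // the step that reportedly hangs on one node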
Re: TestSQLContext compilation error when running SparkPi in IntelliJ?
Thanks Andrew.
On Sun, Aug 16, 2015 at 1:53 PM, Andrew Or and...@databricks.com wrote: Hi Canan, TestSQLContext is no longer a singleton but now a class. It was never meant to be a fully public API, but if you wish to use it you can just instantiate a new one: val sqlContext = new TestSQLContext, or just create a new SQLContext from a SparkContext. -Andrew
2015-08-15 20:33 GMT-07:00 canan chen ccn...@gmail.com: I am not sure about other people's Spark debugging environment (I mean for the master branch). Can anyone share their experience?
On Sun, Aug 16, 2015 at 10:40 AM, canan chen ccn...@gmail.com wrote: I imported the Spark source code into IntelliJ and want to run SparkPi in IntelliJ, but I hit the following weird compilation error. I googled it, and sbt clean doesn't work for me. I am not sure whether anyone else has met this issue as well; any help is appreciated.
Error:scalac: while compiling: /Users/root/github/spark/sql/core/src/main/scala/org/apache/spark/sql/test/TestSQLContext.scala during phase: jvm library version: version 2.10.4 compiler version: version 2.10.4 reconstructed args: -nobootcp -javabootclasspath : -deprecation -feature -classpath
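For illustration, a minimal sketch of the two options Andrew describes; the setMaster/appName values are placeholders, and TestSQLContext is an internal class whose constructor may change between versions:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("sql-debug"))

    // Option 1: instantiate the (internal) test context directly
    // val sqlContext = new org.apache.spark.sql.test.TestSQLContext

    // Option 2: create a regular SQLContext from the SparkContext
    val sqlContext = new SQLContext(sc)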
Apache Spark - Parallel Processing of messages from Kafka - Java
JavaPairReceiverInputDStream<String, byte[]> messages = KafkaUtils.createStream(...);
JavaPairDStream<String, byte[]> filteredMessages = filterValidMessages(messages);
JavaDStream<String> useCase1 = calculateUseCase1(filteredMessages);
JavaDStream<String> useCase2 = calculateUseCase2(filteredMessages);
JavaDStream<String> useCase3 = calculateUseCase3(filteredMessages);
JavaDStream<String> useCase4 = calculateUseCase4(filteredMessages);
...
I retrieve messages from Kafka, filter them, and use the same messages for multiple use-cases. Here useCase1 to 4 are independent of each other and can be calculated in parallel. However, when I look at the logs, I see that the calculations are happening sequentially. How can I make them run in parallel? Any suggestion would be helpful.
Re: Error: Exception in thread main java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Try passing the extra jars (e.g. the HBase client jars) with --jars when you submit your application jar.
On Fri, Aug 14, 2015 at 6:19 AM, Stephen Boesch java...@gmail.com wrote: The NoClassDefFoundError differs from ClassNotFoundException: it indicates an error while initializing that class, even though the class was found on the classpath. Please provide the full stack trace.
2015-08-14 4:59 GMT-07:00 stelsavva stel...@avocarrot.com: Hello, I am just starting out with Spark Streaming and HBase/Hadoop. I'm writing a simple app to read from Kafka and store to HBase, and I am having trouble submitting my job to Spark. I've downloaded Apache Spark 1.4.1 pre-built for Hadoop 2.6. I am building the project with mvn package and submitting the jar file with
    ~/Desktop/spark/bin/spark-submit --class org.example.main.scalaConsumer scalConsumer-0.0.1-SNAPSHOT.jar
And then I am getting the error you see in the subject line. Is this a problem with my Maven dependencies? Do I need to install Hadoop locally? And if so, how can I add the Hadoop classpath to the Spark job?
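For example, a spark-submit invocation along these lines keeps --class for the main class and adds the HBase dependencies with --jars (the jar paths below are placeholders and depend on your HBase version and layout):

    ~/Desktop/spark/bin/spark-submit \
      --class org.example.main.scalaConsumer \
      --jars /path/to/hbase-common.jar,/path/to/hbase-client.jar,/path/to/hbase-protocol.jar \
      scalConsumer-0.0.1-SNAPSHOT.jar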
Spark can't fetch the jar added to the HTTP server
Hi, I have been trying to run a standalone application using spark-submit. Spark started the HTTP server and added the jar file to it, but it is unable to fetch the jar file. I am running the Spark cluster on localhost. If anyone can help me find what I am missing here, thanks in advance.
LOGS:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/08/16 15:20:52 INFO SparkContext: Running Spark version 1.4.1
15/08/16 15:20:53 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/08/16 15:20:53 INFO SecurityManager: Changing view acls to: manvendratomar
15/08/16 15:20:53 INFO SecurityManager: Changing modify acls to: manvendratomar
15/08/16 15:20:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(manvendratomar); users with modify permissions: Set(manvendratomar)
15/08/16 15:20:53 INFO Slf4jLogger: Slf4jLogger started
15/08/16 15:20:53 INFO Remoting: Starting remoting
15/08/16 15:20:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.133.201.51:63985]
15/08/16 15:20:54 INFO Utils: Successfully started service 'sparkDriver' on port 63985.
15/08/16 15:20:54 INFO SparkEnv: Registering MapOutputTracker
15/08/16 15:20:54 INFO SparkEnv: Registering BlockManagerMaster
15/08/16 15:20:54 INFO DiskBlockManager: Created local directory at /private/var/folders/6z/2j2qj3rn3z32tymm_9ybz960gn/T/spark-bd1c72c8-5d27-4751-ae6f-f298451e4f66/blockmgr-ed753ee2-726e-45ae-97a2-0c82923262ef
15/08/16 15:20:54 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/08/16 15:20:54 INFO HttpFileServer: HTTP File server directory is /private/var/folders/6z/2j2qj3rn3z32tymm_9ybz960gn/T/spark-bd1c72c8-5d27-4751-ae6f-f298451e4f66/httpd-68c0afb2-92ef-4ddd-9ffc-639ccac81d6a
15/08/16 15:20:54 INFO HttpServer: Starting HTTP Server
15/08/16 15:20:54 INFO Utils: Successfully started service 'HTTP file server' on port 63986.
15/08/16 15:20:54 INFO SparkEnv: Registering OutputCommitCoordinator
15/08/16 15:20:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/08/16 15:20:54 INFO SparkUI: Started SparkUI at http://10.133.201.51:4040
15/08/16 15:20:54 INFO SparkContext: Added JAR target/scala-2.11/spark_matrix_2.11-1.0.jar at http://10.133.201.51:63986/jars/spark_matrix_2.11-1.0.jar with timestamp 1439718654603
15/08/16 15:20:54 INFO Executor: Starting executor ID driver on host localhost
15/08/16 15:20:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 63987.
15/08/16 15:20:54 INFO NettyBlockTransferService: Server created on 63987
15/08/16 15:20:54 INFO BlockManagerMaster: Trying to register BlockManager
15/08/16 15:20:54 INFO BlockManagerMasterEndpoint: Registering block manager localhost:63987 with 265.4 MB RAM, BlockManagerId(driver, localhost, 63987)
15/08/16 15:20:54 INFO BlockManagerMaster: Registered BlockManager
15/08/16 15:20:55 INFO MemoryStore: ensureFreeSpace(157248) called with curMem=0, maxMem=278302556
15/08/16 15:20:55 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.6 KB, free 265.3 MB)
15/08/16 15:20:55 INFO MemoryStore: ensureFreeSpace(14257) called with curMem=157248, maxMem=278302556
15/08/16 15:20:55 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 13.9 KB, free 265.2 MB)
15/08/16 15:20:55 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:63987 (size: 13.9 KB, free: 265.4 MB)
15/08/16 15:20:55 INFO SparkContext: Created broadcast 0 from textFile at partition.scala:20
15/08/16 15:20:56 INFO FileInputFormat: Total input paths to process : 1
15/08/16 15:20:56 INFO SparkContext: Starting job: reduce at IndexedRowMatrix.scala:65
15/08/16 15:20:56 INFO DAGScheduler: Got job 0 (reduce at IndexedRowMatrix.scala:65) with 1 output partitions (allowLocal=false)
15/08/16 15:20:56 INFO DAGScheduler: Final stage: ResultStage 0(reduce at IndexedRowMatrix.scala:65)
15/08/16 15:20:56 INFO DAGScheduler: Parents of final stage: List()
15/08/16 15:20:56 INFO DAGScheduler: Missing parents: List()
15/08/16 15:20:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at map at IndexedRowMatrix.scala:65), which has no missing parents
15/08/16 15:20:56 INFO MemoryStore: ensureFreeSpace(4064) called with curMem=171505, maxMem=278302556
15/08/16 15:20:56 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.0 KB, free 265.2 MB)
15/08/16 15:20:56 INFO MemoryStore: ensureFreeSpace(2249) called with curMem=175569, maxMem=278302556
15/08/16 15:20:56 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 265.2 MB)
15/08/16 15:20:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:63987 (size: 2.2 KB, free: 265.4 MB)
15/08/16 15:20:56 INFO SparkContext:
Spark on Scala 2.11 build fails due to incorrect jline dependency in REPL
I am building Spark with the following options - most notably **scala-2.11**:

    . dev/switch-to-scala-2.11.sh
    mvn -Phive -Pyarn -Phadoop-2.6 -Dhadoop2.6.2 -Pscala-2.11 -DskipTests -Dmaven.javadoc.skip=true clean package

The build goes pretty far but fails in one of the minor modules, *repl*:

    [INFO] [ERROR] Failed to execute goal on project spark-repl_2.11: Could not resolve dependencies for project org.apache.spark:spark-repl_2.11:jar:1.5.0-SNAPSHOT: Could not find artifact org.scala-lang:jline:jar:2.11.7 in central (https://repo1.maven.org/maven2) -> [Help 1]

Upon investigation: from 2.11.5 onward, the Scala-specific version of jline is no longer required; the default jline distribution is used. And in fact the repl pom only declares a jline dependency for the scala-2.10 profile:

    <profile>
      <id>scala-2.10</id>
      <activation>
        <property><name>!scala-2.11</name></property>
      </activation>
      <properties>
        <scala.version>2.10.4</scala.version>
        <scala.binary.version>2.10</scala.binary.version>
        <jline.version>${scala.version}</jline.version>
        <jline.groupid>org.scala-lang</jline.groupid>
      </properties>
      <dependencyManagement>
        <dependencies>
          <dependency>
            <groupId>${jline.groupid}</groupId>
            <artifactId>jline</artifactId>
            <version>${jline.version}</version>
          </dependency>
        </dependencies>
      </dependencyManagement>
    </profile>

So it is not clear why this error is occurring. Pointers appreciated.
SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq
Hi, I am trying to run SparkPi in IntelliJ and getting a NoClassDefFoundError. Has anyone else seen this issue before?
Exception in thread "main" java.lang.NoClassDefFoundError: scala/collection/Seq
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassNotFoundException: scala.collection.Seq
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 6 more
Process finished with exit code 1
Thanks, Xiaohe
Re: Difference between Sort based and Hash based shuffle
I did check it out, and while I got a general understanding of the various classes used to implement the sort-based and hash-based shuffles, the slides lack detail on how they are implemented and why sort generally performs better than hash.
On Sun, Aug 16, 2015 at 4:31 AM, Ravi Kiran ravikiranmag...@gmail.com wrote: Have a look at this presentation: http://www.slideshare.net/colorant/spark-shuffle-introduction . It may be of help to you.
On Sat, Aug 15, 2015 at 1:42 PM, Muhammad Haseeb Javed 11besemja...@seecs.edu.pk wrote: What are the major differences between how the sort-based and hash-based shuffles operate, and what is it that causes the sort shuffle to perform better than hash? Are there any talks that discuss both shuffles in detail, how they are implemented, and the performance gains?
Spark executor lost because of timeout even after setting quite a long timeout value (1000 seconds)
Hi, I have written a Spark job which seems to work fine for almost an hour, and after that executors start getting lost because of timeouts. I see the following log statement:
15/08/16 12:26:46 WARN spark.HeartbeatReceiver: Removing executor 10 with no recent heartbeats: 1051638 ms exceeds timeout 100 ms
I don't see any errors, but I see the above warning, and because of it the executor gets removed by YARN; I then see "Rpc client disassociated" errors, IOException connection refused, and FetchFailedException. After an executor gets removed, I see it being added again and starting to work, and then some other executors fail again. My questions: is it normal for executors to get lost? What happens to the tasks the lost executors were working on? My Spark job keeps running since it is long (around 4-5 hours). I have a very good cluster with 1.2 TB of memory and a good number of CPU cores. To solve the timeout issue I tried increasing spark.akka.timeout to 1000 seconds, but no luck. I am using the following command to run my Spark job. Please guide me, I am new to Spark. I am using Spark 1.4.1. Thanks in advance.
/spark-submit --class com.xyz.abc.MySparkJob --conf spark.executor.extraJavaOptions=-XX:MaxPermSize=512M --driver-java-options -XX:MaxPermSize=512m --driver-memory 4g --master yarn-client --executor-memory 25G --executor-cores 8 --num-executors 5 --jars /path/to/spark-job.jar
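If it really is the heartbeat expiry that needs raising, the usual knobs in Spark 1.4 are spark.network.timeout and spark.executor.heartbeatInterval rather than spark.akka.timeout. A sketch of the same command with those confs added (the values are only illustrative, and lost heartbeats are often caused by long GC pauses or memory pressure rather than the timeout setting itself):

    /spark-submit --class com.xyz.abc.MySparkJob \
      --conf spark.executor.extraJavaOptions=-XX:MaxPermSize=512M \
      --driver-java-options -XX:MaxPermSize=512m \
      --driver-memory 4g --master yarn-client \
      --executor-memory 25G --executor-cores 8 --num-executors 5 \
      --conf spark.network.timeout=1000s \
      --conf spark.executor.heartbeatInterval=60s \
      --jars /path/to/spark-job.jar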
Re: Executors on multiple nodes
Hi Mohit, It depends on whether dynamic allocation is turned on. If not, the number of executors is specified by the user with the --num-executors option. If dynamic allocation is turned on, refer to the doc for details: https://spark.apache.org/docs/1.4.0/job-scheduling.html#dynamic-resource-allocation . -Sandy
On Sat, Aug 15, 2015 at 6:40 AM, Mohit Anchlia mohitanch...@gmail.com wrote: I am running on YARN and have a question about how Spark runs executors on different data nodes. Is that primarily decided based on the number of receivers? What do I need to do to ensure that multiple nodes are being used for data processing?
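For illustration, both paths look roughly like this on the spark-submit side (the numbers and myApp.jar are placeholders; dynamic allocation on YARN additionally requires the external shuffle service to be enabled on the NodeManagers):

    # fixed sizing: request executors explicitly
    spark-submit --master yarn-client --num-executors 4 --executor-cores 2 --executor-memory 4g myApp.jar

    # dynamic allocation: let Spark grow and shrink the executor count
    spark-submit --master yarn-client \
      --conf spark.dynamicAllocation.enabled=true \
      --conf spark.shuffle.service.enabled=true \
      myApp.jar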
Re: SparkPi is getting java.lang.NoClassDefFoundError: scala/collection/Seq
Check module example's dependencies (right-click examples and click Open Module Settings): by default scala-library is "provided", and you need to change it to "compile" to run SparkPi in IntelliJ. As I remember, you also need to change the guava and jetty related libraries to compile too.
On Mon, Aug 17, 2015 at 2:14 AM, xiaohe lan zombiexco...@gmail.com wrote: Hi, I am trying to run SparkPi in IntelliJ and getting a NoClassDefFoundError: scala/collection/Seq. Has anyone else seen this issue before? Thanks, Xiaohe
-- Best Regards Jeff Zhang
Re: Spark Master HA on YARN
To make it clear, Spark standalone mode is similar to YARN as a simple cluster management system:
Spark Master --- YARN ResourceManager
Spark Worker --- YARN NodeManager
On Mon, Aug 17, 2015 at 4:59 AM, Ruslan Dautkhanov dautkha...@gmail.com wrote: There is no Spark master in YARN mode; that's standalone-mode terminology. In YARN cluster mode, Spark's Application Master (the Spark driver runs in it) will be restarted automatically by the RM up to yarn.resourcemanager.am.max-retries times (default is 2). -- Ruslan Dautkhanov
On Fri, Jul 17, 2015 at 1:29 AM, Bhaskar Dutta bhas...@gmail.com wrote: Hi, Is Spark master high availability supported on YARN (yarn-client mode), analogous to https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability ? Thanks Bhaskie
-- Best Regards Jeff Zhang
Re: Spark can't fetch application jar after adding it to HTTP server
Can you tell us more about your environment? I understand you are running it on a single machine, but is a firewall enabled?
On Sun, Aug 16, 2015 at 5:47 AM, t4ng0 manvendra.tom...@gmail.com wrote: Hi, I am new to Spark and trying to run a standalone application using spark-submit. As far as I can understand from the logs, Spark can't fetch the jar file after adding it to the HTTP server. Do I need to configure proxy settings for Spark individually, if that is the problem? Otherwise please help me, thanks in advance. PS: I am attaching the logs here.
Re: Spark Master HA on YARN
There is no Spark master in YARN mode. It's standalone mode terminology. In YARN cluster mode, Spark's Application Master (Spark Driver runs in it) will be restarted automatically by RM up to yarn.resourcemanager.am.max-retries times (default is 2). -- Ruslan Dautkhanov On Fri, Jul 17, 2015 at 1:29 AM, Bhaskar Dutta bhas...@gmail.com wrote: Hi, Is Spark master high availability supported on YARN (yarn-client mode) analogous to https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability ? Thanks Bhaskie
Example code to spawn multiple threads in driver program
Hi, I have a Spark driver program with a loop that iterates around 2000 times, and on each of those 2000 iterations it executes a job on YARN. Since the loop runs the jobs serially, I want to introduce parallelism. If I create 2000 tasks/Runnables/Callables in my Spark driver program, will they get executed in parallel on the YARN cluster? Please guide me; it would be great if you could share example code for running multiple threads in a driver program. I am new to Spark. Thanks in advance.
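One way to sketch this, assuming a shared SparkContext sc (which is safe to use from multiple threads for submitting jobs); the HDFS path and the count() action are purely illustrative stand-ins for whatever each iteration actually does:

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    // bound how many jobs are submitted concurrently (16 is arbitrary)
    implicit val ec = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(16))

    // each Future submits one independent Spark job from its own thread
    val futures: Seq[Future[Long]] = (1 to 2000).map { i =>
      Future {
        sc.textFile(s"hdfs:///input/batch-$i").count()   // hypothetical job body
      }
    }

    val results: Seq[Long] = Await.result(Future.sequence(futures), Duration.Inf)

If the concurrent jobs should share cluster resources more evenly instead of running FIFO, the fair scheduler (spark.scheduler.mode=FAIR) can also be enabled.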
Understanding the two jobs run with a Spark SQL join
Hi, I have a basic Spark SQL join running in local mode. I checked the UI and see that two jobs are run. Their DAG graphs are pasted at the end. I have several questions:
1. Job 0 and Job 1 appear to have the same DAG stages, but stage 3 and stage 4 are skipped. What do job 0 and job 1 each do, why do they have the same DAG graph, and why are stage 3 and stage 4 skipped?
2. Job 0 has only 5 tasks. What controls the number of tasks in job 0?
3. Job 0 has 5 tasks and job 1 has 199 tasks. I thought the number of tasks in job 1 was controlled by spark.sql.shuffle.partitions, which is 200 by default. Why does it show 199 here?
Job 0: Job 1:
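For reference, a minimal Spark SQL join in local mode and the setting that controls the post-shuffle task count; this assumes an existing SparkContext sc, and the JSON paths and the id column are placeholders:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    // number of partitions (and thus tasks) used after the join's shuffle; 200 by default
    sqlContext.setConf("spark.sql.shuffle.partitions", "200")

    val left  = sqlContext.read.json("left.json")    // placeholder inputs
    val right = sqlContext.read.json("right.json")
    val joined = left.join(right, left("id") === right("id"))
    joined.show()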
Re: Apache Spark - Parallel Processing of messages from Kafka - Java
In Spark, every action (foreach, collect, etc.) gets converted into a Spark job, and jobs are executed sequentially. You may want to refactor your code so that calculateUseCase1..4 only run transformations (map, flatMap) and you call a single action at the end.
On Sun, Aug 16, 2015 at 3:19 PM, mohanaugust mohanaug...@gmail.com wrote: I retrieve messages from Kafka, filter them, and use the same messages for multiple use-cases. Here useCase1 to 4 are independent of each other and can be calculated in parallel. However, when I look at the logs, I see that the calculations are happening sequentially. How can I make them run in parallel? Any suggestion would be helpful.
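A minimal Scala sketch of one way to realize that suggestion (the original code is Java, but the idea carries over), assuming all of the use-case streams produce the same element type so they can be merged and finished with a single output action:

    import org.apache.spark.streaming.dstream.DStream

    // each use case stays a pure transformation of the filtered stream;
    // merging them lets one output action drive a single job per batch
    def runUseCases(filtered: DStream[String],
                    useCases: Seq[DStream[String] => DStream[String]]): Unit = {
      val outputs = useCases.map(f => f(filtered))
      val combined = outputs.reduce(_ union _)
      combined.print()   // stand-in for whatever single action/sink is actually needed
    }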