Re: [External] Re: no stdout output from worker
Hi Ranjan, whatever code is passed as a closure to Spark operations like map, flatMap, filter etc. is part of a task; everything else runs in the driver. Thanks, Sourav

On Mon, Mar 10, 2014 at 12:03 PM, Sen, Ranjan [USA] sen_ran...@bah.com wrote:

Hi Patrick, how do I know which part of the code is in the driver and which is in a task? The structure of my code is as below:

    static boolean done = false;
    public static void main(...) {
      ...
      JavaRDD<String> lines = ...
      ...
      while (!done) {
        ...
        while (...) {
          JavaPairRDD<Integer, List<Integer>> labs1 = labs.map(new PairFunction...);
          // Here I have System.out.println (A)
        } // inner while
        // Here I have System.out.println (B)
        if (...) {
          done = true;
          // Also here some System.out.println (C)
          break;
        } else {
          if (...) {
            // More System.out.println (D)
            labs = labs.map(...);
          }
        }
      } // outer while
      // Even more System.out.println (E)
    } // main
  } // class

I get the console outputs on the master for (B) and (E). I do not see any stdout on the worker node. I find the stdout and stderr files in spark/work/appid/0/; I see output in stderr but not in stdout. I do get all the outputs on the console when I run it in local mode. Sorry, I am new and may be asking some naïve question, but it is really confusing to me. Thanks for your help. Ranjan

On 3/9/14, 10:50 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey Sen, is your code in the driver code or inside one of the tasks? If it's in the tasks, the place you would expect these to be is the stdout file under spark/appid/work/[stdout/stderr]. Are you seeing at least stderr logs in that folder? If not, then the tasks might not be running on the worker machines. If you see stderr but not stdout, that's a bit of a puzzler since they both go through the same mechanism. - Patrick

On Sun, Mar 9, 2014 at 2:32 PM, Sen, Ranjan [USA] sen_ran...@bah.com wrote:

Hi, I have some System.out.println calls in my Java code that work fine in a local environment, but when I run the same code in standalone mode on an EC2 cluster I do not see them in the worker stdout (on the worker node under spark location/work) or on the driver console. Could you help me understand how to troubleshoot this? Thanks, Ranjan

--
Sourav Chandra
Senior Software Engineer
sourav.chan...@livestream.com
o: +91 80 4121 8723
m: +91 988 699 3746
skype: sourav.chandra
Livestream
Ajmera Summit, First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd Block, Koramangala Industrial Area, Bangalore 560034
www.livestream.com
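To make the split concrete, here is a minimal sketch (Java, assuming the Spark 0.9 Java API; the master URL, class and variable names are placeholders): the println inside call() runs in a task, so it ends up in the executor's stdout file under the worker's work directory (the spark/work/<app-id>/0/stdout file mentioned above), while the println in main runs in the driver and appears on the console where the driver was launched.

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    public class DriverVsTask {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("spark://master:7077", "DriverVsTask");
        JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4));

        JavaRDD<Integer> doubled = nums.map(new Function<Integer, Integer>() {
          public Integer call(Integer x) {
            // runs inside a task: written to the executor's stdout file on the worker
            System.out.println("task sees " + x);
            return x * 2;
          }
        });

        // runs in the driver: printed on the console that launched the driver
        System.out.println("driver sees count = " + doubled.count());
      }
    }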
Re: Streaming JSON string from REST Api in Spring
Thanks Mayur for your clarification. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Streaming-JSON-string-from-REST-Api-in-Spring-tp2358p2451.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
subscribe
hi
Re: subscribe
send this to 'user-request', not 'user' 2014-03-10 17:32 GMT+08:00 hequn cheng chenghe...@gmail.com: hi
Using flume to create stream for spark streaming.
Hey, I am using the following Flume flow. Flume agent 1 consists of a RabbitMQ source, a file channel, and an Avro sink sending data to a slave node of the Spark cluster. Flume agent 2, on that slave node of the Spark cluster, consists of an Avro source and a file channel; for the sink I tried avro, hdfs, and file_roll, but I am not able to read the DStream from any of these. For the avro sink type, I give the sink address as the same slave node and some other port, and I ask the Spark Streaming program to listen to that slave node and the port of the sink defined in the slave node's conf. Spark Streaming is giving me no result. I am running the program as java -jar jar on the master of the cluster. What sink type should be used on the slave node? I have been stuck on this for two weeks now and I am confused about how to approach it. Any help? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Using-flume-to-create-stream-for-spark-streaming-tp2457.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
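For what it may be worth, in the push-style pattern the second agent keeps an Avro sink, and the Spark Streaming receiver created with FlumeUtils is what listens on the host/port that the sink points to, so the receiver must actually be running on that host. A minimal sketch, assuming Spark 0.9's spark-streaming-flume module is on the classpath; the hostname, port and batch interval are placeholders, not a verified fix for this particular setup:

    import org.apache.spark.streaming.Duration;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.flume.FlumeUtils;
    import org.apache.spark.streaming.flume.SparkFlumeEvent;

    public class FlumeIngest {
      public static void main(String[] args) {
        JavaStreamingContext jssc =
            new JavaStreamingContext("spark://master:7077", "FlumeIngest", new Duration(2000));

        // The receiver binds to this host:port; the Avro sink of Flume agent 2
        // must be configured with the same hostname and port.
        JavaDStream<SparkFlumeEvent> events =
            FlumeUtils.createStream(jssc, "slave-node-hostname", 41414);

        // Print how many events arrive in each batch, just to confirm data is flowing.
        events.count().print();

        jssc.start();
        jssc.awaitTermination();
      }
    }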
Re: [External] Re: no stdout output from worker
Hi Sourav, that makes so much sense. Thanks much. Ranjan

From: Sourav Chandra sourav.chan...@livestream.com
Reply-To: user@spark.apache.org
Date: Sunday, March 9, 2014 at 10:37 PM
To: user@spark.apache.org
Subject: Re: [External] Re: no stdout output from worker

Hi Ranjan, whatever code is passed as a closure to Spark operations like map, flatMap, filter etc. is part of a task; everything else runs in the driver. Thanks, Sourav [...]
Log Analyze
Hi Guys, Could anyone help me to understand this piece of log in red? Why is this happened? Thanks 14/03/10 16:55:20 INFO SparkContext: Starting job: first at NetworkWordCount.scala:87 14/03/10 16:55:20 INFO JobScheduler: Finished job streaming job 1394466892000 ms.0 from job set of time 1394466892000 ms 14/03/10 16:55:20 INFO JobScheduler: Total delay: 28.537 s for time 1394466892000 ms (execution: 4.479 s) 14/03/10 16:55:20 INFO JobScheduler: Starting job streaming job 1394466893000 ms.0 from job set of time 1394466893000 ms 14/03/10 16:55:20 INFO JobGenerator: Checkpointing graph for time 1394466892000 ms 14/03/10 16:55:20 INFO DStreamGraph: Updating checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO DStreamGraph: Updated checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO CheckpointWriter: Saving checkpoint for time 1394466892000 ms to file 'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000' 14/03/10 16:55:20 INFO DAGScheduler: Registering RDD 496 (combineByKey at ShuffledDStream.scala:42) 14/03/10 16:55:20 INFO DAGScheduler: Got job 39 (first at NetworkWordCount.scala:87) with 1 output partitions (allowLocal=true) 14/03/10 16:55:20 INFO DAGScheduler: Final stage: Stage 77 (first at NetworkWordCount.scala:87) 14/03/10 16:55:20 INFO DAGScheduler: Parents of final stage: List(Stage 78) 14/03/10 16:55:20 INFO DAGScheduler: Missing parents: List(Stage 78) 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-1-1394466782400 on computer10.ant-net:34062 in memory (size: 5.9 MB, free: 502.2 MB) 14/03/10 16:55:20 INFO DAGScheduler: Submitting Stage 78 (MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42), which has no missing parents 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Added input-1-1394466816600 in memory on computer10.ant-net:34062 (size: 4.4 MB, free: 497.8 MB) 14/03/10 16:55:20 INFO DAGScheduler: Submitting 15 missing tasks from Stage 78 (MapPartitionsRDD[496] at combineByKey at ShuffledDStream.scala:42) 14/03/10 16:55:20 INFO TaskSchedulerImpl: Adding task set 78.0 with 15 tasks 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:9 as TID 539 on executor 2: computer1.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:9 as 4144 bytes in 1 ms 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:10 as TID 540 on executor 1: computer10.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:10 as 4144 bytes in 0 ms 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:11 as TID 541 on executor 0: computer11.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:11 as 4144 bytes in 0 ms 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394466874200 on computer1.ant-net:51406 in memory (size: 2.9 MB, free: 460.0 MB) 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-0-1394466874400 on computer1.ant-net:51406 in memory (size: 4.1 MB, free: 468.2 MB) 14/03/10 16:55:20 INFO TaskSetManager: Starting task 78.0:12 as TID 542 on executor 1: computer10.ant-net (PROCESS_LOCAL) 14/03/10 16:55:20 INFO TaskSetManager: Serialized task 78.0:12 as 4144 bytes in 1 ms 14/03/10 16:55:20 WARN TaskSetManager: Lost TID 540 (task 78.0:10) 14/03/10 16:55:20 INFO CheckpointWriter: Deleting hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000 14/03/10 16:55:20 INFO CheckpointWriter: Checkpoint for time 1394466892000 ms saved to file 
'hdfs://computer8:54310/user/root/INPUT/checkpoint-1394466892000', took 3633 bytes and 93 ms 14/03/10 16:55:20 INFO DStreamGraph: Clearing checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO DStreamGraph: Cleared checkpoint data for time 1394466892000 ms 14/03/10 16:55:20 INFO BlockManagerMasterActor$BlockManagerInfo: Removed input-2-1394466789000 on computer11.ant-net:58332 in memory (size: 3.9 MB, free: 536.0 MB) 14/03/10 16:55:20 WARN TaskSetManager: Loss was due to java.lang.Exception java.lang.Exception: Could not compute split, block input-2-1394466794200 not found at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:45) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.UnionPartition.iterator(UnionRDD.scala:32) at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:72) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) at
Unsubscribe
Room for rent in Aptos
Hello, my name is Arjun, I am 30 years old, and I was inquiring about the room ad that you have put up on Craigslist in Aptos. I am very much interested in the room and can move in pretty early. My annual income is around 105K and I am a software engineer who has been working in Silicon Valley for about three years now. I have a masters in Computer Science from UCI. I have generally been a resident of the Santa Cruz area for some time now, even though I work in Palo Alto. I guess that tells something about my love for Santa Cruz. Please reply at your earliest convenience. I can also be reached at 6505759206. Regards, Arjun
Re: Sbt Permgen
hey sandy, i think that pull request is not relevant to the 0.9 branch i am using. switching to java 7 for sbt/sbt test made it work. not sure why... On Sun, Mar 9, 2014 at 11:44 PM, Sandy Ryza sandy.r...@cloudera.com wrote: There was an issue related to this fixed recently: https://github.com/apache/spark/pull/103 On Sun, Mar 9, 2014 at 8:40 PM, Koert Kuipers ko...@tresata.com wrote: edit last line of sbt/sbt, after which i run: sbt/sbt test On Sun, Mar 9, 2014 at 10:24 PM, Sean Owen so...@cloudera.com wrote: How are you specifying these args? On Mar 9, 2014 8:55 PM, Koert Kuipers ko...@tresata.com wrote: i just checked out the latest 0.9. no matter what java options i use in sbt/sbt (i tried -Xmx6G -XX:MaxPermSize=2000m -XX:ReservedCodeCacheSize=300m) i keep getting java.lang.OutOfMemoryError: PermGen space errors when running the tests. curiously i managed to run the tests with the default dependencies, but with cdh4.5.0 mr1 dependencies i always hit the dreaded PermGen space issue. any suggestions?
RE: Pig on Spark
Hi Mayur, we are planning to upgrade our distribution from MR1 to MR2 (YARN), and the goal is to get Spork set up next month. I will keep you posted. Can you please keep me informed about your progress as well?

From: mayur.rust...@gmail.com Date: Mon, 10 Mar 2014 11:47:56 -0700 Subject: Re: Pig on Spark To: user@spark.apache.org

Hi Sameer, did you make any progress on this? My team is also trying it out and would love to know some details of your progress.
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi

On Thu, Mar 6, 2014 at 2:20 PM, Sameer Tilak ssti...@live.com wrote:

Hi Aniket, many thanks! I will check this out.

Date: Thu, 6 Mar 2014 13:46:50 -0800 Subject: Re: Pig on Spark From: aniket...@gmail.com To: user@spark.apache.org; tgraves...@yahoo.com

There is some work to make this work on YARN at https://github.com/aniket486/pig (so, compile Pig with ant -Dhadoopversion=23). You can look at https://github.com/aniket486/pig/blob/spork/pig-spark to find out what sort of env variables you need (sorry, I haven't been able to clean this up, it's in progress). There are a few known issues with this; I will work on fixing them soon.

Known issues:
1. Limit does not work (spork-fix)
2. Foreach requires turning off schema-tuple-backend (should be a pig-jira)
3. Algebraic UDFs don't work (spork-fix in progress)
4. Group by rework (to avoid OOMs)
5. UDF classloader issue (requires SPARK-1053, then you can put pig-withouthadoop.jar as SPARK_JARS in SparkContext along with the UDF jars)

~Aniket

On Thu, Mar 6, 2014 at 1:36 PM, Tom Graves tgraves...@yahoo.com wrote:

I had asked a similar question on the dev mailing list a while back (Jan 22nd). See the archives: http://mail-archives.apache.org/mod_mbox/spark-dev/201401.mbox/browser - look for "spork". Basically Matei said:

"Yup, that was it, though I believe people at Twitter picked it up again recently. I'd suggest asking Dmitriy if you know him. I've seen interest in this from several other groups, and if there's enough of it, maybe we can start another open source repo to track it. The work in that repo you pointed to was done over one week, and already had most of Pig's operators working. (I helped out with this prototype over Twitter's hack week.) That work also calls the Scala API directly, because it was done before we had a Java API; it should be easier with the Java one."

Tom

On Thursday, March 6, 2014 3:11 PM, Sameer Tilak ssti...@live.com wrote:

Hi everyone, we are using Pig to build our data pipeline. I came across Spork (Pig on Spark) at https://github.com/dvryaboy/pig and am not sure if it is still active. Can someone please let me know the status of Spork or any other effort that will let us run Pig on Spark? We can significantly benefit from using Spark, but we would like to keep using our existing Pig scripts.

--
...:::Aniket:::... Quetzalco@tl
Re: [BLOG] Spark on Cassandra w/ Calliope
We are happy that you found Calliope useful and glad we could help.

Rohit Rai
Founder & CEO, Tuplejump, Inc.
www.tuplejump.com
The Data Engineering Platform

On Sat, Mar 8, 2014 at 2:18 AM, Brian O'Neill b...@alumni.brown.edu wrote:

FWIW - I posted some notes to help people get started quickly with Spark on C*. http://brianoneill.blogspot.com/2014/03/spark-on-cassandra-w-calliope.html (tnx again to Rohit and team for all of their help) -brian

--
Brian ONeill
CTO, Health Market Science (http://healthmarketscience.com)
mobile: 215.588.6024
blog: http://brianoneill.blogspot.com/
twitter: @boneill42
Java example of using broadcast
Hi Patrick, yes, I get it. I have a different question now (changed the subject): can anyone point me to a Java example of using broadcast variables? - Ranjan

From: Patrick Wendell pwend...@gmail.com
Reply-To: user@spark.apache.org
Date: Monday, March 10, 2014 at 1:24 PM
To: user@spark.apache.org
Subject: Re: [External] Re: no stdout output from worker

Hey Sen, Sourav is right, and I think all of your print statements are inside of the driver program rather than inside of a closure. [...]
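Not an official pointer, but a minimal sketch of what a Java broadcast usually looks like (assuming the Spark 0.9 Java API; the data and names are illustrative): broadcast the value once from the driver, then read it through .value() inside the closure.

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.broadcast.Broadcast;

    public class BroadcastExample {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("spark://master:7077", "BroadcastExample");

        // Created once in the driver, shipped to each executor only once
        final Broadcast<int[]> lookup = sc.broadcast(new int[]{10, 20, 30});

        JavaRDD<Integer> indices = sc.parallelize(Arrays.asList(0, 1, 2));
        JavaRDD<Integer> values = indices.map(new Function<Integer, Integer>() {
          public Integer call(Integer i) {
            // Read the broadcast value inside the task
            return lookup.value()[i];
          }
        });

        System.out.println(values.collect()); // [10, 20, 30]
      }
    }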
Re: [External] Re: no stdout output from worker
Hey Sen, Sourav is right, and I think all of your print statements are inside of the driver program rather than inside of a closure. How are you running your program (i.e. what do you run that starts this job)? Where you run the driver you should expect to see the output. - Patrick

On Mon, Mar 10, 2014 at 8:56 AM, Sen, Ranjan [USA] sen_ran...@bah.com wrote:

Hi Sourav, that makes so much sense. Thanks much. Ranjan [...]
computation slows down 10x because of cached RDDs
hello all, i am observing a strange result. i have a computation that i run on a cached RDD in spark-standalone. it typically takes about 4 seconds. but when other RDDs that are not relevant to the computation at hand are cached in memory (in same spark context), the computation takes 40 seconds or more. the problem seems to be GC time, which goes from milliseconds to tens of seconds. note that my issue is not that memory is full. i have cached about 14G in RDDs with 66G available across workers for the application. also my computation did not push any cached RDD out of memory. any ideas?
Re: computation slows down 10x because of cached RDDs
hey matei, it happens repeatedly. we are currently running on java 6 with spark 0.9. i will add -XX:+PrintGCDetails and collect details, and also look into java 7 G1. thanks On Mon, Mar 10, 2014 at 6:27 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Does this happen repeatedly if you keep running the computation, or just the first time? It may take time to move these Java objects to the old generation the first time you run queries, which could lead to a GC pause that also slows down the small queries. If you can run with -XX:+PrintGCDetails in your Java options, it would also be good to see what percent of each GC generation is used. The concurrent mark-and-sweep GC (-XX:+UseConcMarkSweepGC) or the G1 GC in Java 7 (-XX:+UseG1GC) might also avoid these pauses by GCing concurrently with your application threads. Matei On Mar 10, 2014, at 3:18 PM, Koert Kuipers ko...@tresata.com wrote: [...]
Re: Too many open files exception on reduceByKey
Hey Matt, The best way is definitely just to increase the ulimit if possible, this is sort of an assumption we make in Spark that clusters will be able to move it around. You might be able to hack around this by decreasing the number of reducers but this could have some performance implications for your job. In general if a node in your cluster has C assigned cores and you run a job with X reducers then Spark will open C*X files in parallel and start writing. Shuffle consolidation will help decrease the total number of files created but the number of file handles open at any time doesn't change so it won't help the ulimit problem. This means you'll have to use fewer reducers (e.g. pass reduceByKey a number of reducers) or use fewer cores on each machine. - Patrick On Mon, Mar 10, 2014 at 10:41 AM, Matthew Cheah matthew.c.ch...@gmail.com wrote: Hi everyone, My team (cc'ed in this e-mail) and I are running a Spark reduceByKey operation on a cluster of 10 slaves where I don't have the privileges to set ulimit -n to a higher number. I'm running on a cluster where ulimit -n returns 1024 on each machine. When I attempt to run this job with the data originating from a text file, stored in an HDFS cluster running on the same nodes as the Spark cluster, the job crashes with the message, Too many open files. My question is, why are so many files being created, and is there a way to configure the Spark context to avoid spawning that many files? I am already setting spark.shuffle.consolidateFiles to true. I want to repeat - I can't change the maximum number of open file descriptors on the machines. This cluster is not owned by me and the system administrator is responding quite slowly. Thanks, -Matt Cheah
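As a concrete illustration of the last point, reduceByKey takes an optional number of partitions; a minimal sketch (Java, assuming the Spark 0.9 API; the data and partition count are placeholders):

    import java.util.Arrays;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;
    import scala.Tuple2;

    public class FewerReducers {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("spark://master:7077", "FewerReducers");

        JavaPairRDD<String, Integer> pairs = sc.parallelize(Arrays.asList("a", "b", "a"))
            .map(new PairFunction<String, String, Integer>() {
              public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
              }
            });

        // Cap the number of reduce tasks at 8 so each node opens fewer shuffle files
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(
            new Function2<Integer, Integer, Integer>() {
              public Integer call(Integer a, Integer b) { return a + b; }
            }, 8);

        System.out.println(counts.collectAsMap());
      }
    }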
How to create RDD from Java in-memory data?
I would like to construct an RDD from data I already have in memory as POJO objects. Is this possible? For example, is it possible to create an RDD from an Iterable<String>? I'm running Spark from Java as a stand-alone application. The JavaWordCount example runs fine. In the example, the initial RDD is populated from a text file. In my use case, I'm streaming data from a database, but even this is hidden behind an interface which is essentially Iterable<String>. What I am doing is so basic that I must not understand something obvious. Thanks for any suggestions. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-from-Java-in-memory-data-tp2486.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Unsubscribe
Unsubscribe
Is there a Shark 0.9 build that can be downloaded?
Does anyone know if there is a Shark 0.9 build that can be downloaded? If not, when will there be a Shark 0.9 build?
Re: How to create RDD from Java in-memory data?
I was right ... I was missing something obvious. The answer to my question is to use JavaSparkContext.parallelize, which works with List<T> or List<Tuple2<K,V>>. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-create-RDD-from-Java-in-memory-data-tp2486p2487.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
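For completeness, a minimal sketch of both forms (Java, assuming the Spark 0.9 API; the data is illustrative):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class InMemoryRDD {
      public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[2]", "InMemoryRDD");

        // List<T> -> JavaRDD<T>
        List<String> lines = Arrays.asList("first line", "second line");
        JavaRDD<String> rdd = sc.parallelize(lines);

        // List<Tuple2<K, V>> -> JavaPairRDD<K, V>
        List<Tuple2<String, Integer>> pairs = Arrays.asList(
            new Tuple2<String, Integer>("a", 1),
            new Tuple2<String, Integer>("b", 2));
        JavaPairRDD<String, Integer> pairRdd = sc.parallelizePairs(pairs);

        System.out.println(rdd.count() + " lines, " + pairRdd.count() + " pairs");
      }
    }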
Re: [BLOG] Spark on Cassandra w/ Calliope
+1 - we have been using Calliope for a few months and it's working out really great for us. Any plans on integrating it into Spark? On Mar 10, 2014 1:58 PM, Rohit Rai ro...@tuplejump.com wrote: We are happy that you found Calliope useful and glad we could help. [...]
Re: Sharing SparkContext
Which version of Spark are you using?

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi

On Mon, Mar 10, 2014 at 6:49 PM, abhinav chowdary abhinav.chowd...@gmail.com wrote:

For anyone who is interested to know about the job server from Ooyala: we started using it recently and it has been working great so far.

On Feb 25, 2014 9:23 PM, Ognen Duzlevski og...@nengoiksvelzud.com wrote:

In that case, I must have misunderstood the following (from http://spark.incubator.apache.org/docs/0.8.1/job-scheduling.html). Apologies. Ognen

"Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By 'job', in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action. Spark's scheduler is fully thread-safe and supports this use case to enable applications that serve multiple requests (e.g. queries for multiple users). By default, Spark's scheduler runs jobs in FIFO fashion. Each job is divided into 'stages' (e.g. map and reduce phases), and the first job gets priority on all available resources while its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the head of the queue don't need to use the whole cluster, later jobs can start to run right away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly. Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair sharing, Spark assigns tasks between jobs in a 'round robin' fashion, so that all jobs get a roughly equal share of cluster resources. This means that short jobs submitted while a long job is running can start receiving resources right away and still get good response times, without waiting for the long job to finish. This mode is best for multi-user settings. To enable the fair scheduler, simply set the spark.scheduler.mode to FAIR before creating a SparkContext."

On 2/25/14, 12:30 PM, Mayur Rustagi wrote:

The fair scheduler merely reorders tasks. I think he is looking to run multiple pieces of code on a single context, on demand from customers... if the code order is decided, then the fair scheduler will ensure that all tasks get equal cluster time :)

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi

On Tue, Feb 25, 2014 at 10:24 AM, Ognen Duzlevski og...@nengoiksvelzud.com wrote:

Doesn't the fair scheduler solve this? Ognen

On 2/25/14, 12:08 PM, abhinav chowdary wrote:

Sorry for not being clear earlier. "how do you want to pass the operations to the spark context?" - this is partly what I am looking for: how to access the active spark context and possible ways to pass operations. Thanks

On Tue, Feb 25, 2014 at 10:02 AM, Mayur Rustagi mayur.rust...@gmail.com wrote:

How do you want to pass the operations to the spark context?

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi

On Tue, Feb 25, 2014 at 9:59 AM, abhinav chowdary abhinav.chowd...@gmail.com wrote:

Hi, I am looking for ways to share the SparkContext, meaning I need to be able to perform multiple operations on the same spark context. Below is the code of a simple app I am testing:

    def main(args: Array[String]) {
      println("Welcome to example application!")
      val sc = new SparkContext("spark://10.128.228.142:7077", "Simple App")
      println("Spark context created!")
      println("Creating RDD!")

Now once this context is created I want to access it to submit multiple jobs/operations. Any help is much appreciated. Thanks

--
Warm Regards
Abhinav Chowdary
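The docs sentence quoted above ends just before its code example; a minimal sketch of the same idea (Java, assuming Spark 0.9's SparkConf API; on 0.8.x the docs set the property via System.setProperty instead, and the master URL here is a placeholder):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class FairSchedulerApp {
      public static void main(String[] args) {
        // spark.scheduler.mode must be set before the SparkContext is constructed
        SparkConf conf = new SparkConf()
            .setMaster("spark://master:7077")
            .setAppName("FairSchedulerApp")
            .set("spark.scheduler.mode", "FAIR");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // ... submit jobs from multiple threads; they now share the cluster fairly ...

        sc.stop();
      }
    }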
Re: Sharing SparkContext
0.8.1. We used branch 0.8 and pulled the job server request into our local repo. I remember we had to deal with a few issues, but once past those it has been working great. On Mar 10, 2014 6:51 PM, Mayur Rustagi mayur.rust...@gmail.com wrote: Which version of Spark are you using? [...]
Re: Sharing SparkContext
Are you using it with HDFS? What version of Hadoop? 1.0.4? Ognen

On 3/10/14, 8:49 PM, abhinav chowdary wrote:

For anyone who is interested to know about the job server from Ooyala: we started using it recently and it has been working great so far. [...]

--
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. -- Jamie Zawinski
is spark 0.9.0 HA?
Is Spark 0.9.0 HA? We only have one master server, so I think it is not. Does anyone know how to support HA for Spark?
Re: Sharing SparkContext
HDFS 1.0.4, but we primarily use Cassandra + Spark (Calliope). I tested it with both.

Are you using it with HDFS? What version of Hadoop? 1.0.4? Ognen

On 3/10/14, 8:49 PM, abhinav chowdary wrote:

For anyone who is interested to know about the job server from Ooyala: we started using it recently and it has been working great so far. [...]
Re: SPARK_JAVA_OPTS not picked up by the application
Have you sent spark-env.sh to the slave nodes? 2014-03-11 6:47 GMT+08:00 Linlin linlin200...@gmail.com: Hi, I have a java option (-Xss) setting specified in SPARK_JAVA_OPTS in spark-env.sh. I noticed that after a stop/restart of the Spark cluster, the master/worker daemons have the setting applied, but the setting is not being propagated to the executor, and my application continues to behave the same. I am not sure if there is a way to specify it through SparkConf, like SparkConf.set(), and what the correct way is to set this up for a particular Spark application. Thank you! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: is spark 0.9.0 HA?
Spark 0.9.0 does include standalone scheduler HA, but it requires running multiple masters. The docs are located here: https://spark.apache.org/docs/0.9.0/spark-standalone.html#high-availability 0.9.0 also includes driver HA (for long-running normal or streaming jobs), allowing you to submit a driver into the standalone cluster which will be restarted automatically if it crashes. That doc is on the same page: https://spark.apache.org/docs/0.9.0/spark-standalone.html#launching-applications-inside-the-cluster Please let me know if you have further questions. On Mon, Mar 10, 2014 at 6:57 PM, qingyang li liqingyang1...@gmail.comwrote: is spark 0.9.0 HA? we only have one master server , i think is is not . so, Does anyone know how to support HA for spark?
Re: SPARK_JAVA_OPTS not picked up by the application
The properties in spark-env.sh are machine-specific, so you need to specify them on your worker as well. I guess what you are asking about is System.setProperty(); you can call it before you initialize your SparkContext. Best Regards, Chen Jingci On Tue, Mar 11, 2014 at 6:47 AM, Linlin linlin200...@gmail.com wrote: Hi, I have a java option (-Xss) setting specified in SPARK_JAVA_OPTS in spark-env.sh... [...]
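A minimal sketch of that pattern (Java, assuming Spark 0.9; the property and values shown are only an illustration of setting spark.* properties before the context is created, and whether a raw JVM flag such as -Xss can be propagated to executors this way is a separate question):

    import org.apache.spark.api.java.JavaSparkContext;

    public class ConfigBeforeContext {
      public static void main(String[] args) {
        // Set spark.* properties before the context is created so they are picked up
        System.setProperty("spark.executor.memory", "2g");

        JavaSparkContext sc = new JavaSparkContext("spark://master:7077", "ConfigBeforeContext");

        // ... job code ...

        sc.stop();
      }
    }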
Re: SPARK_JAVA_OPTS not picked up by the application
my cluster only has 1 node (master/worker). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483p2506.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: how to use the log4j for the standalone app
Thanks, but I do not want to log my own program info; I just do not want Spark to output all the info to my console. I want Spark to write its log to a file that I specify. On Tue, Mar 11, 2014 at 11:49 AM, Robin Cjc cjcro...@gmail.com wrote: Hi lihu, you can extend the org.apache.spark.Logging class. Then use functions like logInfo(), which will log according to the config in your log4j.properties. Best Regards, Chen Jingci On Tue, Mar 11, 2014 at 11:36 AM, lihu lihu...@gmail.com wrote: Hi, I use Spark 0.9, and when I run the spark-shell, logging works according to the log4j.properties in the SPARK_HOME/conf directory. But when I run a standalone app, I do not know how to configure logging. I used SparkConf to set it, such as:

    val conf = new SparkConf()
    conf.set("log4j.configuration", "/home/hadoop/spark/conf/log4j.properties")

but it does not work. This question may be simple, but I cannot find anything on the web, and I think this may be helpful for many people who are not familiar with Spark.

--
Best Wishes!
Li Hu(李浒) | Graduate Student
Institute for Interdisciplinary Information Sciences (IIIS, http://iiis.tsinghua.edu.cn/)
Tsinghua University, China
Email: lihu...@gmail.com
Tel: +86 15120081920
Homepage: http://iiis.tsinghua.edu.cn/zh/lihu/
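One approach that may work here, as a minimal sketch: assuming log4j 1.2 (which Spark 0.9 bundles) and the properties-file path mentioned above, configure log4j explicitly at the start of the driver program and point it at a log4j.properties that defines a file appender instead of the console. Note this only affects the driver JVM's logging; executors read the log4j.properties visible on each worker.

    import org.apache.log4j.PropertyConfigurator;
    import org.apache.spark.api.java.JavaSparkContext;

    public class QuietConsoleApp {
      public static void main(String[] args) {
        // Load a log4j config that sends Spark's driver-side logging to a file
        // instead of the console; that file should define a FileAppender on the root logger.
        PropertyConfigurator.configure("/home/hadoop/spark/conf/log4j.properties");

        JavaSparkContext sc = new JavaSparkContext("local[2]", "QuietConsoleApp");

        // ... job code ...

        sc.stop();
      }
    }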
Re: SPARK_JAVA_OPTS not picked up by the application
Thanks! Since my worker is on the same node: the -Xss JVM option sets the maximum thread stack size, and my worker does show this option now. Now I realize I accidentally ran the app in local mode, as I didn't give the master URL when initializing the Spark context. For local mode, how do I pass a JVM option to the app? hadoop 17315 1 0 14:56 ? 00:02:12 /home/hadoop/ibm-java-x86_64-60/bin/java -cp :/home/hadoop/spark-0.9.0-incubating/conf:/home/hadoop/spark-0.9.0-incubating/assembly/target/scala-2.10/spark-assembly-0.9.0-incubating-hadoop1.2.1.jar -Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m -Xss1024k org.apache.spark.deploy.worker.Worker spark://hdtest021.svl.ibm.com:7077 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483p2510.html Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: SPARK_JAVA_OPTS not picked up by the application
Thanks! So SPARK_DAEMON_JAVA_OPTS is for the worker, and SPARK_JAVA_OPTS is for the master? I only set SPARK_JAVA_OPTS in spark-env.sh, and the JVM opt is applied to both the master and worker daemons. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-JAVA-OPTS-not-picked-up-by-the-application-tp2483p2511.html Sent from the Apache Spark User List mailing list archive at Nabble.com.