Re: Spark on Mesos 0.20
I was able to reproduce it on a small 4-node cluster (1 Mesos master and 3 Mesos slaves) with relatively low-end specs. As I said, I just ran the log query examples with the fine-grained Mesos mode, on Spark 1.1.0 and Mesos 0.20.1. Fairiz, could you try running the LogQuery example included with Spark and see what you get? Thanks!

On Mon, Oct 6, 2014 at 8:07 PM, Fairiz Azizi code...@gmail.com wrote:

That's what's great about Spark, the community is so active! :)

I compiled Mesos 0.20.1 from the source tarball. I am using the MapR3 Spark 1.1.0 distribution from the Spark downloads page (spark-1.1.0-bin-mapr3.tgz). I see no problems for the workloads we are trying. However, the cluster is small (fewer than 100 cores across 3 nodes), and the workload reads in just a few gigabytes from HDFS via an IPython notebook Spark shell.

thanks, Fi

On Mon, Oct 6, 2014 at 9:20 AM, Timothy Chen tnac...@gmail.com wrote:

Ok, I created SPARK-3817 to track this; I will try to repro it as well.

Tim

On Mon, Oct 6, 2014 at 6:08 AM, RJ Nowling rnowl...@gmail.com wrote:

I've recently run into this issue as well. I get it from running Spark examples such as LogQuery, so maybe that'll help reproduce the issue.

On Monday, October 6, 2014, Gurvinder Singh gurvinder.si...@uninett.no wrote:

The issue does not occur if the task at hand has a small number of map tasks. I have a job with 978 map tasks, and I see this error:

14/10/06 09:34:40 ERROR BlockManagerMasterActor: Got two different block manager registrations on 20140711-081617-711206558-5050-2543-5

Here is the log from the mesos-slave where this container was running: http://pastebin.com/Q1Cuzm6Q

If you look at the Spark code that produces this error, you will see that it simply exits, with a comment saying this should never happen, let's just quit :-)

- Gurvinder

On 10/06/2014 09:30 AM, Timothy Chen wrote:

(Hit enter too soon...) What is your setup and steps to repro this?

Tim

On Mon, Oct 6, 2014 at 12:30 AM, Timothy Chen tnac...@gmail.com wrote:

Hi Gurvinder, I tried fine-grained mode before and didn't run into that problem.

On Sun, Oct 5, 2014 at 11:44 PM, Gurvinder Singh gurvinder.si...@uninett.no wrote:

On 10/06/2014 08:19 AM, Fairiz Azizi wrote:

The Spark online docs indicate that Spark is compatible with Mesos 0.18.1. I've gotten it to work just fine on 0.18.1 and 0.18.2. Has anyone tried Spark on a newer version of Mesos, i.e. Mesos v0.20.0?

-Fi

Yeah, we are using Spark 1.1.0 with Mesos 0.20.1. It runs fine in coarse-grained mode; in fine-grained mode there is an issue with a BlockManager name conflict. I have been waiting for it to be fixed, but it is still there.

-Gurvinder

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

--
em rnowl...@gmail.com
c 954.496.2314
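Until SPARK-3817 is resolved, Gurvinder's observation that only fine-grained mode is affected suggests a stopgap: switch the job to coarse-grained Mesos mode. A sketch for conf/spark-defaults.conf, assuming Spark 1.1's spark.mesos.coarse flag:

```properties
# Fine-grained mode (the default on Mesos) is where the duplicate
# BlockManager registration shows up; coarse-grained mode keeps one
# long-lived executor per slave and sidesteps that code path.
spark.mesos.coarse  true
```

The trade-off is that coarse-grained mode holds onto its cores for the lifetime of the application instead of releasing them between tasks.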
Local tests logging to log4j
Hi,

I have added some changes to the ALS tests and I am re-running the tests as:

mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn -DwildcardSuites=org.apache.spark.mllib.recommendation.ALSSuite test

I have some INFO logs in the code which I want to see on my console. The messages show up fine if I add a println. I copied conf/log4j.properties.template to conf/log4j.properties. The options are:

log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err

I still don't see the INFO messages on the console. Any idea whether I am setting up my log4j properties correctly?

Thanks.
Deb
Re: TorrentBroadcast slow performance
Could you create a JIRA for it? Maybe it's a regression after https://issues.apache.org/jira/browse/SPARK-3119. We would appreciate it if you could tell us how to reproduce it.

On Mon, Oct 6, 2014 at 1:27 AM, Guillaume Pitel guillaume.pi...@exensa.com wrote:

Hi,

I've had no answer to this on u...@spark.apache.org, so I am posting it on dev before filing a JIRA (in case the problem or solution is already identified).

We've had some performance issues since switching to 1.1.0, and we finally found the origin: TorrentBroadcast seems to be very slow in our setting (and it became the default with 1.1.0).

The logs of a 4 MB variable with TorrentBroadcast (15 s):

14/10/01 15:47:13 INFO storage.MemoryStore: Block broadcast_84_piece1 stored as bytes in memory (estimated size 171.6 KB, free 7.2 GB)
14/10/01 15:47:13 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece1
14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4194304) called with curMem=1401611984, maxMem=9168696115
14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84_piece0 stored as bytes in memory (estimated size 4.0 MB, free 7.2 GB)
14/10/01 15:47:23 INFO storage.BlockManagerMaster: Updated info of block broadcast_84_piece0
14/10/01 15:47:23 INFO broadcast.TorrentBroadcast: Reading broadcast variable 84 took 15.202260006 s
14/10/01 15:47:23 INFO storage.MemoryStore: ensureFreeSpace(4371392) called with curMem=1405806288, maxMem=9168696115
14/10/01 15:47:23 INFO storage.MemoryStore: Block broadcast_84 stored as values in memory (estimated size 4.2 MB, free 7.2 GB)

(Notice that a 10 s lag happens after the "Updated info of block broadcast_..." line and before the MemoryStore log.)

And with HttpBroadcast (0.3 s):

14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Started reading broadcast variable 147
14/10/01 16:05:58 INFO storage.MemoryStore: ensureFreeSpace(4369376) called with curMem=1373493232, maxMem=9168696115
14/10/01 16:05:58 INFO storage.MemoryStore: Block broadcast_147 stored as values in memory (estimated size 4.2 MB, free 7.3 GB)
14/10/01 16:05:58 INFO broadcast.HttpBroadcast: Reading broadcast variable 147 took 0.320907112 s
14/10/01 16:05:58 INFO storage.BlockManager: Found block broadcast_147 locally

Since Torrent is supposed to perform much better than Http, we suspect a configuration error on our side, but we are unable to pin it down. Does someone have any idea of the origin of the problem? For now we're sticking with the HttpBroadcast workaround.

Guillaume

--
Guillaume PITEL, Président
+33(0)626 222 431
eXenSa S.A.S.
41, rue Périer - 92120 Montrouge - FRANCE
Tel +33(0)184 163 677 / Fax +33(0)972 283 705
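The HttpBroadcast workaround Guillaume mentions can be selected explicitly. A sketch for conf/spark-defaults.conf, assuming Spark 1.1's spark.broadcast.factory setting:

```properties
# Fall back to the pre-1.1 default broadcast implementation until the
# TorrentBroadcast slowness is understood.
spark.broadcast.factory  org.apache.spark.broadcast.HttpBroadcastFactory
```

The same property can be set programmatically on the SparkConf before the SparkContext is created.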
Re: TorrentBroadcast slow performance
Maybe there is a firewall issue that makes it slow for your nodes to connect through the IP addresses they're configured with. I see there's this 10-second pause between "Updated info of block broadcast_84_piece1" and "ensureFreeSpace(4194304) called" (where it actually receives the block). HttpBroadcast used only HTTP fetches from the executors to the driver, but TorrentBroadcast opens connections between the executors themselves, and between executors and the driver, over a different port. Where are you running your driver app and nodes?

Matei

On Oct 7, 2014, at 11:42 AM, Davies Liu dav...@databricks.com wrote:

Could you create a JIRA for it? Maybe it's a regression after https://issues.apache.org/jira/browse/SPARK-3119. We would appreciate it if you could tell us how to reproduce it.
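One way to test Matei's firewall hypothesis is to pin the ports Spark would otherwise choose at random, so firewall rules can allow them explicitly. A sketch for conf/spark-defaults.conf; the property names are assumptions based on Spark 1.1-era configuration (spark.blockManager.port in particular should be verified against the docs for the exact version in use), and the port numbers are arbitrary examples:

```properties
# Pin normally-random ports so the executor<->executor and
# executor<->driver connections TorrentBroadcast needs can be
# explicitly allowed through the firewall.
spark.driver.port        51000
spark.blockManager.port  51001
```

If TorrentBroadcast speeds up once these ports are open between all nodes, the 10 s pause was connection-setup timeouts rather than a broadcast regression.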
Re: Local tests logging to log4j
What has worked for me is to bundle log4j.properties in the root of the application's .jar file, since log4j will look for it there, and configuring log4j yourself will turn off Spark's default log4j configuration. I don't think conf/log4j.properties is going to do anything by itself, but -Dlog4j.configuration=/path/to/file should cause it to read a config file from the file system. But for messing with a local build of Spark, just edit core/src/main/resources/org/apache/spark/log4j-defaults.properties and rebuild.

Yes, I think your syntax is OK; here's some of mine, where I turn off a bunch of INFO messages:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p %c{1}:%L %m%n
log4j.logger.org.apache.hadoop=WARN
log4j.logger.org.apache.kafka=WARN
log4j.logger.kafka=WARN
log4j.logger.akka=WARN
log4j.logger.org.apache.spark=WARN
log4j.logger.org.apache.spark.storage.BlockManager=ERROR
log4j.logger.org.apache.zookeeper=WARN
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.I0Itec.zkclient=WARN
Re: Extending Scala style checks
For starters, do we have a list of all the Scala style rules that are currently not enforced automatically but are likely well-suited to automation? Let's put such a list together in a JIRA issue and work through implementing them.

Nick

On Thu, Oct 2, 2014 at 12:06 AM, Cheng Lian lian.cs@gmail.com wrote:

Since we can easily catch the list of all changed files in a PR, I think we can start with adding the no-trailing-space check for newly changed files only.

On 10/2/14 9:24 AM, Nicholas Chammas wrote:

Yeah, I remember that hell when I added PEP 8 to the build checks and fixed all the outstanding Python style issues. I had to keep rebasing and resolving merge conflicts until the PR was merged. It's a rough process, but thankfully it's also a one-time process. I might be able to help with that in the next week or two if no one else wants to pick it up.

Nick

On Wed, Oct 1, 2014 at 9:20 PM, Michael Armbrust mich...@databricks.com wrote:

The hard part here is updating the existing code base... which is going to create merge conflicts with, like, all of the open PRs...

On Wed, Oct 1, 2014 at 6:13 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

Ah, since there appears to be a built-in rule for end-of-line whitespace, Michael and Cheng, y'all should be able to add this in pretty easily.

Nick

On Wed, Oct 1, 2014 at 6:37 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey Nick,

We can always take built-in rules. Back when we added this, Prashant Sharma actually did some great work that lets us write our own style rules in cases where rules don't exist. You can see some existing rules here: https://github.com/apache/spark/tree/master/project/spark-style/src/main/scala/org/apache/spark/scalastyle

Prashant has over time contributed a lot of our custom rules upstream to scalastyle, so now there are only a couple there.

- Patrick

On Wed, Oct 1, 2014 at 2:36 PM, Ted Yu yuzhih...@gmail.com wrote:

Please take a look at WhitespaceEndOfLineChecker under http://www.scalastyle.org/rules-0.1.0.html

Cheers

On Wed, Oct 1, 2014 at 2:01 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote:

As discussed here https://github.com/apache/spark/pull/2619, it would be good to extend our Scala style checks to programmatically enforce as many of our style rules as possible. Does anyone know if it's relatively straightforward to enforce additional rules, like the no-trailing-spaces rule mentioned in the linked PR?

Nick
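Ted's suggestion maps to a one-line addition to the project's scalastyle configuration file. A sketch assuming the scalastyle 0.1.0 rule set (checker class name taken from the rules page linked above):

```xml
<!-- Flag trailing whitespace at the end of a line; level may be "warning" or "error". -->
<check level="error" class="org.scalastyle.file.WhitespaceEndOfLineChecker" enabled="true"/>
```

As discussed above, enabling it as "error" would require a one-time sweep to fix all existing trailing whitespace first, or the build would fail immediately.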
Re: Local tests logging to log4j
Thanks Sean... trying them out...
Re: Spark on Mesos 0.20
Sure, could you point me to the example? The only thing I could find was https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/LogQuery.scala

So do you mean running it like:

MASTER=mesos://xxx:5050 ./run-example LogQuery

I tried that, and I can see the job run and the tasks complete on the slave nodes, but the client process seems to hang forever; that's probably a different problem. BTW, only a dozen or so tasks kick off. I actually haven't done much with Scala and Spark (it's been all Python).

Fi

On Tue, Oct 7, 2014 at 6:29 AM, RJ Nowling rnowl...@gmail.com wrote:

I was able to reproduce it on a small 4-node cluster (1 Mesos master and 3 Mesos slaves) with relatively low-end specs. As I said, I just ran the log query examples with the fine-grained Mesos mode, on Spark 1.1.0 and Mesos 0.20.1. Fairiz, could you try running the LogQuery example included with Spark and see what you get? Thanks!
Unneeded branches/tags
Just curious: Are there branches and/or tags on the repo that we don’t need anymore? What are the scala-2.9 and streaming branches for, for example? And do we still need branches for older versions of Spark that we are not backporting stuff to, like branch-0.5? Nick
Re: Unneeded branches/tags
Those branches are no longer active. However, I don't think we can delete branches from GitHub due to the way ASF mirroring works. I might be wrong there.
Re: Unneeded branches/tags
Actually - weirdly - we can delete old tags, and it works with the mirroring. Nick, if you put together a list of unneeded tags, I can delete them.

On Tue, Oct 7, 2014 at 6:27 PM, Reynold Xin r...@databricks.com wrote:

Those branches are no longer active. However, I don't think we can delete branches from GitHub due to the way ASF mirroring works. I might be wrong there.
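For reference, removing a tag is a two-step git operation: delete locally, then push the deletion. A sketch using a throwaway repository and a hypothetical tag name (the actual list of unneeded Spark tags is still to be drawn up):

```shell
# Demonstrate the local half in a scratch repo; "v0.5.0-example" is hypothetical.
cd "$(mktemp -d)"
git init -q .
git -c user.email=dev@example.com -c user.name=dev commit -q --allow-empty -m "init"
git tag v0.5.0-example        # the unneeded tag
git tag -d v0.5.0-example     # drop it locally
# On the real repo, a committer would then propagate the deletion:
#   git push apache :refs/tags/v0.5.0-example
git tag -l v0.5.0-example     # prints nothing: the tag is gone
```

Anyone who has already fetched the tag keeps it until they prune locally, which is why deletions mirror cleanly while branch deletions apparently do not.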