java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036842471144
Hello,

I have a Spark app which I run with master local[3]. When running without any persist calls, it seems to work fine, but as soon as I add persist calls (at the default storage level), it fails at the first persist call with the message below. Unfortunately, I can't post the code. Polling the JVM memory stats while the app is running seems to indicate that the JVM has not yet grown to its maximum size. Any advice?

Thanks!

Best, Oliver

14/10/28 10:51:30 INFO storage.MemoryStore: ensureFreeSpace(-9223372036842471144) called with curMem=1760, maxMem=3523372646
14/10/28 10:51:30 INFO storage.MemoryStore: Block rdd_1_2 stored as values in memory (estimated size -9223372036842471400.0 B, free -9223372033343709200.0 B)
14/10/28 10:51:30 ERROR executor.Executor: Exception in task 2.0 in stage 0.0 (TID 2)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036842471144
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
    at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
    at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
    at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:54)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
14/10/28 10:51:30 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, PROCESS_LOCAL, 3961 bytes)
14/10/28 10:51:30 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
14/10/28 10:51:30 INFO spark.CacheManager: Partition rdd_1_3 not found, computing it
14/10/28 10:51:30 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 (TID 2, localhost): java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036842471144 [same stack trace as above]
14/10/28 10:51:30 ERROR scheduler.TaskSetManager: Task 2 in stage 0.0 failed 1 times; aborting job
14/10/28 10:51:30 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/10/28 10:51:30 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 0.0 in stage 0.0 (TID 0)
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 1.0 in stage 0.0 (TID 1)
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 3.0 in stage 0.0 (TID 3)
14/10/28 10:51:30 INFO scheduler.DAGScheduler: Failed to run count at X
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost): java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036842471144 [same stack trace as above]
What is akka-actor_2.10-2.2.3-shaded-protobuf.jar?
Hello,

My SBT build pulls in, among others, the following dependency for Spark 1.1.0: akka-actor_2.10-2.2.3-shaded-protobuf.jar. What is this? How is it different from the regular Akka Actor JAR? How do I reconcile it with other libraries that use Akka, such as Play?

Thanks!

Best, Oliver

Oliver Ruebenacker | Solutions Architect
Altisource(tm)
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com

*** This email message and any attachments are intended solely for the use of the addressee. If you are not the intended recipient, you are prohibited from reading, disclosing, reproducing, distributing, disseminating or otherwise using this transmission. If you have received this message in error, please promptly notify the sender by reply email and immediately delete this message from your system. This message and any attachments may contain information that is confidential, privileged or exempt from disclosure. Delivery of this message to any person other than the intended recipient is not intended to waive any right or privilege. Message transmission is not guaranteed to be secure or free of software viruses. ***
Spark as a Library
Hello,

Suppose I want to use Spark from an application that I already submit to run in another container (e.g. Tomcat). Is this at all possible? Or do I have to split the app into two components, and submit one to Spark and one to the other container? In that case, what is the preferred way for the two components to communicate with each other?

Thanks!

Best, Oliver
RE: Spark as a Library
Hello,

Thanks for the response, and great to hear it is possible. But how do I connect to Spark without using the submit script? I know how to start up a master and some workers, and then connect to the master by packaging the app that contains the SparkContext and submitting the package with the spark-submit script in standalone mode. But I don't want to submit the app that contains the SparkContext via the script, because I want that app to be running on a web server. So, what are other ways to connect to Spark? I can't find anything in the docs other than using the script.

Thanks!

Best, Oliver

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Tuesday, September 16, 2014 1:31 PM
To: Ruebenacker, Oliver A; user@spark.apache.org
Subject: Re: Spark as a Library

If you want to run the computation on just one machine (using Spark's local mode), it can probably run in a container. Otherwise you can create a SparkContext there and connect it to a cluster outside. Note that I haven't tried this, though, so the security policies of the container might be too restrictive. In that case you'd have to run the app outside and expose an RPC interface between them.

Matei

On September 16, 2014 at 8:17:08 AM, Ruebenacker, Oliver A (oliver.ruebenac...@altisource.com) wrote:

Hello,

Suppose I want to use Spark from an application that I already submit to run in another container (e.g. Tomcat). Is this at all possible? Or do I have to split the app into two components, and submit one to Spark and one to the other container? In that case, what is the preferred way for the two components to communicate with each other?

Thanks!
Best, Oliver
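Matei's suggestion above (create a SparkContext inside the container and point it at an external cluster) can be sketched as follows. This is only an illustration, not code from the thread: the master URL, jar path, and app name are placeholder assumptions.

```scala
// Sketch: embedding a SparkContext in a long-running app (e.g. a web
// server) instead of launching it via spark-submit. Assumes a standalone
// master is already running at the given (hypothetical) URL.
import org.apache.spark.{SparkConf, SparkContext}

object EmbeddedSparkSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("EmbeddedSparkApp")
      .setMaster("spark://master-host:7077")          // placeholder master URL
      .setJars(Seq("/path/to/app-assembly.jar"))      // ship app classes to executors

    val sc = new SparkContext(conf)
    try {
      val total = sc.parallelize(1 to 100).sum()
      println(total)
    } finally {
      sc.stop() // release cluster resources when the app shuts down
    }
  }
}
```

Whether this works from inside a container like Tomcat depends on its security policies and classloader setup, as Matei notes; the fallback is a separate driver process behind an RPC interface.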
RE: Web UI
Hello,

Thanks for the explanation. So events are stored internally as JSON, but there is no official support for having Spark serve that JSON via HTTP? So if I wanted to write an app that monitors Spark, I would either have to scrape the web UI HTML or rely on unofficial JSON features? That is quite surprising, because I would expect that dumping out the JSON would be easier for Spark developers to implement than converting it to HTML. Do I get that right? Should I make a feature request?

Thanks!

Best, Oliver

From: Andrew Or [mailto:and...@databricks.com]
Sent: Thursday, September 04, 2014 2:11 PM
To: Ruebenacker, Oliver A
Cc: Akhil Das; Wonha Ryu; user@spark.apache.org
Subject: Re: Web UI

Hi all,

The JSON version of the web UI is not officially supported; I don't believe this is documented anywhere. The alternative is to set `spark.eventLog.enabled` to true before running your application. This will create JSON SparkListenerEvents with details about each task and stage as a log file. Then you can easily reconstruct the web UI after the application has exited. This is what the standalone Master and the History Server do, actually. For local mode, you can use the latter to generate your UI after the fact. (This is documented here: http://spark.apache.org/docs/latest/monitoring.html.)

-Andrew

2014-09-04 5:28 GMT-07:00 Ruebenacker, Oliver A (oliver.ruebenac...@altisource.com):

Hello,

Thanks for the link - this is for standalone, though, and most URLs don't work for local. I will look into deploying as standalone on a single node for testing and development.

Best, Oliver

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Thursday, September 04, 2014 3:09 AM
To: Ruebenacker, Oliver A
Cc: Wonha Ryu; user@spark.apache.org
Subject: Re: Web UI

Hi,

You can see this doc for all the available web UI ports: https://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security. Yes, there are ways to get the data metrics in JSON format; one of them is below:

http://webUI:8080/json/

Or simply: curl webUI:8080/json/

There are some PRs about it; you can read more here: https://github.com/apache/spark/pull/1682

Thanks
Best Regards

On Thu, Sep 4, 2014 at 2:24 AM, Ruebenacker, Oliver A (oliver.ruebenac...@altisource.com) wrote:

Hello,

Interestingly, http://localhost:4040/metrics/json/ gives some numbers, but only a few, which never seem to change during the application's lifetime. Either the web UI has some very strange limitations, or there are some URLs yet to be discovered that do something interesting.

Best, Oliver

From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 4:27 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: Web UI

Hey Oliver,

IIRC there's no JSON endpoint for the application web UI. They only exist for the cluster master and worker.

- Wonha

On Wed, Sep 3, 2014 at 12:58 PM, Ruebenacker, Oliver A (oliver.ruebenac...@altisource.com) wrote:

Hello,

Thanks for the help! But I tried starting with "--master local[4]", and when I load http://localhost:4040/json I just get forwarded to http://localhost:4040/stages/, and it's all human-readable HTML, no JSON.

Best, Oliver

From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 3:36 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: Web UI

Hi Oliver,

Spark standalone master and worker support a '/json' endpoint in the web UI, which returns some of the information in JSON format. I wasn't able to find relevant documentation, though.

- Wonha

On Wed, Sep 3, 2014 at 12:12 PM, Ruebenacker, Oliver A (oliver.ruebenac...@altisource.com) wrote:

Hello,

What is included in the Spark web UI? What are the available URLs? Can the information be obtained in a machine-readable way (e.g. JSON, XML, etc.)?

Thanks!

Best, Oliver
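The event-log approach Andrew describes can be sketched as below. The log directory path is an assumption for illustration; the directory must exist before the application starts.

```scala
// Sketch: enable JSON event logging so the web UI can be reconstructed
// after the fact (e.g. by the History Server), per Andrew's suggestion.
import org.apache.spark.{SparkConf, SparkContext}

object EventLogSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MonitoredApp")
      .setMaster("local[4]")
      .set("spark.eventLog.enabled", "true")
      // Placeholder path; SparkListenerEvents are written here as JSON.
      .set("spark.eventLog.dir", "file:///tmp/spark-events")

    val sc = new SparkContext(conf)
    sc.parallelize(1 to 1000).count() // any job; its stages/tasks get logged
    sc.stop()
  }
}
```

A monitoring app could then parse the JSON event log, or point a History Server at the directory to rebuild the UI.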
RE: Programatically running of the Spark Jobs.
Hello,

Can this be used as a library from within another application?

Thanks!

Best, Oliver

From: Matt Chu [mailto:m...@kabam.com]
Sent: Thursday, September 04, 2014 2:46 AM
To: Vicky Kak
Cc: user
Subject: Re: Programatically running of the Spark Jobs.

https://github.com/spark-jobserver/spark-jobserver

Ooyala's Spark jobserver is the current de facto standard, IIUC. I just added it to our prototype stack and will begin trying it out soon. Note that you can only do standalone or Mesos; YARN isn't quite there yet. (The repo just moved from https://github.com/ooyala/spark-jobserver, so don't trust Google on this one (yet); development is happening in the first repo.)

On Wed, Sep 3, 2014 at 11:39 PM, Vicky Kak (vicky@gmail.com) wrote:

I have been able to submit the spark jobs using the submit script, but I would like to do it via code. I am unable to find anything matching my need. I am thinking of using org.apache.spark.deploy.SparkSubmit to do so; I may have to write some utility that passes the parameters required for this class. I would be interested to know how the community does this.

Thanks,
Vicky
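The approach Vicky mentions (calling org.apache.spark.deploy.SparkSubmit from code) might look roughly like the sketch below. All paths, the class name, and the master URL are placeholder assumptions; note that SparkSubmit is not a documented public API, so this is a workaround rather than a supported contract.

```scala
// Sketch: invoking SparkSubmit programmatically with the same arguments
// the spark-submit shell script would pass. Internal API; may change
// between Spark versions.
import org.apache.spark.deploy.SparkSubmit

object ProgrammaticSubmitSketch {
  def main(args: Array[String]): Unit = {
    val submitArgs = Array(
      "--class", "com.example.MyJob",            // placeholder job class
      "--master", "spark://master-host:7077",    // placeholder master URL
      "/path/to/my-job-assembly.jar",            // placeholder assembly jar
      "jobArg1", "jobArg2")                      // arguments for the job itself

    SparkSubmit.main(submitArgs)
  }
}
```

The spark-jobserver Matt links to is the more robust option for this use case, since it exposes job submission over a REST API instead of relying on internals.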
RE: Web UI
Hello,

Thanks for the link - this is for standalone, though, and most URLs don't work for local. I will look into deploying as standalone on a single node for testing and development.

Best, Oliver
Is cluster manager same as master?
Hello,

Is the cluster manager mentioned here: https://spark.apache.org/docs/latest/cluster-overview.html the same thing as the master mentioned here: https://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually?

Thanks!

Best, Oliver
Setting Java properties for Standalone on Windows 7?
Hello,

I'm running Spark on Windows 7 as standalone, with everything on the same machine. No Hadoop is installed. My app throws an exception, and the worker reports: "Could not locate executable null\bin\winutils.exe in the Hadoop binaries." I had the same problem earlier when deploying local. I understand this is a bug (https://issues.apache.org/jira/browse/SPARK-2356) and I tried a workaround (http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7) which worked for local deployment, but it does not work for standalone. I also tried setting the Hadoop home directory via SPARK_DAEMON_JAVA_OPTS and restarted everything, but no change. Any idea how to cure this by setting Java properties or otherwise?

Thanks!

Best, Oliver
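The workaround referenced above amounts to pointing hadoop.home.dir at a directory whose bin subdirectory contains winutils.exe, before any Hadoop classes are touched. The paths below are assumptions for illustration:

```scala
// Sketch of the winutils workaround: hadoop.home.dir must be set before
// the SparkContext (and hence Hadoop's Shell utilities) is created, and
// C:\hadoop\bin\winutils.exe must exist at the (placeholder) path.
object WinutilsWorkaroundSketch {
  def main(args: Array[String]): Unit = {
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    // ... create SparkConf/SparkContext and run the app as usual ...
  }
}
```

For standalone mode the property has to reach the daemon JVMs as well, e.g. (an untested assumption) via `set SPARK_DAEMON_JAVA_OPTS=-Dhadoop.home.dir=C:\hadoop` in conf\spark-env.cmd before starting master and workers, since setting it only in the driver does not affect already-running worker processes.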
Reduce truncates RDD in standalone, but fine when local.
Hello,

In the app below, when I run it with local[1] or local[3], I get the expected result - a list of the square roots of the numbers from 1 to 20. When I try the same app as standalone with one or two workers on the same machine, it will only print 1.0. Adding print statements into the reduce function reveals that three times it calculated Set(1.0) ++ Set(1.0) to yield Set(1.0). Any ideas?

Thanks!

Best, Oliver

package sandbox

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SquareRoots extends App {

  def sqrt(x: Double, nIters: Long) = {
    var iIter: Long = 0
    var root = 1.0
    while (iIter < nIters) {
      iIter += 1
      root = 0.5 * (root + x / root)
    }
    root
  }

  def format(x: Double): String = {
    val string = "" + x
    if (string.length > 5) string.substring(0, 5) else string
  }

  val nNums = 20
  val nIters = 10 // for 1e9, runs about 50-55 secs per stage on my laptop with --master local[4]
  val nStages = 10

  System.setProperty("hadoop.home.dir", "c:\\Users\\ruebenac\\winutil\\")

  val conf = new SparkConf().setAppName("Square roots")
  val sc = new SparkContext(conf)

  val logPrefix = "[###] "

  def log(line: String) = { println(logPrefix + line) }

  log("Let's go!")
  for (iStage <- 0 to nStages) {
    log("Starting stage " + iStage)
    val nums = sc.parallelize((1 to nNums).map(_.toDouble))
    val roots = nums.map(sqrt(_, nIters)).map(Set(_))
      .reduce((roots1, roots2) => roots1 ++ roots2).toList.sorted
    log("Square roots from 1 to " + nNums + " in " + nIters + " iterations:")
    log(roots.map(format(_)).mkString(" "))
    log("Completed stage " + iStage)
  }
  log("Done!")
}
RE: Reduce truncates RDD in standalone, but fine when local.
Hello,

I tracked it down to the field nIters being uninitialized when passed to the reduce job while running standalone, but initialized when running local. It must be some strange interaction between Spark and scala.App. If I move the reduce job into a method and make nIters a local field, it works fine.

Best, Oliver
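The failure mode Oliver describes - vals of an object extending scala.App arriving uninitialized (hence 0) in closures deserialized on executors - comes from scala.App's DelayedInit mechanism, which defers the object body's initialization. A minimal sketch of the fix, restructured as a plain main method (numbers and names chosen for illustration):

```scala
// Sketch: avoid scala.App for Spark drivers. With a plain main method,
// nIters is a local value captured by value in the serialized closure,
// so executors see the initialized value rather than a default 0.
import org.apache.spark.{SparkConf, SparkContext}

object SquareRootsFixed {
  def main(args: Array[String]): Unit = {
    val nIters = 10L // local val: serialized with the closure below
    val conf = new SparkConf().setAppName("Square roots")
    val sc = new SparkContext(conf)
    try {
      val roots = sc.parallelize((1 to 20).map(_.toDouble))
        .map { x =>
          // Newton's method, as in the original app
          var i = 0L
          var root = 1.0
          while (i < nIters) { i += 1; root = 0.5 * (root + x / root) }
          root
        }
        .collect().toList.sorted
      println(roots.mkString(" "))
    } finally {
      sc.stop()
    }
  }
}
```

Making nIters a local field inside a method, as Oliver did, works for the same reason: the closure captures an initialized local rather than a DelayedInit-deferred object field.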
Web UI
Hello,

What is included in the Spark web UI? What are the available URLs? Can the information be obtained in a machine-readable way (e.g. JSON, XML, etc.)?

Thanks!

Best, Oliver
RE: Web UI
Hello,

Thanks for the help! But I tried starting with "--master local[4]", and when I load http://localhost:4040/json I just get forwarded to http://localhost:4040/stages/, and it's all human-readable HTML, no JSON.

Best, Oliver

From: Wonha Ryu [mailto:wonha@gmail.com] Sent: Wednesday, September 03, 2014 3:36 PM To: Ruebenacker, Oliver A Cc: user@spark.apache.org Subject: Re: Web UI

Hi Oliver,

Spark standalone master and worker support a '/json' endpoint in the web UI, which returns some of the information in JSON format. I wasn't able to find relevant documentation, though.

- Wonha
RE: Web UI
Hello,

Interestingly, http://localhost:4040/metrics/json/ gives some numbers, but only a few, which never seem to change during the application's lifetime. Either the web UI has some very strange limitations, or there are URLs yet to be discovered that do something interesting.

Best, Oliver

From: Wonha Ryu [mailto:wonha@gmail.com] Sent: Wednesday, September 03, 2014 4:27 PM To: Ruebenacker, Oliver A Cc: user@spark.apache.org Subject: Re: Web UI

Hey Oliver,

IIRC there's no JSON endpoint for the application web UI. They only exist for the cluster master and worker.

- Wonha
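The numbers served at /metrics/json come from Spark's metrics system, which is configured through conf/metrics.properties. A hedged sketch of such a file is below; the MetricsServlet behind /metrics/json is on by default, and the sink names and periods shown here are illustrative (check the metrics.properties.template shipped with your Spark distribution for the exact options):

```properties
# conf/metrics.properties -- illustrative sketch, not a verified config.
# The MetricsServlet backing /metrics/json is enabled by default.
# Additional sinks can expose the same metrics elsewhere, e.g.:

# Print metrics to stdout every 10 seconds
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds

# Write metrics as CSV files under a local directory
*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink
*.sink.csv.period=60
*.sink.csv.unit=seconds
*.sink.csv.directory=/tmp/spark-metrics
```

If the servlet's numbers appear static, a periodic sink like the console or CSV sink can make it easier to see which metrics actually update over the application's lifetime.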
If master is local, where are master and workers?
Hello,

If launched with "local" as master, where are the master and workers? Do they each have a web UI? How can they be monitored?

Thanks! Best, Oliver
RE: If master is local, where are master and workers?
How can that single process be monitored? Thanks!

-----Original Message-----
From: Marcelo Vanzin [mailto:van...@cloudera.com] Sent: Wednesday, September 03, 2014 6:32 PM To: Ruebenacker, Oliver A Cc: user@spark.apache.org Subject: Re: If master is local, where are master and workers?

"local" means everything runs in the same process; that means there is no need for master and worker daemons to start processes.

-- Marcelo
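Since local mode runs the driver and executor in one JVM, that single process can be watched with ordinary JVM tools (jconsole, jstat, VisualVM) or polled programmatically from the driver itself. A minimal sketch using the standard java.lang.management API (plain Scala, no Spark dependency; the object name `JvmMonitor` is just illustrative):

```scala
import java.lang.management.ManagementFactory

object JvmMonitor {
  /** Returns (usedHeapBytes, maxHeapBytes) for the current JVM.
    * maxHeapBytes may be -1 if the JVM reports no defined maximum. */
  def heapUsage(): (Long, Long) = {
    val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
    (heap.getUsed, heap.getMax)
  }

  def main(args: Array[String]): Unit = {
    val (used, max) = heapUsage()
    println(s"Heap: $used / $max bytes")
  }
}
```

Calling something like this periodically from the driver is one way to confirm whether the local-mode JVM is actually approaching its heap limit, independent of what the web UI shows.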