java.lang.IllegalArgumentException: requirement failed: sizeInBytes was negative: -9223372036842471144

2014-10-28 Thread Ruebenacker, Oliver A

 Hello,

  I have a Spark app which I run with master local[3]. When running without 
any persist calls, it seems to work fine, but as soon as I add persist calls 
(at default storage level), it fails at the first persist call with the message 
below. Unfortunately, I can't post the code. Polling the JVM memory stats while 
the app is running seems to indicate that the JVM has not yet grown to its 
maximum size.

  Any advice? Thanks!

 Best, Oliver

14/10/28 10:51:30 INFO storage.MemoryStore: 
ensureFreeSpace(-9223372036842471144) called with curMem=1760, maxMem=3523372646
14/10/28 10:51:30 INFO storage.MemoryStore: Block rdd_1_2 stored as values in 
memory (estimated size -9223372036842471400.0 B, free -9223372033343709200.0 B)
14/10/28 10:51:30 ERROR executor.Executor: Exception in task 2.0 in stage 0.0 
(TID 2)
java.lang.IllegalArgumentException: requirement failed: sizeInBytes was 
negative: -9223372036842471144
   at scala.Predef$.require(Predef.scala:233)
   at org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
   at org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
   at 
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
   at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
   at org.apache.spark.scheduler.Task.run(Task.scala:54)
   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
14/10/28 10:51:30 INFO scheduler.TaskSetManager: Starting task 3.0 in stage 0.0 
(TID 3, localhost, PROCESS_LOCAL, 3961 bytes)
14/10/28 10:51:30 INFO executor.Executor: Running task 3.0 in stage 0.0 (TID 3)
14/10/28 10:51:30 INFO spark.CacheManager: Partition rdd_1_3 not found, 
computing it
14/10/28 10:51:30 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0 
(TID 2, localhost): java.lang.IllegalArgumentException: requirement failed: 
sizeInBytes was negative: -9223372036842471144
scala.Predef$.require(Predef.scala:233)
org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)
14/10/28 10:51:30 ERROR scheduler.TaskSetManager: Task 2 in stage 0.0 failed 1 
times; aborting job
14/10/28 10:51:30 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
14/10/28 10:51:30 INFO scheduler.TaskSchedulerImpl: Stage 0 was cancelled
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 0.0 
in stage 0.0 (TID 0)
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 1.0 
in stage 0.0 (TID 1)
14/10/28 10:51:30 INFO executor.Executor: Executor is trying to kill task 3.0 
in stage 0.0 (TID 3)
14/10/28 10:51:30 INFO scheduler.DAGScheduler: Failed to run count at X
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost 
task 2.0 in stage 0.0 (TID 2, localhost): java.lang.IllegalArgumentException: 
requirement failed: sizeInBytes was negative: -9223372036842471144
scala.Predef$.require(Predef.scala:233)
org.apache.spark.storage.BlockInfo.markReady(BlockInfo.scala:55)
org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:767)
org.apache.spark.storage.BlockManager.putArray(BlockManager.scala:625)
org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:167)
org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
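
The reported size, -9223372036842471144, is within a few GB of Long.MinValue, which suggests a 64-bit overflow in Spark's size estimation rather than a genuinely negative block. A minimal Scala sketch (illustrative only, not Spark code) of how an accumulated size can wrap negative:

```scala
object OverflowDemo {
  def main(args: Array[String]): Unit = {
    // Imagine a size estimator accumulating per-object sizes into a Long.
    var sizeInBytes = Long.MaxValue - 1000L // already implausibly large
    sizeInBytes += 2000L                    // one more addition overflows
    // Two's-complement wraparound turns the total negative, which is
    // exactly what a require(sizeInBytes >= 0) check then rejects.
    println(sizeInBytes < 0) // prints "true"
  }
}
```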

What is akka-actor_2.10-2.2.3-shaded-protobuf.jar?

2014-10-17 Thread Ruebenacker, Oliver A

 Hello,

  My SBT pulls in, among others, the following dependency for Spark 1.1.0:

  akka-actor_2.10-2.2.3-shaded-protobuf.jar

  What is this? How is this different from the regular Akka Actor JAR? How do I 
reconcile with other libs that use Akka, such as Play?

  Thanks!

 Best, Oliver
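
As far as I know, this artifact comes from the org.spark-project.akka group: Spark republishes Akka with its protobuf dependency shaded (renamed) so it cannot clash with the protobuf version Hadoop brings in; functionally it is the regular Akka actor JAR. A sketch of how to confirm where it comes from, assuming the sbt-dependency-graph plugin (plugin coordinates and version shown are illustrative, not from this thread):

```scala
// project/plugins.sbt (sketch; version is illustrative)
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.7.4")

// Then at the sbt prompt, `dependency-tree` should show something like:
//   org.apache.spark:spark-core_2.10:1.1.0
//     +- org.spark-project.akka:akka-actor_2.10:2.2.3-shaded-protobuf
```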

Oliver Ruebenacker | Solutions Architect

Altisource(tm)
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com

***

This email message and any attachments are intended solely for the use of the 
addressee. If you are not the intended recipient, you are prohibited from 
reading, disclosing, reproducing, distributing, disseminating or otherwise 
using this transmission. If you have received this message in error, please 
promptly notify the sender by reply email and immediately delete this message 
from your system. This message and any attachments may contain information that 
is confidential, privileged or exempt from disclosure. Delivery of this message 
to any person other than the intended recipient is not intended to waive any 
right or privilege. Message transmission is not guaranteed to be secure or free 
of software viruses.
***


Spark as a Library

2014-09-16 Thread Ruebenacker, Oliver A

 Hello,

  Suppose I want to use Spark from an application that I already submit to run 
in another container (e.g. Tomcat). Is this at all possible? Or do I have to 
split the app into two components, and submit one to Spark and one to the other 
container? In that case, what is the preferred way for the two components to 
communicate with each other? Thanks!

 Best, Oliver



RE: Spark as a Library

2014-09-16 Thread Ruebenacker, Oliver A

 Hello,

  Thanks for the response and great to hear it is possible. But how do I 
connect to Spark without using the submit script?

  I know how to start up a master and some workers and then connect to the 
master by packaging the app that contains the SparkContext and then submitting 
the package with the spark-submit script in standalone mode. But I don’t want 
to submit the app that contains the SparkContext via the script, because I want 
that app to be running on a web server. So, what are other ways to connect to 
Spark? I can’t find in the docs anything other than using the script. Thanks!

 Best, Oliver

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: Tuesday, September 16, 2014 1:31 PM
To: Ruebenacker, Oliver A; user@spark.apache.org
Subject: Re: Spark as a Library

If you want to run the computation on just one machine (using Spark's local 
mode), it can probably run in a container. Otherwise you can create a 
SparkContext there and connect it to a cluster outside. Note that I haven't 
tried this though, so the security policies of the container might be too 
restrictive. In that case you'd have to run the app outside and expose an RPC 
interface between them.

Matei
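
A sketch of the "create a SparkContext there and connect it to a cluster outside" option (untested here; assumes spark-core is on the container's classpath, and the master URL and jar path are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Run inside the web application's startup code instead of via spark-submit.
val conf = new SparkConf()
  .setAppName("embedded-spark-app")
  .setMaster("spark://master-host:7077")     // or "local[*]" to compute in-process
  .setJars(Seq("/path/to/app-assembly.jar")) // ships the app's classes to the workers
val sc = new SparkContext(conf)
```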


On September 16, 2014 at 8:17:08 AM, Ruebenacker, Oliver A 
(oliver.ruebenac...@altisource.com) wrote:

[quoted message omitted; identical to the original "Spark as a Library" post above]


RE: Web UI

2014-09-05 Thread Ruebenacker, Oliver A

 Hello,

  Thanks for the explanation. So events are stored internally as JSON, but 
there is no official support for having Spark serve that JSON via HTTP? So if I 
wanted to write an app that monitors Spark, I would either have to scrape the 
web UI's HTML or rely on unofficial JSON features? That is quite surprising, 
because I would expect that dumping out the JSON would be easier for Spark 
developers to implement than converting it to HTML.

  Do I get that right? Should I make a feature request? Thanks!

 Best, Oliver

From: Andrew Or [mailto:and...@databricks.com]
Sent: Thursday, September 04, 2014 2:11 PM
To: Ruebenacker, Oliver A
Cc: Akhil Das; Wonha Ryu; user@spark.apache.org
Subject: Re: Web UI

Hi all,

The JSON version of the web UI is not officially supported; I don't believe 
this is documented anywhere.

The alternative is to set `spark.eventLog.enabled` to true before running your 
application. This will log JSON SparkListenerEvents, with details about each 
task and stage, to a log file. Then you can easily reconstruct the web UI after 
the application has exited. This is what the standalone Master and the History 
Server do, actually. For local mode, you can use the latter to generate your 
UI after the fact. (This is documented here: 
http://spark.apache.org/docs/latest/monitoring.html.)

-Andrew
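
A sketch of that configuration (the log directory is a placeholder and must exist before the application starts):

```
# conf/spark-defaults.conf (sketch)
spark.eventLog.enabled  true
spark.eventLog.dir      file:/tmp/spark-events

# After the application exits, the History Server can rebuild the UI:
#   ./sbin/start-history-server.sh
```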

2014-09-04 5:28 GMT-07:00 Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com:

 Hello,

  Thanks for the link – this is for standalone, though, and most URLs don’t 
work for local.
  I will look into deploying as standalone on a single node for testing and 
development.

 Best, Oliver

From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Thursday, September 04, 2014 3:09 AM
To: Ruebenacker, Oliver A
Cc: Wonha Ryu; user@spark.apache.org
Subject: Re: Web UI

Hi

You can see this doc 
(https://spark.apache.org/docs/latest/spark-standalone.html#configuring-ports-for-network-security)
 for all the available web UI ports.

Yes, there are ways to get the metrics in JSON format; one of them is below:

http://webUI:8080/json/  or simply:

curl webUI:8080/json/

There are some PRs about it you can read it over here 
https://github.com/apache/spark/pull/1682

Thanks
Best Regards

On Thu, Sep 4, 2014 at 2:24 AM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com wrote:

 Hello,

  Interestingly, http://localhost:4040/metrics/json/ gives some numbers, but 
only a few, which never seem to change during the application’s lifetime.

  Either the web UI has some very strange limitations, or there are some URLs 
yet to be discovered that do something interesting.

 Best,
 Oliver


From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 4:27 PM

To: Ruebenacker, Oliver A
Cc: user@spark.apache.orgmailto:user@spark.apache.org
Subject: Re: Web UI

Hey Oliver,

IIRC there's no JSON endpoint for application web UI. They only exist for 
cluster master and worker.

- Wonha


On Wed, Sep 3, 2014 at 12:58 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com wrote:

 Hello,

  Thanks for the help! But I tried starting with "--master local[4]", and when I 
load http://localhost:4040/json I just get forwarded to 
http://localhost:4040/stages/, and it’s all human-readable HTML, no JSON.

 Best,
 Oliver


From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 3:36 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: Web UI

Hi Oliver,

Spark standalone master and worker support '/json' endpoint in web UI, which 
returns some of the information in JSON format.
I wasn't able to find relevant documentation, though.

- Wonha

On Wed, Sep 3, 2014 at 12:12 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com wrote:

 Hello,

  What is included in the Spark web UI? What are the available URLs? Can the 
information be obtained in a machine-readable way (e.g. JSON, XML, etc)?

  Thanks!

 Best,
 Oliver


RE: Programmatically running of the Spark Jobs.

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  Can this be used as a library from within another application?
  Thanks!

 Best, Oliver

From: Matt Chu [mailto:m...@kabam.com]
Sent: Thursday, September 04, 2014 2:46 AM
To: Vicky Kak
Cc: user
Subject: Re: Programmatically running of the Spark Jobs.

https://github.com/spark-jobserver/spark-jobserver

Ooyala's Spark jobserver is the current de facto standard, IIUC. I just added 
it to our prototype stack, and will begin trying it out soon. Note that you can 
only do standalone or Mesos; YARN isn't quite there yet.

(The repo just moved from https://github.com/ooyala/spark-jobserver, so don't 
trust Google on this one (yet); development is happening in the first repo.)


On Wed, Sep 3, 2014 at 11:39 PM, Vicky Kak 
vicky@gmail.com wrote:
I have been able to submit Spark jobs using the submit script, but I would 
like to do it via code.
I am unable to find anything matching my need.
I am thinking of using org.apache.spark.deploy.SparkSubmit to do so; I may 
have to write some utility that passes the parameters required for this class.
I would be interested to know how the community is doing this.
Thanks,
Vicky
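
Short of calling org.apache.spark.deploy.SparkSubmit directly (which, as far as I know, is not a supported public API), one pragmatic option is to build the spark-submit command line in code and shell out to it. A minimal Scala sketch, where every path and class name is a placeholder:

```scala
import scala.sys.process._

object SubmitFromCode {
  // Assemble the spark-submit invocation; all arguments here are placeholders.
  def submitCommand(sparkHome: String, mainClass: String,
                    master: String, appJar: String): Seq[String] =
    Seq(s"$sparkHome/bin/spark-submit",
        "--class", mainClass,
        "--master", master,
        appJar)

  def main(args: Array[String]): Unit = {
    val cmd = submitCommand("/opt/spark", "com.example.MyJob", "local[2]", "/opt/jobs/my-job.jar")
    println(cmd.mkString(" "))
    // To actually launch it (blocks until the job finishes):
    //   val exitCode = Process(cmd).!
  }
}
```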



RE: Web UI

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  Thanks for the link – this is for standalone, though, and most URLs don’t 
work for local.
  I will look into deploying as standalone on a single node for testing and 
development.

 Best, Oliver

[quoted thread omitted; identical to the "RE: Web UI" exchange above]

Is cluster manager same as master?

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  Is the cluster manager mentioned here 
(https://spark.apache.org/docs/latest/cluster-overview.html) the same thing 
as the master mentioned here 
(https://spark.apache.org/docs/latest/spark-standalone.html#starting-a-cluster-manually)?
 Thanks!

 Best, Oliver



Setting Java properties for Standalone on Windows 7?

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  I'm running Spark on Windows 7 as standalone, with everything on the same 
machine. No Hadoop installed. My app throws an exception and the worker reports:
  Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
  I had the same problem earlier when deploying local. I understand this is a 
bug (https://issues.apache.org/jira/browse/SPARK-2356) and I tried a 
workaround (http://qnalist.com/questions/4994960/run-spark-unit-test-on-windows-7)
 which worked for local deployment, but it does not work for standalone. I also 
tried setting the Hadoop home directory via SPARK_DAEMON_JAVA_OPTS and restarted 
everything, but no change.

  Any idea how to cure this by setting Java properties or otherwise? Thanks!

 Best, Oliver
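
For the local case, the workaround linked above amounts to pointing hadoop.home.dir at a directory whose bin\ contains winutils.exe before the SparkContext is created. A sketch (the path is a placeholder; for standalone, each worker JVM would presumably need the same property set as well):

```scala
object WinutilsWorkaround {
  def main(args: Array[String]): Unit = {
    // Must be set before any Hadoop/Spark filesystem code runs;
    // the directory is expected to contain bin\winutils.exe.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    println(System.getProperty("hadoop.home.dir"))
  }
}
```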



Reduce truncates RDD in standalone, but fine when local.

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  In the app below, when I run it with local[1] or local[3], I get the 
expected result: a list of the square roots of the numbers from 1 to 20.
  When I try the same app as standalone with one or two workers on the same 
machine, it only prints 1.0.
  Adding print statements to the reduce function reveals that three times it 
calculated Set(1.0) ++ Set(1.0) to yield Set(1.0).
  Any ideas? Thanks!
  Any ideas? Thanks!

 Best, Oliver


package sandbox

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SquareRoots extends App {

  def sqrt(x: Double, nIters: Long) = {
    var iIter: Long = 0
    var root = 1.0
    while (iIter < nIters) {
      iIter += 1
      root = 0.5 * (root + x / root)
    }
    root
  }

  def format(x: Double): String = {
    val string = "" + x
    if (string.length > 5) { string.substring(0, 5) } else { string }
  }

  val nNums = 20
  val nIters = 10 // for 1e9, runs about 50-55 secs per stage on my laptop with --master local[4]
  val nStages = 10

  System.setProperty("hadoop.home.dir", "c:\\Users\\ruebenac\\winutil\\")

  val conf = new SparkConf().setAppName("Square roots")
  val sc = new SparkContext(conf)

  val logPrefix = " [###] "

  def log(line: String) = { println(logPrefix + line) }

  log("Let's go!")
  for (iStage <- 0 to nStages) {
    log("Starting stage " + iStage)
    val nums = sc.parallelize((1 to nNums).map(_.toDouble))
    val roots = nums.map(sqrt(_, nIters)).map(Set(_))
      .reduce((roots1, roots2) => roots1 ++ roots2).toList.sorted
    log("Square roots from 1 to " + nNums + " in " + nIters + " iterations:")
    log(roots.map(format(_)).mkString(" "))
    log("Completed stage " + iStage)
  }
  log("Done!")
}


RE: Reduce truncates RDD in standalone, but fine when local.

2014-09-04 Thread Ruebenacker, Oliver A

 Hello,

  I tracked it down to the field nIters being uninitialized when passed to the 
reduce job while running standalone, but initialized when running local. It 
appears to be an interaction between Spark's closure serialization and 
scala.App's delayed initialization. If I move the reduce job into a method and 
make nIters a local value, it works fine.

 Best, Oliver
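
A sketch of the safer pattern: replace scala.App (whose delayed initialization can leave fields unset when closures are deserialized on workers) with an explicit main method, so that any value a closure captures is an initialized local:

```scala
object SquareRootsSafe {
  // Newton's method for square roots; nIters is a plain parameter,
  // so closures capture a fully initialized copy.
  def sqrt(x: Double, nIters: Long): Double = {
    var root = 1.0
    var i = 0L
    while (i < nIters) {
      root = 0.5 * (root + x / root)
      i += 1
    }
    root
  }

  def main(args: Array[String]): Unit = {
    val nIters = 20L // local value, initialized before any Spark job would run
    println(sqrt(2.0, nIters)) // ~1.4142135623730951
  }
}
```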


From: Ruebenacker, Oliver A [mailto:oliver.ruebenac...@altisource.com]
Sent: Thursday, September 04, 2014 4:15 PM
To: user@spark.apache.org
Subject: Reduce truncates RDD in standalone, but fine when local.


[quoted message omitted; identical to the original "Reduce truncates RDD in standalone, but fine when local." post above]


Web UI

2014-09-03 Thread Ruebenacker, Oliver A

 Hello,

  What is included in the Spark web UI? What are the available URLs? Can the 
information be obtained in a machine-readable way (e.g. JSON, XML, etc)?

  Thanks!

 Best,
 Oliver

Oliver Ruebenacker | Solutions Architect

Altisource™
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com



RE: Web UI

2014-09-03 Thread Ruebenacker, Oliver A

 Hello,

  Thanks for the help! But I tried starting with "--master local[4]", and when I 
load http://localhost:4040/json I just get forwarded to 
http://localhost:4040/stages/, and it's all human-readable HTML, no JSON.

 Best,
 Oliver


From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 3:36 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: Web UI

Hi Oliver,

Spark standalone master and worker support '/json' endpoint in web UI, which 
returns some of the information in JSON format.
I wasn't able to find relevant documentation, though.
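For reference, the endpoint Wonha describes can be sketched like this. The host and ports are assumptions (Spark's defaults put the standalone master web UI on 8080 and each worker's on 8081); the object and method names here are illustrative, not Spark API:

```scala
import scala.io.Source

// Sketch: build and (optionally) fetch the '/json' status endpoint of a
// standalone master or worker web UI. Defaults assumed: master UI on
// port 8080, worker UI on 8081 -- adjust for a real deployment.
object StandaloneJson {
  def jsonUrl(webUiBase: String): String =
    webUiBase.stripSuffix("/") + "/json"

  def main(args: Array[String]): Unit = {
    val masterJson = jsonUrl("http://localhost:8080")
    println(masterJson)
    // With a standalone master actually running, uncomment to fetch it:
    // println(Source.fromURL(masterJson).mkString)
  }
}
```

The returned JSON (when a master is running) describes workers and running applications; the worker endpoint is analogous on its own port.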

- Wonha

On Wed, Sep 3, 2014 at 12:12 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com 
wrote:

 Hello,

  What is included in the Spark web UI? What are the available URLs? Can the 
information be obtained in a machine-readable way (e.g. JSON, XML, etc)?

  Thanks!

 Best,
 Oliver

Oliver Ruebenacker | Solutions Architect

Altisource™
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com



RE: Web UI

2014-09-03 Thread Ruebenacker, Oliver A

 Hello,

  Interestingly, http://localhost:4040/metrics/json/ does give some numbers, but 
only a few, and they never seem to change during the application's lifetime.

  Either the web UI has some very strange limitations, or there are some URLs 
yet to be discovered that do something interesting.
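The mostly-static numbers are consistent with only Spark's default MetricsServlet being active. More sources and sinks can be enabled via conf/metrics.properties; a hedged sketch follows, with class names taken from the Spark metrics configuration template (verify against the docs for your version):

```
# Sketch of conf/metrics.properties (class names per the Spark 1.x
# metrics template; check your version's documentation).
# Report every instance's metrics to the console every 10 seconds:
*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink
*.sink.console.period=10
*.sink.console.unit=seconds
# Also expose JVM gauges (heap, GC, threads) for driver and executors:
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

With the JVM source enabled, /metrics/json/ should report considerably more, and changing, values.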

 Best,
 Oliver


From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 4:27 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: Web UI

Hey Oliver,

IIRC there's no JSON endpoint for application web UI. They only exist for 
cluster master and worker.

- Wonha


On Wed, Sep 3, 2014 at 12:58 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com 
wrote:

 Hello,

  Thanks for the help! But I tried starting with "--master local[4]", and when I 
load http://localhost:4040/json I just get forwarded to 
http://localhost:4040/stages/, and it's all human-readable HTML, no JSON.

 Best,
 Oliver


From: Wonha Ryu [mailto:wonha@gmail.com]
Sent: Wednesday, September 03, 2014 3:36 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: Web UI

Hi Oliver,

Spark standalone master and worker support '/json' endpoint in web UI, which 
returns some of the information in JSON format.
I wasn't able to find relevant documentation, though.

- Wonha

On Wed, Sep 3, 2014 at 12:12 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com 
wrote:

 Hello,

  What is included in the Spark web UI? What are the available URLs? Can the 
information be obtained in a machine-readable way (e.g. JSON, XML, etc)?

  Thanks!

 Best,
 Oliver

Oliver Ruebenacker | Solutions Architect

Altisource™
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com



If master is local, where are master and workers?

2014-09-03 Thread Ruebenacker, Oliver A

 Hello,

  If launched with "local" as master, where are master and workers? Do they 
each have a web UI? How can they be monitored?

  Thanks!

 Best,
 Oliver

Oliver Ruebenacker | Solutions Architect

Altisource™
290 Congress St, 7th Floor | Boston, Massachusetts 02210
P: (617) 728-5582 | ext: 275585
oliver.ruebenac...@altisource.com | www.Altisource.com



RE: If master is local, where are master and workers?

2014-09-03 Thread Ruebenacker, Oliver A
  How can that single process be monitored? Thanks!

-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com] 
Sent: Wednesday, September 03, 2014 6:32 PM
To: Ruebenacker, Oliver A
Cc: user@spark.apache.org
Subject: Re: If master is local, where are master and workers?

"local" means everything runs in the same process; that means there is no need 
for master and worker daemons to start processes.
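Because local mode is a single JVM, ordinary JVM tooling monitors it: jps to find the PID, then jstat, jconsole, or VisualVM from outside; the application web UI on port 4040 is also still served. A minimal in-process sketch using the standard management beans (the object name here is illustrative, not Spark API):

```scala
import java.lang.management.ManagementFactory

// In local mode the driver, scheduler, and executors all share one JVM,
// so the standard management beans see the whole application.
object LocalJvmProbe {
  // Current heap usage of the single Spark process, in megabytes.
  def heapUsedMb: Long =
    ManagementFactory.getMemoryMXBean.getHeapMemoryUsage.getUsed / (1024 * 1024)

  def main(args: Array[String]): Unit = {
    println(s"heap used: ${heapUsedMb} MB")
    // Externally: jps -l to find the PID, then e.g. jstat -gcutil <pid> 1000.
  }
}
```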

On Wed, Sep 3, 2014 at 3:12 PM, Ruebenacker, Oliver A 
oliver.ruebenac...@altisource.com wrote:


  Hello,



   If launched with “local” as master, where are master and workers? Do 
 they each have a web UI? How can they be monitored?



   Thanks!



  Best,

  Oliver



 Oliver Ruebenacker | Solutions Architect



 Altisource™

 290 Congress St, 7th Floor | Boston, Massachusetts 02210

 P: (617) 728-5582 | ext: 275585

 oliver.ruebenac...@altisource.com | www.Altisource.com






--
Marcelo