RE: [sql]enable spark sql cli support spark sql

2014-08-15 Thread Cheng, Hao
If so, we probably need to add SQL dialect switching support to
SparkSQLCLI, as Fei suggested. What do you think the priority for this should be?

-Original Message-
From: Cheng Lian [mailto:lian.cs@gmail.com] 
Sent: Friday, August 15, 2014 1:57 PM
To: Cheng, Hao
Cc: scwf; dev@spark.apache.org
Subject: Re: [sql]enable spark sql cli support spark sql

In the long run, as Michael suggested in his Spark Summit 14 talk, we'd like to 
implement SQL-92, maybe with the help of Optiq.

On Aug 15, 2014, at 1:13 PM, Cheng, Hao hao.ch...@intel.com wrote:

 Actually the SQL Parser (another SQL dialect in SparkSQL) is quite weak and 
 only supports some basic queries; I'm not sure what the plan is for its enhancement.
 
 -Original Message-
 From: scwf [mailto:wangf...@huawei.com]
 Sent: Friday, August 15, 2014 11:22 AM
 To: dev@spark.apache.org
 Subject: [sql]enable spark sql cli support spark sql
 
 hi all,
   Right now the Spark SQL CLI only supports Spark HQL. I think we can enable this CLI to 
 support Spark SQL as well. Do you think it's necessary?
 
 --
 
 Best Regards
 Fei Wang
 
 --
 --
 
 
 



Re: -1s on pull requests?

2014-08-15 Thread Nicholas Chammas
On Sun, Aug 3, 2014 at 4:35 PM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 Include the commit hash in the 'tests have started/completed' messages, so
 that it's clear exactly what code is being or has been tested for each test cycle.


This is now captured in this JIRA issue
https://issues.apache.org/jira/browse/SPARK-2912 and completed in this PR
https://github.com/apache/spark/pull/1816 which has been merged into
master.

Example of old style: tests starting
https://github.com/apache/spark/pull/1819#issuecomment-51416510 / tests
finished https://github.com/apache/spark/pull/1819#issuecomment-51417477
(with
new classes)

Example of new style: tests starting
https://github.com/apache/spark/pull/1816#issuecomment-51855254 / tests
finished https://github.com/apache/spark/pull/1816#issuecomment-51855255
(with
new classes)

Nick


Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-15 Thread Debasish Das
I am still a bit confused about why this issue did not show up in 0.9...at
that time there was no spark-submit and the context was constructed with
low-level calls...

The Kryo registrator for ALS was always in my application code...

Was this bug introduced in 1.0, or was it always there?
 On Aug 14, 2014 5:48 PM, Reynold Xin r...@databricks.com wrote:

 Here: https://github.com/apache/spark/pull/1948



 On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das debasish.da...@gmail.com
 wrote:

 Is there a fix that I can test ? I have the flows setup for both
 standalone and YARN runs...

 Thanks.
 Deb



 On Thu, Aug 14, 2014 at 10:59 AM, Reynold Xin r...@databricks.com
 wrote:

 Yes, I understand it might not work for a custom serializer, but that is a
 much less common path.

 Basically I want a quick fix for the 1.1 release (which is coming up soon).
 I would not be comfortable making big changes to the class path late in the
 release cycle. We can do that for 1.2.





 On Thu, Aug 14, 2014 at 2:35 AM, Graham Dennis graham.den...@gmail.com
 wrote:

 That should work, but would you also make these changes to the
 JavaSerializer?  The API of these is the same, so that you can select one or
 the other (or, in theory, a custom serializer).  This also wouldn't address
 the problem of shipping custom *serializers* (not Kryo registrators) in
 user jars.

 On 14 August 2014 19:23, Reynold Xin r...@databricks.com wrote:

 Graham,

 SparkEnv only creates a KryoSerializer, but as I understand it, that
 serializer doesn't actually initialize the registrator, since that is only
 done when newKryo() is called as KryoSerializerInstance is 
 initialized.

 Basically I'm thinking of a quick fix for 1.2:

 1. Add a classLoader field to KryoSerializer; initialize new
 KryoSerializerInstance with that class loader

  2. Set that classLoader to the executor's class loader when Executor
 is initialized.

 Then all deser calls should be using the executor's class loader.
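
  For concreteness, a minimal, self-contained sketch of that two-step idea. The names
  here (SketchKryoSerializer, defaultClassLoader, ExecutorSideSketch) are hypothetical,
  not Spark's actual classes; it only illustrates routing every Kryo instance through
  a swappable class loader:

  import com.esotericsoftware.kryo.Kryo

  // Toy sketch of the proposal, not Spark source: the serializer keeps a swappable
  // class loader, and the executor swaps in the loader that can see the user jars
  // before any task bytes are deserialized.
  class SketchKryoSerializer {
    // Hypothetical field; defaults to the context class loader of the creating thread.
    @volatile var defaultClassLoader: ClassLoader = Thread.currentThread.getContextClassLoader

    def newKryo(): Kryo = {
      val kryo = new Kryo()
      // Registrator lookup and class resolution during deserialization go through this loader.
      kryo.setClassLoader(defaultClassLoader)
      kryo
    }
  }

  object ExecutorSideSketch {
    // Step 2 of the proposal: during executor initialization, point the serializer
    // at the loader that includes the downloaded application jars.
    def install(serializer: SketchKryoSerializer, userJarLoader: ClassLoader): Unit = {
      serializer.defaultClassLoader = userJarLoader
    }
  }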




 On Thu, Aug 14, 2014 at 12:53 AM, Graham Dennis 
 graham.den...@gmail.com wrote:

 Hi Reynold,

 That would solve this specific issue, but you'd need to be careful
 that you never create a serialiser instance before the first task is
 received.  Currently in Executor.TaskRunner.run a closure serialiser
 instance is created before any application jars are downloaded, but that
 could be moved.  To me, this seems a little fragile.

 However there is a related issue where you can't ship a custom
 serialiser in an application jar because the serialiser is instantiated
 when the SparkEnv object is created, which is before any tasks are 
 received
 by the executor.  The above approach wouldn't help with this problem.
  Additionally, the YARN scheduler currently uses this approach of adding
 the application jar to the Executor classpath, so it would make things a
 bit more uniform.

 Cheers,
 Graham


 On 14 August 2014 17:37, Reynold Xin r...@databricks.com wrote:

 Graham,

 Thanks for working on this. This is an important bug to fix.

  I don't have the whole context and obviously I haven't spent
 nearly as much time on this as you have, but I'm wondering what if we
 always pass the executor's ClassLoader to the Kryo serializer? Will that
 solve this problem?




 On Wed, Aug 13, 2014 at 11:59 PM, Graham Dennis 
 graham.den...@gmail.com wrote:

 Hi Deb,

 The only alternative serialiser is the JavaSerialiser (the
 default).  Theoretically Spark supports custom serialisers, but due to 
 a
 related issue, custom serialisers currently can't live in application 
 jars
 and must be available to all executors at launch.  My PR fixes this 
 issue
 as well, allowing custom serialisers to be shipped in application jars.

 Graham


 On 14 August 2014 16:56, Debasish Das debasish.da...@gmail.com
 wrote:

 Sorry I just saw Graham's email after sending my previous email
 about this bug...

 I have been seeing this same issue on our ALS runs last week, but I
 thought it was due to my hacky way of running the mllib 1.1 snapshot on core 
 1.0...

 What's the status of this PR? Will this fix be back-ported to
 1.0.1, as we are running a 1.0.1 stable standalone cluster?

 Until the PR merges, does it make sense to not use Kryo? What are
 the other recommended efficient serializers?

 Thanks.
 Deb


 On Wed, Aug 13, 2014 at 2:47 PM, Graham Dennis 
 graham.den...@gmail.com wrote:

 I now have a complete pull request for this issue that I'd like to get
 reviewed and committed.  The PR is available here:
 https://github.com/apache/spark/pull/1890 and includes a testcase for the
 issue I described.  I've also submitted a related PR
 (https://github.com/apache/spark/pull/1827) that causes exceptions raised
 while attempting to run the custom Kryo registrator not to be swallowed.

 Thanks,
 Graham


 On 12 August 2014 18:44, Graham Dennis graham.den...@gmail.com
 wrote:

  I've submitted a work-in-progress pull request for this issue
 that I'd
  like feedback on.  See
 https://github.com/apache/spark/pull/1890 . I've
  

Tests failing

2014-08-15 Thread Patrick Wendell
Hi All,

I noticed that all PR tests run overnight had failed due to timeouts. I
believe the patch that updates the netty shuffle somehow inflated the
build time significantly. That patch had been tested, but one change was
made before it was merged that was not tested.

I've reverted the patch for now to see if it brings the build times back
down.

- Patrick


Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run against a 1.0.1 cluster

2014-08-15 Thread Mingyu Kim
Thanks for your response. I think I misinterpreted the
stability/compatibility guarantee of the 1.0 release. It seems the
compatibility guarantee is only at the API level.

This is interesting because it means any system/product that is built on top
of Spark and uses Spark with a long-running SparkContext connecting to the
cluster over the network will need to make sure it has the exact same version
of the Spark jar as the cluster, down to the patch version. This would be
analogous to having to compile Spark against a very specific version of
Hadoop, as opposed to currently being able to use the Spark package for
CDH4 against most of the CDH4 Hadoop clusters.

Is it correct that Spark is focusing on and prioritizing the
spark-submit use cases over the aforementioned use cases? I just want to
better understand the future direction/prioritization of Spark.

Thanks,
Mingyu

From:  Patrick Wendell pwend...@gmail.com
Date:  Thursday, August 14, 2014 at 6:32 PM
To:  Gary Malouf malouf.g...@gmail.com
Cc:  Mingyu Kim m...@palantir.com, dev@spark.apache.org
dev@spark.apache.org
Subject:  Re: [SPARK-3050] Spark program running with 1.0.2 jar cannot run
against a 1.0.1 cluster

I commented on the bug. For driver mode, you'll need to get the
corresponding version of spark-submit for Spark 1.0.2.


On Thu, Aug 14, 2014 at 3:43 PM, Gary Malouf malouf.g...@gmail.com wrote:
 To be clear, is it 'compiled' against 1.0.2, or is it packaged with it?
 
 
 On Thu, Aug 14, 2014 at 6:39 PM, Mingyu Kim m...@palantir.com wrote:
 
   I ran some really simple code that runs with the Spark 1.0.2 jar and connects to
   a Spark 1.0.1 cluster, but it fails with java.io.InvalidClassException. I
   filed the bug at https://issues.apache.org/jira/browse/SPARK-3050.
 
   I assumed the minor and patch releases shouldn't break compatibility. Is
  that correct?
 
  Thanks,
  Mingyu
 







Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Hi Xiangrui,
I wasn't setting spark.driver.memory. I'll try that and report back.

In terms of this running on the cluster, my assumption was that calling foreach 
on an array (I converted samples using toArray) would mean counts is populated 
locally. The object would then be serialized to the executors fully populated. Is 
this correct?

I'm actually trying to load a trie, and used the HashMap as an example of 
loading data into an object that needs to be serialized. Is there a better way 
of doing this?

- jerry
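
As a concrete version of the RDD.countByValue approach Xiangrui suggests in the quoted
message below, here is a minimal, self-contained sketch (the sample data and app name
are made up); it lets the cluster do the counting instead of filling a mutable HashMap
on the driver:

import org.apache.spark.{SparkConf, SparkContext}

object CountByValueSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("countByValue-sketch"))
    val samples = sc.parallelize(Seq("foo word", "bar word", "baz word", "foo word"))

    // Runs as a distributed job; only the final (line -> count) map comes back to the driver.
    val counts: scala.collection.Map[String, Long] = samples.countByValue()

    // If the counts are needed inside another closure they are captured as an ordinary
    // local map (consider sc.broadcast(counts) if the map is large).
    val result = samples.map(l => (l, counts.getOrElse(l, 0L)))
    println(result.count())

    sc.stop()
  }
}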


 On Aug 15, 2014, at 8:36 AM, Xiangrui Meng [via Apache Spark Developers 
 List] ml-node+s1001551n7866...@n3.nabble.com wrote:
 
 Did you set driver memory? You can confirm it in the Executors tab of 
 the WebUI. Btw, the code may only work in local mode. In a cluster 
 mode, counts will be serialized to remote workers and the result is 
 not fetched by the driver after foreach. You can use RDD.countByValue 
 instead. -Xiangrui 
 
 On Fri, Aug 15, 2014 at 8:18 AM, jerryye [hidden email] wrote:
 
  Hi All, 
  I'm not sure if I should file a JIRA or if I'm missing something obvious 
  since the test code I'm trying is so simple. I've isolated the problem I'm 
  seeing to a memory issue but I don't know what parameter I need to tweak; 
  it does seem related to spark.akka.frameSize. If I sample my RDD with 35% of 
  the data, everything runs to completion, with more than 35%, it fails. In 
  standalone mode, I can run on the full RDD without any problems. 
  
  // works 
  val samples = sc.textFile("s3n://geonames").sample(false, 0.35) // 64MB, 2849439 Lines 
  
  // fails 
  val samples = sc.textFile("s3n://geonames").sample(false, 0.4) // 64MB, 2849439 Lines 
  
  Any ideas? 
  
  1) RDD size is causing the problem. The code below as is fails but if I swap 
  smallSample for samples, the code runs end to end on both cluster and 
  standalone. 
  2) The error I get is: 
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 
  3.0:1 
  failed 4 times, most recent failure: TID 12 on host 
  ip-10-251-14-74.us-west-2.compute.internal failed for unknown reason 
  Driver stacktrace: 
  at 
  org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
   
  at 
  org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
   
  3) Using the 1.1.0 branch the driver freezes instead of aborting with the 
  previous error in #2. 
  4) In 1.1.0, changing spark.akka.frameSize also has the effect of no 
  progress in the driver. 
  
  Code: 
  val smallSample = sc.parallelize(Array("foo word", "bar word", "baz word")) 
  
  val samples = sc.textFile("s3n://geonames") // 64MB, 2849439 Lines of short strings 
  
  val counts = new collection.mutable.HashMap[String, Int].withDefaultValue(0) 
  
  samples.toArray.foreach(counts(_) += 1) 
  
  val result = samples.map( 
    l => (l, counts.get(l)) 
  ) 
  
  result.count 
  
  Settings (with or without Kryo doesn't matter): 
  export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g" 
  export SPARK_MEM=10g 
  spark.akka.frameSize 40 
  #spark.serializer org.apache.spark.serializer.KryoSerializer 
  #spark.kryoserializer.buffer.mb 1000 
  spark.executor.memory 58315m 
  spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/ 
  spark.executor.extraClassPath /root/ephemeral-hdfs/conf 
  
  
  

Re: Tests failing

2014-08-15 Thread Shivaram Venkataraman
Also, I think Jenkins doesn't post build timeouts to GitHub. Is there any way
we can fix that?
On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

 Hi All,

 I noticed that all PR tests run overnight had failed due to timeouts. The
 patch that updates the netty shuffle I believe somehow inflated to the
 build time significantly. That patch had been tested, but one change was
 made before it was merged that was not tested.

 I've reverted the patch for now to see if it brings the build times back
 down.

 - Patrick



Re: Tests failing

2014-08-15 Thread Nicholas Chammas
Shivaram,

Can you point us to an example of that happening? The Jenkins console
output, that is.

Nick


On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman 
shiva...@eecs.berkeley.edu wrote:

 Also I think Jenkins doesn't post build timeouts to github. Is there anyway
 we can fix that ?
 On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

  Hi All,
 
  I noticed that all PR tests run overnight had failed due to timeouts. The
  patch that updates the netty shuffle I believe somehow inflated to the
  build time significantly. That patch had been tested, but one change was
  made before it was merged that was not tested.
 
  I've reverted the patch for now to see if it brings the build times back
  down.
 
  - Patrick
 



Re: spark.akka.frameSize stalls job in 1.1.0

2014-08-15 Thread jerryye
Setting spark.driver.memory has no effect. It's still hanging trying to
compute result.count when I'm sampling greater than 35%, regardless of what
value of spark.driver.memory I set.

Here are my settings:
export SPARK_JAVA_OPTS="-Xms5g -Xmx10g -XX:MaxPermSize=10g"
export SPARK_MEM=10g

in conf/spark-defaults:
spark.driver.memory 1500
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.mb 500
spark.executor.memory 58315m
spark.executor.extraLibraryPath /root/ephemeral-hdfs/lib/native/
spark.executor.extraClassPath /root/ephemeral-hdfs/conf






Re: Tests failing

2014-08-15 Thread Patrick Wendell
Hey Nicholas,

Yeah, so Jenkins has its own timeout mechanism and it will just kill the
entire build after 120 minutes. But since run-tests is sitting in the
middle of the tests, it can't actually post a failure message.

I think run-tests-jenkins should just wrap the call to run-tests in a
timeout of its own. It might be possible to just use this:

http://linux.die.net/man/1/timeout

- Patrick


On Fri, Aug 15, 2014 at 1:31 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 OK, I've captured this in SPARK-3076
 https://issues.apache.org/jira/browse/SPARK-3076.

 Patrick,

 Is the problem that this run-tests
 https://github.com/apache/spark/blob/0afe5cb65a195d2f14e8dfcefdbec5dac023651f/dev/run-tests-jenkins#L151
  step
 times out, and that is currently not handled gracefully? To be more
 specific, it hangs for 120 minutes, times out, but the parent script for
 some reason is also terminated. Does that sound right?

 Nick


 On Fri, Aug 15, 2014 at 3:33 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 Jenkins runs for this PR https://github.com/apache/spark/pull/1960 timed
 out without notification. The relevant Jenkins logs are at


 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18588/consoleFull

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18592/consoleFull

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18597/consoleFull


 On Fri, Aug 15, 2014 at 11:44 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Shivaram,

 Can you point us to an example of that happening? The Jenkins console
 output, that is.

 Nick


 On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 Also I think Jenkins doesn't post build timeouts to github. Is there
 anyway
 we can fix that ?
 On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

  Hi All,
 
  I noticed that all PR tests run overnight had failed due to timeouts.
 The
  patch that updates the netty shuffle I believe somehow inflated to the
  build time significantly. That patch had been tested, but one change
 was
  made before it was merged that was not tested.
 
  I've reverted the patch for now to see if it brings the build times
 back
  down.
 
  - Patrick
 







Re: Tests failing

2014-08-15 Thread Nicholas Chammas
So 2 hours is a hard cap on how long a build can run. Okie doke.

Perhaps then I'll wrap the run-tests step as you suggest and limit it to
100 minutes or something, and cleanly report if it times out.

Sound good?


On Fri, Aug 15, 2014 at 4:43 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Nicholas,

 Yeah so Jenkins has it's own timeout mechanism and it will just kill the
 entire build after 120 minutes. But since run-tests is sitting in the
 middle of the tests, it can't actually post a failure message.

 I think run-tests-jenkins should just wrap the call to run-tests in a call
 in its own timeout. It might be possible to just use this:

 http://linux.die.net/man/1/timeout

 - Patrick


 On Fri, Aug 15, 2014 at 1:31 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 OK, I've captured this in SPARK-3076
 https://issues.apache.org/jira/browse/SPARK-3076.

 Patrick,

 Is the problem that this run-tests
 https://github.com/apache/spark/blob/0afe5cb65a195d2f14e8dfcefdbec5dac023651f/dev/run-tests-jenkins#L151
  step
 times out, and that is currently not handled gracefully? To be more
 specific, it hangs for 120 minutes, times out, but the parent script for
 some reason is also terminated. Does that sound right?

 Nick


 On Fri, Aug 15, 2014 at 3:33 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 Jenkins runs for this PR https://github.com/apache/spark/pull/1960
 timed out without notification. The relevant Jenkins logs are at


 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18588/consoleFull

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18592/consoleFull

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18597/consoleFull


 On Fri, Aug 15, 2014 at 11:44 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Shivaram,

 Can you point us to an example of that happening? The Jenkins console
 output, that is.

 Nick


 On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 Also I think Jenkins doesn't post build timeouts to github. Is there
 anyway
 we can fix that ?
 On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com wrote:

  Hi All,
 
  I noticed that all PR tests run overnight had failed due to
 timeouts. The
  patch that updates the netty shuffle I believe somehow inflated to
 the
  build time significantly. That patch had been tested, but one change
 was
  made before it was merged that was not tested.
 
  I've reverted the patch for now to see if it brings the build times
 back
  down.
 
  - Patrick
 








Re: Tests failing

2014-08-15 Thread Patrick Wendell
Yeah I was thinking something like that. Basically we should just have a
variable for the timeout and I can make sure it's under the configured
Jenkins time.


On Fri, Aug 15, 2014 at 1:55 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 So 2 hours is a hard cap on how long a build can run. Okie doke.

 Perhaps then I'll wrap the run-tests step as you suggest and limit it to
 100 minutes or something, and cleanly report if it times out.

 Sound good?


 On Fri, Aug 15, 2014 at 4:43 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey Nicholas,

 Yeah so Jenkins has it's own timeout mechanism and it will just kill the
 entire build after 120 minutes. But since run-tests is sitting in the
 middle of the tests, it can't actually post a failure message.

 I think run-tests-jenkins should just wrap the call to run-tests in a
 call in its own timeout. It might be possible to just use this:

 http://linux.die.net/man/1/timeout

 - Patrick


 On Fri, Aug 15, 2014 at 1:31 PM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 OK, I've captured this in SPARK-3076
 https://issues.apache.org/jira/browse/SPARK-3076.

 Patrick,

 Is the problem that this run-tests
 https://github.com/apache/spark/blob/0afe5cb65a195d2f14e8dfcefdbec5dac023651f/dev/run-tests-jenkins#L151
  step
 times out, and that is currently not handled gracefully? To be more
 specific, it hangs for 120 minutes, times out, but the parent script for
 some reason is also terminated. Does that sound right?

 Nick


 On Fri, Aug 15, 2014 at 3:33 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 Jenkins runs for this PR https://github.com/apache/spark/pull/1960
 timed out without notification. The relevant Jenkins logs are at


 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18588/consoleFull

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18592/consoleFull

 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18597/consoleFull


 On Fri, Aug 15, 2014 at 11:44 AM, Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 Shivaram,

 Can you point us to an example of that happening? The Jenkins console
 output, that is.

 Nick


 On Fri, Aug 15, 2014 at 2:28 PM, Shivaram Venkataraman 
 shiva...@eecs.berkeley.edu wrote:

 Also I think Jenkins doesn't post build timeouts to github. Is there
 anyway
 we can fix that ?
 On Aug 15, 2014 9:04 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  Hi All,
 
  I noticed that all PR tests run overnight had failed due to
 timeouts. The
  patch that updates the netty shuffle I believe somehow inflated to
 the
  build time significantly. That patch had been tested, but one
 change was
  made before it was merged that was not tested.
 
  I've reverted the patch for now to see if it brings the build times
 back
  down.
 
  - Patrick
 









Re: Dynamic variables in Spark

2014-08-15 Thread Neil Ferguson
I've opened SPARK-3051 (https://issues.apache.org/jira/browse/SPARK-3051)
based on this thread.

Neil
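
To make the registry idea discussed in the quoted thread below a bit more concrete, here
is a purely illustrative sketch. AccumulatorRegistry is a hypothetical name, not an
existing Spark API, and (as the thread notes) the open question is how the entries
become visible inside tasks running on executors:

import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.SparkContext._  // implicit AccumulatorParam[Long]

object AccumulatorRegistry {
  // name -> accumulator; in this naive form the map only exists in the driver JVM, so
  // for executor-side lookups the entries would have to be shipped with the task
  // closure or broadcast, which is exactly what is being debated below.
  private val accumulators = new ConcurrentHashMap[String, Accumulator[Long]]()

  // Driver side: create an accumulator and register it under a name.
  def register(sc: SparkContext, name: String): Accumulator[Long] = {
    val acc = sc.accumulator(0L)
    accumulators.put(name, acc)
    acc
  }

  // Static, name-based lookup, in the spirit of TaskMetrics.get("f1-time");
  // returns null if nothing was registered under that name.
  def get(name: String): Accumulator[Long] = accumulators.get(name)
}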


On Thu, Jul 24, 2014 at 10:30 PM, Neil Ferguson nfergu...@gmail.com wrote:

 That would work well for me! Do you think it would be necessary to specify
 which accumulators should be available in the registry, or would we just
 broadcast all named accumulators registered in SparkContext and make them
 available in the registry?

 Anyway, I'm happy to make the necessary changes (unless someone else wants
 to).


 On Thu, Jul 24, 2014 at 10:17 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 What if we have a registry for accumulators, where you can access them
 statically by name?

 - Patrick

 On Thu, Jul 24, 2014 at 1:51 PM, Neil Ferguson nfergu...@gmail.com
 wrote:
  I realised that my last reply wasn't very clear -- let me try and
 clarify.
 
  The patch for named accumulators looks very useful, however in
 Shivaram's
  example he was able to retrieve the named task metrics (statically)
 from a
  TaskMetrics object, as follows:
 
  TaskMetrics.get("f1-time")
 
  However, I don't think this would be possible with the named
 accumulators
  -- I believe they'd need to be passed to every function that needs
 them,
  which I think would be cumbersome in any application of reasonable
  complexity.
 
  This is what I was trying to solve with my proposal for dynamic
 variables
  in Spark. However, the ability to retrieve named accumulators from a
  thread-local would work just as well for my use case. I'd be happy to
  implement either solution if there's interest.
 
  Alternatively, if I'm missing some other way to accomplish this please
 let
  me know.
 
  On a (slight) aside, I now think it would be possible to implement
 dynamic
  variables by broadcasting them. I was looking at Reynold's PR [1] to
  broadcast the RDD object, and I think it would be possible to take a
  similar approach -- that is, broadcast the serialized form, and
 deserialize
  when executing each task.
 
  [1] https://github.com/apache/spark/pull/1498
 
 
 
 
 
 
 
  On Wed, Jul 23, 2014 at 8:30 AM, Neil Ferguson nfergu...@gmail.com
 wrote:
 
  Hi Patrick.
 
  That looks very useful. The thing that seems to be missing from
 Shivaram's
  example is the ability to access TaskMetrics statically (this is the
 same
  problem that I am trying to solve with dynamic variables).
 
 
 
  You mention defining an accumulator on the RDD. Perhaps I am missing
  something here, but my understanding was that accumulators are defined
 in
  SparkContext and are not part of the RDD. Is that correct?
 
  Neil
 
  On Tue, Jul 22, 2014 at 22:21, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Shivaram,
 
  You should take a look at this patch which adds support for naming
  accumulators - this is likely to get merged in soon. I actually
  started this patch by supporting named TaskMetrics similar to what
 you
  have there, but then I realized there is too much semantic overlap
  with accumulators, so I just went that route.
 
  For instance, it would be nice if any user-defined metrics were
  accessible at the driver program.
 
  https://github.com/apache/spark/pull/1309
 
  In your example, you could just define an accumulator here on the RDD
  and you'd see the incremental update in the web UI automatically.
 
  - Patrick
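
  A hedged sketch of what that could look like with the named-accumulator patch (PR 1309),
  timing f1 inside a map as in Shivaram's example. The sc.accumulator(initialValue, name)
  signature is assumed from that PR, and f1 here is just a stand-in function:

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._  // implicit AccumulatorParam[Long]

  object NamedAccumulatorTimingSketch {
    def f1(s: String): Int = s.length  // placeholder for the expensive function being timed

    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("named-accumulator-sketch"))
      val a = sc.parallelize(Seq("foo", "bar", "baz"))

      // Named accumulator (assumed API from PR 1309); the name is what the web UI would show.
      val f1Time = sc.accumulator(0L, "f1-time")

      val out = a.map { l =>
        val start = System.nanoTime
        val f = f1(l)
        f1Time += System.nanoTime - start  // incremental updates are aggregated per stage
        f
      }
      out.count()
      println(s"total f1 time (ns): ${f1Time.value}")
      sc.stop()
    }
  }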
 
  On Tue, Jul 22, 2014 at 2:09 PM, Shivaram Venkataraman
  shiva...@eecs.berkeley.edu wrote:
    From reading Neil's first e-mail, I think the motivation is to get some
    metrics in ADAM? -- I've run into a similar use-case with having
    user-defined metrics in long-running tasks, and I think a nice way to solve
    this would be to have user-defined TaskMetrics.
   
    To state my problem more clearly, let's say you have two functions you use
    in a map call and want to measure how much time each of them takes. For
    example, if you have a code block like the one below and you want to
    measure how much time f1 takes as a fraction of the task.
  
    a.map { l =>
      val f = f1(l)
      ... some work here ...
    }
   
    It would be really cool if we could do something like
   
    a.map { l =>
      val start = System.nanoTime
      val f = f1(l)
      TaskMetrics.get("f1-time").add(System.nanoTime - start)
    }
  
    These task metrics have a different purpose from Accumulators in the sense
    that we don't need to track lineage, perform commutative operations etc.
    Further we also have a bunch of code in place to aggregate task metrics
    across a stage etc. So it would be great if we could also populate these in
    the UI and show median/max etc.
    I think counters [1] in Hadoop served a similar purpose.
   
    Thanks
    Shivaram
   
    [1]
    https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-8/counters
  
  
  
   On Tue, Jul 22, 2014 at 1:43 PM, Neil Ferguson nfergu...@gmail.com

  wrote:
  
   Hi Reynold
  
   Thanks for your reply.
  
   Accumulators are, of course, stored in