Re: Executor shutdown hooks?

2016-04-06 Thread Hemant Bhanawat
As part of PR https://github.com/apache/spark/pull/11723, I have added a killAllTasks function that can be used to kill (or rather, interrupt) individual tasks before an executor exits. If this PR is accepted, then for task-level cleanup we can add a call to this function before the executor exits. The

Re: Executor shutdown hooks?

2016-04-06 Thread Reynold Xin
On Wed, Apr 6, 2016 at 4:39 PM, Sung Hwan Chung wrote: > My option so far seems to be using JVM's shutdown hook, but I was > wondering if Spark itself had an API for tasks. > Spark would be using that under the hood anyway, so you might as well just use the JVM
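For reference, a minimal sketch of the JVM-level approach suggested here, with the hook registered from inside a task (the cleanup body is invented; in real code the hook should be registered once per JVM rather than once per task):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("shutdown-hook-demo"))

    sc.parallelize(1 to 100, 4).mapPartitions { iter =>
      // A plain JVM shutdown hook, registered on the executor. It fires when
      // the executor JVM exits normally or on SIGTERM, but not on SIGKILL.
      sys.addShutdownHook {
        println("executor exiting; running hypothetical cleanup")
      }
      iter.map(_ * 2)
    }.count()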

Re: Executor shutdown hooks?

2016-04-06 Thread Sung Hwan Chung
What I meant is 'application', i.e., when we manually terminate an application that was submitted via spark-submit. When we manually kill an application, it seems that individual tasks do not receive the InterruptedException. That InterruptedException seems to work iff we cancel the job through
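For context, a sketch of the cancellation path that does deliver interrupts, using the public job-group API (assumes an existing SparkContext named sc; the group id and workload are made up):

    import scala.concurrent.{ExecutionContext, Future}
    implicit val ec = ExecutionContext.global

    val result = Future {
      // setJobGroup is per-thread, so set it on the thread that runs the job.
      sc.setJobGroup("demo-group", "long-running job", interruptOnCancel = true)
      sc.parallelize(1 to 1000000, 8).map { x => Thread.sleep(1); x }.count()
    }

    // Later, from another thread: this interrupts the task threads on the
    // executors, which is what lets a running task see an InterruptedException.
    sc.cancelJobGroup("demo-group")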

Re: Switch RDD-based MLlib APIs to maintenance mode in Spark 2.0

2016-04-06 Thread DB Tsai
+1 for renaming the jar file. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID: 0xAF08DF8D On Tue, Apr 5, 2016 at 8:02 PM, Chris Fregly wrote: > perhaps renaming to Spark ML would actually clear up code and

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Josh Rosen
Sure, I'll take a look. Planning to do full verification in a bit. On Wed, Apr 6, 2016 at 12:54 PM Ted Yu wrote: > Josh: > Can you check spark-1.6.1-bin-hadoop2.4.tgz ? > > $ tar zxf spark-1.6.1-bin-hadoop2.4.tgz > > gzip: stdin: not in gzip format > tar: Child returned

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Ted Yu
Josh: Can you check spark-1.6.1-bin-hadoop2.4.tgz ?

    $ tar zxf spark-1.6.1-bin-hadoop2.4.tgz
    gzip: stdin: not in gzip format
    tar: Child returned status 1
    tar: Error is not recoverable: exiting now
    $ ls -l !$
    ls -l spark-1.6.1-bin-hadoop2.4.tgz
    -rw-r--r--. 1 hbase hadoop 323614720 Apr 5 19:25

Re: Executor shutdown hooks?

2016-04-06 Thread Mark Hamstra
Why would the Executors shutdown when the Job is terminated? Executors are bound to Applications, not Jobs. Furthermore, unless spark.job.interruptOnCancel is set to true, canceling the Job at the Application and DAGScheduler level won't actually interrupt the Tasks running on the Executors. If
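To make that concrete: even with spark.job.interruptOnCancel in effect, a CPU-bound task only notices the interrupt if it blocks, sleeps, or polls the flag itself. A hypothetical task body (rdd and expensiveTransform are invented names):

    rdd.map { record =>
      // Cancellation calls Thread.interrupt() on the task thread; a tight
      // loop must check the flag explicitly to stop promptly.
      if (Thread.currentThread().isInterrupted) {
        throw new InterruptedException("task cancelled, aborting work")
      }
      expensiveTransform(record)
    }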

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Nicholas Chammas
Thank you Josh! I confirmed that the Spark 1.6.1 / Hadoop 2.6 package on S3 is now working, and the SHA512 checks out. On Wed, Apr 6, 2016 at 3:19 PM Josh Rosen wrote: > I downloaded the Spark 1.6.1 artifacts from the Apache mirror network and > re-uploaded them to the

Executor shutdown hooks?

2016-04-06 Thread Sung Hwan Chung
Hi, I'm looking for ways to add shutdown hooks to executors, i.e., for when a Job is forcefully terminated before it finishes. The scenario goes like this: executors are running a long-running job within a 'map' function. When the user decides to terminate the job, the mappers should perform some
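One Spark-level option for per-task cleanup, rather than a raw JVM hook, is a TaskContext completion listener. A sketch (rdd and process are invented names):

    import org.apache.spark.TaskContext

    rdd.mapPartitions { iter =>
      // Runs when this task ends, whether it succeeds or fails, so per-task
      // resources (connections, temp files, ...) can be released.
      TaskContext.get().addTaskCompletionListener { _ =>
        println(s"cleaning up partition ${TaskContext.getPartitionId()}")
      }
      iter.map(process)
    }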

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-06 Thread Josh Rosen
I downloaded the Spark 1.6.1 artifacts from the Apache mirror network and re-uploaded them to the spark-related-packages S3 bucket, so hopefully these packages should be fixed now. On Mon, Apr 4, 2016 at 3:37 PM Nicholas Chammas wrote: > Thanks, that was the command.

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Mark Hamstra
I agree with your general logic and understanding of semver. That is why if we are going to violate the strictures of semver, I'd only be happy doing so if support for Java 7 and/or Scala 2.10 were clearly understood to be deprecated already in the 2.0.0 release -- i.e. from the outset not to be

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Mark Hamstra
> > are you sure that library is being properly maintained? > Almost by definition the answer to that is "No; a library that hasn't been upgraded to Scala 2.11 is not being properly maintained." That means that a user of such a library is already facing the choice of whether to take on the

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Mridul Muralidharan
In general, I agree - it is preferable to break backward compatibility (where unavoidable) only at major versions. Unfortunately, this sort of change is usually planned better - with earlier versions announcing the intent of the change, deprecation across multiple releases, changed defaults, etc. From the thread,

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Sean Owen
Answering for myself: I assume everyone is following http://semver.org/ semantic versioning. If not, it would be good to hear an alternative theory. For semver, strictly speaking, minor releases should be backwards-compatible for callers. Are things like stopping support for Java 7 or Scala 2.10

Re: Discuss: commit to Scala 2.10 support for Spark 2.x lifecycle

2016-04-06 Thread Dean Wampler
A few other reasons to drop 2.10 support sooner rather than later. - We at Lightbend are evaluating some fundamental changes to the REPL to make it work better for large heaps, especially for Spark. There are other recent and planned enhancements. This work will benefit notebook

Agenda Announced for Spark Summit 2016 in San Francisco

2016-04-06 Thread Scott walent
Spark Summit 2016 (www.spark-summit.org/2016) will be held June 6-8 at the Union Square Hilton in San Francisco, and the recently released agenda features a stellar lineup of community talks led by top engineers, architects, data scientists, researchers, entrepreneurs and analysts from UC

Re: ClassCastException when extracting and collecting DF array column type

2016-04-06 Thread Nick Pentreath
Ah I got it - Seq[(Int, Float)] is actually represented as Seq[Row] (seq of struct type) internally. So a further extraction is required, e.g. row => row.getSeq[Row](1).map { r => r.getInt(0) } On Wed, 6 Apr 2016 at 13:35 Nick Pentreath wrote: > Hi there, > > In
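Putting the fix together, a sketch of the full round trip (column names and values invented; assumes sqlContext.implicits._ is in scope as in the shell):

    import org.apache.spark.sql.Row
    import sqlContext.implicits._

    val df = sc.parallelize(Seq(
      (0, Array((1, 6.0), (1, 4.0))),
      (1, Array((1, 3.0), (2, 2.0)))
    )).toDF("id", "pairs")

    // Each (Int, Double) element is stored as a struct, so collected values
    // come back as Rows and the fields must be extracted explicitly.
    val pairs = df.collect().map { row =>
      row.getSeq[Row](1).map(r => (r.getInt(0), r.getDouble(1)))
    }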

Re: RDD Partitions not distributed evenly to executors

2016-04-06 Thread Mike Hynes
Hello All (and Devs in particular), Thank you again for your further responses. Please find below a detailed email which identifies what I believe to be the cause of the partition imbalance problem, which occurs in Spark 1.5, 1.6, and a 2.0-SNAPSHOT. This is followed by follow-up questions for the dev

Big Data Interview FAQ

2016-04-06 Thread Chaturvedi Chola
Hello Team, Below is a very good book on Big Data for interview preparation: https://notionpress.com/read/big-data-interview-faqs Thanks, Chaturvedi.

ClassCastException when extracting and collecting DF array column type

2016-04-06 Thread Nick Pentreath
Hi there, In writing some tests for a PR I'm working on, with a more complex array type in a DF, I ran into this issue (running off latest master). Any thoughts? // create DF with a column of Array[(Int, Double)] val df = sc.parallelize(Seq( (0, Array((1, 6.0), (1, 4.0))), (1, Array((1, 3.0),
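For anyone reproducing this, printing the schema shows the struct encoding behind the ClassCastException (a sketch reusing the invented df from the snippet above):

    df.printSchema()
    // root
    //  |-- id: integer (nullable = false)
    //  |-- pairs: array (nullable = true)
    //  |    |-- element: struct (containsNull = true)
    //  |    |    |-- _1: integer (nullable = false)
    //  |    |    |-- _2: double (nullable = false)

    // The naive extraction compiles but throws at element access, because
    // the elements are Rows, not Scala tuples:
    val broken = df.collect().map(_.getSeq[(Int, Double)](1).map(_._1))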