Re: Too many executors are created

2015-10-11 Thread Akhil Das
For some reason the executors are getting killed,

15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
app-20150929120924-/24463 is now EXITED (Command exited with code 1)

Can you paste your spark-submit command? You can also look in the executor
logs and see what's going on.
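
For reference, a standalone-mode submission typically looks something like
this (the class name, jar, and master URL below are placeholders, not taken
from your setup):

./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://master-host:7077 \
  --executor-memory 30G \
  --total-executor-cores 12 \
  myapp.jar

If one of these options doesn't match what the cluster actually provides
(e.g. memory), executors can die right at startup with exit code 1, as in
your log.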

Thanks
Best Regards

On Wed, Sep 30, 2015 at 12:53 AM, Ulanov, Alexander <
alexander.ula...@hpe.com> wrote:

> Dear Spark developers,
>
>
>
> I have created a simple Spark application for spark-submit. It calls a
> machine learning library from Spark MLlib that is executed in a number of
> iterations corresponding to the same number of tasks in Spark. It seems
> that Spark creates an executor for each task and then removes it. The
> following messages in my log indicate this:
>
>
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24463 is now RUNNING
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24463 is now EXITED (Command exited with code 1)
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor
> app-20150929120924-/24463 removed: Command exited with code 1
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 24463
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added:
> app-20150929120924-/24464 on worker-20150929120330-16.111.35.101-46374 (
> 16.111.35.101:46374) with 12 cores
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20150929120924-/24464 on hostPort 16.111.35.101:46374 with 12
> cores, 30.0 GB RAM
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24464 is now LOADING
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24464 is now RUNNING
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24464 is now EXITED (Command exited with code 1)
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor
> app-20150929120924-/24464 removed: Command exited with code 1
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 24464
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added:
> app-20150929120924-/24465 on worker-20150929120330-16.111.35.101-46374 (
> 16.111.35.101:46374) with 12 cores
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20150929120924-/24465 on hostPort 16.111.35.101:46374 with 12
> cores, 30.0 GB RAM
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24465 is now LOADING
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24465 is now EXITED (Command exited with code 1)
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Executor
> app-20150929120924-/24465 removed: Command exited with code 1
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Asked to remove
> non-existent executor 24465
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor added:
> app-20150929120924-/24466 on worker-20150929120330-16.111.35.101-46374 (
> 16.111.35.101:46374) with 12 cores
>
> 15/09/29 12:21:02 INFO SparkDeploySchedulerBackend: Granted executor ID
> app-20150929120924-/24466 on hostPort 16.111.35.101:46374 with 12
> cores, 30.0 GB RAM
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24466 is now LOADING
>
> 15/09/29 12:21:02 INFO AppClient$ClientEndpoint: Executor updated:
> app-20150929120924-/24466 is now RUNNING
>
>
>
> It ends up creating and removing thousands of executors. Is this normal
> behavior?
>
>
>
> If I run the same code within spark-shell, this does not happen. Could you
> suggest what might be wrong in my settings?
>
>
>
> Best regards, Alexander
>


No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Disha Shrivastava
Dear Spark developers,

I am trying to study the effect of an increasing number of cores (CPUs) on
speedup and accuracy (scalability with Spark ANN) for the MNIST dataset,
using the ANN implementation provided in the latest Spark release.

I have formed a cluster of 5 machines with 88 cores in total. The thing
that is troubling me is that even if I have more than 2 workers in my
Spark cluster, the job gets divided among only the 2 workers (executors)
that Spark takes by default, and hence it takes the same time. I know we
can set the number of partitions manually, e.g. sc.parallelize(train_data, 10),
which divides the data into 10 partitions so that all the workers are
involved in the computation. I am using the code below:


import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.mllib.util.MLUtils
import org.apache.spark.sql.Row

// Load training data
val data = MLUtils.loadLibSVMFile(sc, "data/1_libsvm").toDF()
// Split the data into train and test
val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
val train = splits(0)
val test = splits(1)
//val tr = sc.parallelize(train, 10);
// specify layers for the neural network:
// input layer of size 784 (features), one intermediate layer of size 160,
// and output layer of size 10 (classes)
val layers = Array[Int](784, 160, 10)
// create the trainer and set its parameters
val trainer = new MultilayerPerceptronClassifier()
  .setLayers(layers)
  .setBlockSize(128)
  .setSeed(1234L)
  .setMaxIter(100)
// train the model
val model = trainer.fit(train)
// compute precision on the test set
val result = model.transform(test)
val predictionAndLabels = result.select("prediction", "label")
val evaluator = new MulticlassClassificationEvaluator()
  .setMetricName("precision")
println("Precision: " + evaluator.evaluate(predictionAndLabels))

Can you please suggest how I can ensure that the data/tasks are divided
equally among all the worker machines?

Thanks and Regards,
Disha Shrivastava
Masters student, IIT Delhi


Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Still confused. Why are you saying we didn't vote on an archive? refer
to the email I linked, which includes both the git tag and a link to
all generated artifacts (also in my email).

So, there are two things at play here:

First, I am not sure what you mean that a source distro can't have
binary files. It's supposed to have the source code of Spark, and
shouldn't contain binary Spark. Nothing you listed are Spark binaries.
However, a distribution might have a lot of things in it that support
the source build, like copies of tools, test files, etc.  That
explains I think the first couple lines that you identified.

Still, I am curious why you are saying that would invalidate a source
release? I have never heard anything like that.

Second, I do think there are some binaries in here that aren't
supposed to be there, like the build/ directory stuff. IIRC these were
included accidentally and won't be in the next release. At least, I
don't see why they need to be bundled. These are just local copies of
third party tools though, and don't really matter. As it happens, the
licenses that get distributed with the source distro even cover all of
this stuff. I think that's not supposed to be there, but, also don't
see it's 'invalid' as a result.


On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno  wrote:
> On 10/11/2015 05:29 PM, Sean Owen wrote:
>> Of course, but what's making you think this was a binary-only
>> distribution?
>
> I'm not saying binary-only, I am saying your source release contains
> binary programs, which would invalidate a release vote. Is there a
> release candidate package, that is voted on (saying you have a git tag
> does not satisfy this criteria, you need to vote on an actual archive of
> files, otherwise there is no cogent proof of the release being from that
> specific git tag).
>
> Here's what I found in your source release:
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/sql/hive/src/test/resources/TestUDTF.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-reflect.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/sbt-interface.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/compiler-interface-sources.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/incremental-compiler.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-compiler.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/zinc.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-library.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/misc/scala-devel/plugins/continuations.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scala-reflect.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/akka-actors.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/typesafe-config.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scala-actors-migration.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scala-actors.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scalap.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scala-swing.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scala-compiler.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/lib/scala-library.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/src/scala-reflect-src.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/src/scala-swing-src.jar
>
> Binary application (application/jar; charset=binary) found in
> spark-1.5.1/build/scala-2.10.4/src/scalap-src.jar
>
> Binary application (application/jar; charset=binary) found in
> 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
Out of curiosity: How can you vote on a release that contains 34 binary files? 
Surely a source code release should only contain source code and not binaries, 
as you cannot verify the content of these.

Looking forward to a response.

With regards,
Daniel.

On 10/2/2015, 4:42:31 AM, Reynold Xin  wrote: 
> Hi All,
> 
> Spark 1.5.1 is a maintenance release containing stability fixes. This
> release is based on the branch-1.5 maintenance branch of Spark. We
> *strongly recommend* all 1.5.0 users to upgrade to this release.
> 
> The full list of bug fixes is here: http://s.apache.org/spark-1.5.1
> 
> http://spark.apache.org/releases/spark-release-1-5-1.html
> 
> 
> (note: it can take a few hours for everything to be propagated, so you
> might get 404 on some download links, but everything should be in maven
> central already)
> 

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
The Spark releases include a source distribution and several binary
distributions. This is pretty normal for Apache projects. What are you
referring to here?

On Sun, Oct 11, 2015 at 3:26 PM, Daniel Gruno  wrote:
> Out of curiosity: How can you vote on a release that contains 34 binary 
> files? Surely a source code release should only contain source code and not 
> binaries, as you cannot verify the content of these.
>
> Looking forward to a response.
>
> With regards,
> Daniel.
>
> On 10/2/2015, 4:42:31 AM, Reynold Xin  wrote:
>> Hi All,
>>
>> Spark 1.5.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-1.5 maintenance branch of Spark. We
>> *strongly recommend* all 1.5.0 users to upgrade to this release.
>>
>> The full list of bug fixes is here: http://s.apache.org/spark-1.5.1
>>
>> http://spark.apache.org/releases/spark-release-1-5-1.html
>>
>>
>> (note: it can take a few hours for everything to be propagated, so you
>> might get 404 on some download links, but everything should be in maven
>> central already)
>>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
On 10/11/2015 05:12 PM, Sean Owen wrote:
> The Spark releases include a source distribution and several binary
> distributions. This is pretty normal for Apache projects. What are you
> referring to here?

Surely the _source_ distribution does not contain binaries? How else can
you vote on a release if you don't know what it contains?

You can produce convenience downloads that contain binary files, yes,
but surely you need a source-only package which is the one you vote on,
that does not contain any binaries. Do you have such a thing? And where
may I find it?

With regards,
Daniel.

> 
> On Sun, Oct 11, 2015 at 3:26 PM, Daniel Gruno  wrote:
>> Out of curiosity: How can you vote on a release that contains 34 binary 
>> files? Surely a source code release should only contain source code and not 
>> binaries, as you cannot verify the content of these.
>>
>> Looking forward to a response.
>>
>> With regards,
>> Daniel.
>>
>> On 10/2/2015, 4:42:31 AM, Reynold Xin  wrote:
>>> Hi All,
>>>
>>> Spark 1.5.1 is a maintenance release containing stability fixes. This
>>> release is based on the branch-1.5 maintenance branch of Spark. We
>>> *strongly recommend* all 1.5.0 users to upgrade to this release.
>>>
>>> The full list of bug fixes is here: http://s.apache.org/spark-1.5.1
>>>
>>> http://spark.apache.org/releases/spark-release-1-5-1.html
>>>
>>>
>>> (note: it can take a few hours for everything to be propagated, so you
>>> might get 404 on some download links, but everything should be in maven
>>> central already)
>>>
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Operations with cached RDD

2015-10-11 Thread Nitin Goyal
The problem is not that zipWithIndex is executed again. "groupBy" triggered
hash partitioning on your keys, and a shuffle happened because of that;
that's why you are seeing 2 stages. You can confirm this by clicking on the
latter "zipWithIndex" stage: its input has "(memory)" written next to it,
which means the input data was fetched from memory (your cached RDD).

As far as the lineage/call site is concerned, I think there was a change in
Spark 1.3 that excluded some classes from appearing in the call site (I know
some Spark SQL-related ones were removed for sure).
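
If you want to avoid that extra stage altogether, one option (a sketch, not
taken from the original code) is to hash-partition the pair RDD yourself
before caching, so the later groupByKey finds the data already partitioned
and has nothing to shuffle:

import org.apache.spark.HashPartitioner

// pre-partition by key, then cache the partitioned RDD
val rdd = sc.parallelize(1 to 5, 5).zipWithIndex
  .partitionBy(new HashPartitioner(5))
  .cache
rdd.count
// groupByKey reuses the existing partitioner, so no shuffle stage appears
rdd.groupByKey().count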

Thanks
-Nitin


On Sat, Oct 10, 2015 at 5:05 AM, Ulanov, Alexander  wrote:

> Dear Spark developers,
>
>
>
> I am trying to understand how the Spark UI displays operations on a cached
> RDD.
>
>
>
> For example, the following code caches an rdd:
>
> >> val rdd = sc.parallelize(1 to 5, 5).zipWithIndex.cache
>
> >> rdd.count
>
> The Jobs tab shows me that the RDD is evaluated:
>
> : 1 count at <console>:24  2015/10/09 16:15:43  0.4 s  1/1
>
> : 0 zipWithIndex at <console>:21  2015/10/09 16:15:38  0.6 s  1/1
>
> And I can observe this rdd in the Storage tab of the Spark UI:
>
> : ZippedWithIndexRDD  Memory Deserialized 1x Replicated
>
>
>
> Then I want to perform an operation on the cached RDD. I run the following
> code:
>
> >> val g = rdd.groupByKey()
>
> >> g.count
>
> The Jobs tab shows me a new Job:
>
> : 2 count at <console>:26
>
> Inside this Job there are two stages:
>
> : 3 count at <console>:26 +details  2015/10/09 16:16:18  0.2 s  5/5
>
> : 2 zipWithIndex at <console>:21
>
> It shows that zipWithIndex is executed again. That does not seem
> reasonable, because the rdd is cached and zipWithIndex was already
> executed previously.
>
>
>
> Could you explain why, when I perform an operation followed by an action
> on a cached RDD, the last operation in the lineage of the cached RDD is
> shown as executed in the Spark UI?
>
>
>
>
>
> Best regards, Alexander
>



-- 
Regards
Nitin Goyal


Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Daniel: we did not vote on a tag. Please again read the VOTE email I
linked to you:

http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none

among other things, it contains a link to the concrete source (and
binary) distribution under vote:

http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/

You can still examine it, sure.
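
For instance (file and tag names assumed here), you can check the signature
on the archive and compare its contents against the tag yourself:

gpg --verify spark-1.5.1.tgz.asc spark-1.5.1.tgz
tar xzf spark-1.5.1.tgz
git clone -b v1.5.1-rc1 https://github.com/apache/spark.git spark-tag
diff -r -x .git spark-1.5.1 spark-tag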

Dependencies are *not* bundled in the source release. You're again
misunderstanding what you are seeing. Read my email again.

I am still pretty confused about what the problem is. This is entirely
business as usual for ASF projects. I'll follow up with you offline if
you have any more doubts.

On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno  wrote:
> Here's my issue:
>
> How am I to audit that the dependencies you bundle are in fact what you
> claim they are?  How do I know they don't contain malware or - in light
> of recent events - emissions test rigging? ;)
>
> I am not interested in a git tag - that means nothing in the ASF voting
> process, you cannot vote on a tag, only on a release candidate. The VCS
> in use is irrelevant in this issue. If you can point me to a release
> candidate archive that was voted upon and does not contain binary
> applications, all is well.
>
> If there is no such thing, and we cannot come to an understanding, I
> will exercise my ASF Members' rights and bring this to the attention of
> the board of directors and ask for a clarification of the legality of this.
>
> I find it highly irregular. Perhaps it is something some projects do in
> the Java community, but that doesn't make it permissible in my view.
>
> With regards,
> Daniel.
>
>
> On 10/11/2015 05:42 PM, Sean Owen wrote:
>> Still confused. Why are you saying we didn't vote on an archive? refer
>> to the email I linked, which includes both the git tag and a link to
>> all generated artifacts (also in my email).
>>
>> So, there are two things at play here:
>>
>> First, I am not sure what you mean that a source distro can't have
>> binary files. It's supposed to have the source code of Spark, and
>> shouldn't contain binary Spark. Nothing you listed are Spark binaries.
>> However, a distribution might have a lot of things in it that support
>> the source build, like copies of tools, test files, etc.  That
>> explains I think the first couple lines that you identified.
>>
>> Still, I am curious why you are saying that would invalidate a source
>> release? I have never heard anything like that.
>>
>> Second, I do think there are some binaries in here that aren't
>> supposed to be there, like the build/ directory stuff. IIRC these were
>> included accidentally and won't be in the next release. At least, I
>> don't see why they need to be bundled. These are just local copies of
>> third party tools though, and don't really matter. As it happens, the
>> licenses that get distributed with the source distro even cover all of
>> this stuff. I think that's not supposed to be there, but, also don't
>> see it's 'invalid' as a result.
>>
>>
>> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno  wrote:
>>> On 10/11/2015 05:29 PM, Sean Owen wrote:
 Of course, but what's making you think this was a binary-only
 distribution?
>>>
>>> I'm not saying binary-only, I am saying your source release contains
>>> binary programs, which would invalidate a release vote. Is there a
>>> release candidate package, that is voted on (saying you have a git tag
>>> does not satisfy this criteria, you need to vote on an actual archive of
>>> files, otherwise there is no cogent proof of the release being from that
>>> specific git tag).
>>>
>>> Here's what I found in your source release:
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/sql/hive/src/test/resources/TestUDTF.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-reflect.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/build/zinc-0.3.5.3/lib/sbt-interface.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/build/zinc-0.3.5.3/lib/compiler-interface-sources.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/build/zinc-0.3.5.3/lib/incremental-compiler.jar
>>>
>>> Binary application (application/jar; charset=binary) found in
>>> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-compiler.jar
>>>
>>> Binary application 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Nicholas Chammas
You can find the source tagged for release on GitHub, as was clearly
linked to in the thread to vote on the release (titled "[VOTE] Release
Apache Spark 1.5.1 (RC1)").

Is there something about that thread that was unclear?

Nick


On Sun, Oct 11, 2015 at 11:23 AM Daniel Gruno  wrote:

> On 10/11/2015 05:12 PM, Sean Owen wrote:
> > The Spark releases include a source distribution and several binary
> > distributions. This is pretty normal for Apache projects. What are you
> > referring to here?
>
> Surely the _source_ distribution does not contain binaries? How else can
> you vote on a release if you don't know what it contains?
>
> You can produce convenience downloads that contain binary files, yes,
> but surely you need a source-only package which is the one you vote on,
> that does not contain any binaries. Do you have such a thing? And where
> may I find it?
>
> With regards,
> Daniel.
>
> >
> > On Sun, Oct 11, 2015 at 3:26 PM, Daniel Gruno 
> wrote:
> >> Out of curiosity: How can you vote on a release that contains 34 binary
> files? Surely a source code release should only contain source code and not
> binaries, as you cannot verify the content of these.
> >>
> >> Looking forward to a response.
> >>
> >> With regards,
> >> Daniel.
> >>
> >> On 10/2/2015, 4:42:31 AM, Reynold Xin  wrote:
> >>> Hi All,
> >>>
> >>> Spark 1.5.1 is a maintenance release containing stability fixes. This
> >>> release is based on the branch-1.5 maintenance branch of Spark. We
> >>> *strongly recommend* all 1.5.0 users to upgrade to this release.
> >>>
> >>> The full list of bug fixes is here: http://s.apache.org/spark-1.5.1
> >>>
> >>> http://spark.apache.org/releases/spark-release-1-5-1.html
> >>>
> >>>
> >>> (note: it can take a few hours for everything to be propagated, so you
> >>> might get 404 on some download links, but everything should be in maven
> >>> central already)
> >>>
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> >> For additional commands, e-mail: dev-h...@spark.apache.org
> >>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Of course, but what's making you think this was a binary-only
distribution? The downloads page points you directly to the source
distro: http://spark.apache.org/downloads.html

Look for the last vote, and you'll find it was of course a vote on
source (and binary) artifacts:
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/

On Sun, Oct 11, 2015 at 4:23 PM, Daniel Gruno  wrote:
> On 10/11/2015 05:12 PM, Sean Owen wrote:
>> The Spark releases include a source distribution and several binary
>> distributions. This is pretty normal for Apache projects. What are you
>> referring to here?
>
> Surely the _source_ distribution does not contain binaries? How else can
> you vote on a release if you don't know what it contains?
>
> You can produce convenience downloads that contain binary files, yes,
> but surely you need a source-only package which is the one you vote on,
> that does not contain any binaries. Do you have such a thing? And where
> may I find it?
>
> With regards,
> Daniel.
>
>>
>> On Sun, Oct 11, 2015 at 3:26 PM, Daniel Gruno  wrote:
>>> Out of curiosity: How can you vote on a release that contains 34 binary 
>>> files? Surely a source code release should only contain source code and not 
>>> binaries, as you cannot verify the content of these.
>>>
>>> Looking forward to a response.
>>>
>>> With regards,
>>> Daniel.
>>>
>>> On 10/2/2015, 4:42:31 AM, Reynold Xin  wrote:
 Hi All,

 Spark 1.5.1 is a maintenance release containing stability fixes. This
 release is based on the branch-1.5 maintenance branch of Spark. We
 *strongly recommend* all 1.5.0 users to upgrade to this release.

 The full list of bug fixes is here: http://s.apache.org/spark-1.5.1

 http://spark.apache.org/releases/spark-release-1-5-1.html


 (note: it can take a few hours for everything to be propagated, so you
 might get 404 on some download links, but everything should be in maven
 central already)

>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
On 10/11/2015 05:29 PM, Sean Owen wrote:
> Of course, but what's making you think this was a binary-only
> distribution? 

I'm not saying binary-only, I am saying your source release contains
binary programs, which would invalidate a release vote. Is there a
release candidate package, that is voted on (saying you have a git tag
does not satisfy this criteria, you need to vote on an actual archive of
files, otherwise there is no cogent proof of the release being from that
specific git tag).

Here's what I found in your source release:

Binary application (application/jar; charset=binary) found in
spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/sql/hive/src/test/resources/TestUDTF.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/scala-reflect.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/sbt-interface.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/compiler-interface-sources.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/incremental-compiler.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/scala-compiler.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/zinc.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/zinc-0.3.5.3/lib/scala-library.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/misc/scala-devel/plugins/continuations.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scala-reflect.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/akka-actors.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/typesafe-config.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scala-actors-migration.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scala-actors.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scalap.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scala-swing.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scala-compiler.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/lib/scala-library.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scala-reflect-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scala-swing-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scalap-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scala-actors-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scala-partest-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scala-library-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/fjbg-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/scala-compiler-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/scala-2.10.4/src/msil-src.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/apache-maven-3.3.3/boot/plexus-classworlds-2.5.2.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/apache-maven-3.3.3/lib/guava-18.0.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/apache-maven-3.3.3/lib/wagon-http-2.9-shaded.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/apache-maven-3.3.3/lib/jsr250-api-1.0.jar

Binary application (application/jar; charset=binary) found in
spark-1.5.1/build/apache-maven-3.3.3/lib/javax.inject-1.jar



> The downloads page points you directly to the source
> distro: http://spark.apache.org/downloads.html
> 
> Look for the last vote, and you'll find it was of course a vote on
> source (and binary) artifacts:
> 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Daniel Gruno
Here's my issue:

How am I to audit that the dependencies you bundle are in fact what you
claim they are?  How do I know they don't contain malware or - in light
of recent events - emissions test rigging? ;)

I am not interested in a git tag - that means nothing in the ASF voting
process, you cannot vote on a tag, only on a release candidate. The VCS
in use is irrelevant in this issue. If you can point me to a release
candidate archive that was voted upon and does not contain binary
applications, all is well.

If there is no such thing, and we cannot come to an understanding, I
will exercise my ASF Members' rights and bring this to the attention of
the board of directors and ask for a clarification of the legality of this.

I find it highly irregular. Perhaps it is something some projects do in
the Java community, but that doesn't make it permissible in my view.

With regards,
Daniel.


On 10/11/2015 05:42 PM, Sean Owen wrote:
> Still confused. Why are you saying we didn't vote on an archive? refer
> to the email I linked, which includes both the git tag and a link to
> all generated artifacts (also in my email).
> 
> So, there are two things at play here:
> 
> First, I am not sure what you mean that a source distro can't have
> binary files. It's supposed to have the source code of Spark, and
> shouldn't contain binary Spark. Nothing you listed are Spark binaries.
> However, a distribution might have a lot of things in it that support
> the source build, like copies of tools, test files, etc.  That
> explains I think the first couple lines that you identified.
> 
> Still, I am curious why you are saying that would invalidate a source
> release? I have never heard anything like that.
> 
> Second, I do think there are some binaries in here that aren't
> supposed to be there, like the build/ directory stuff. IIRC these were
> included accidentally and won't be in the next release. At least, I
> don't see why they need to be bundled. These are just local copies of
> third party tools though, and don't really matter. As it happens, the
> licenses that get distributed with the source distro even cover all of
> this stuff. I think that's not supposed to be there, but, also don't
> see it's 'invalid' as a result.
> 
> 
> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno  wrote:
>> On 10/11/2015 05:29 PM, Sean Owen wrote:
>>> Of course, but what's making you think this was a binary-only
>>> distribution?
>>
>> I'm not saying binary-only, I am saying your source release contains
>> binary programs, which would invalidate a release vote. Is there a
>> release candidate package, that is voted on (saying you have a git tag
>> does not satisfy this criteria, you need to vote on an actual archive of
>> files, otherwise there is no cogent proof of the release being from that
>> specific git tag).
>>
>> Here's what I found in your source release:
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/sql/hive/src/test/resources/TestUDTF.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/R/pkg/inst/test_support/sparktestjar_2.10-1.0.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-reflect.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/sbt-interface.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/compiler-interface-sources.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/incremental-compiler.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-compiler.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/zinc.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/zinc-0.3.5.3/lib/scala-library.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/scala-2.10.4/misc/scala-devel/plugins/continuations.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/scala-2.10.4/lib/scala-reflect.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/scala-2.10.4/lib/akka-actors.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/scala-2.10.4/lib/typesafe-config.jar
>>
>> Binary application (application/jar; charset=binary) found in
>> spark-1.5.1/build/scala-2.10.4/lib/scala-actors-migration.jar
>>
>> Binary application 

Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores

2015-10-11 Thread Mike Hynes
Having only 2 workers for 5 machines would be your problem: you
probably want 1 worker per physical machine, which entails running the
spark-daemon.sh script to start a worker on those machines.
The partitioning is agnostic to how many executors are available for
running the tasks, so you can't do scalability tests in the manner
you're thinking of by changing the partitioning.
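
A sketch, with the master URL assumed: on each machine you could start a
standalone worker with something like

./sbin/spark-daemon.sh start org.apache.spark.deploy.worker.Worker 1 \
  spark://master-host:7077

or, equivalently on recent releases, ./sbin/start-slave.sh
spark://master-host:7077.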

On 10/11/15, Disha Shrivastava  wrote:
> Dear Spark developers,
>
> I am trying to study the effect of increasing number of cores ( CPU's) on
> speedup and accuracy ( scalability with spark ANN ) performance for the
> MNIST dataset using ANN implementation provided in the latest spark
> release.
>
> I have formed a cluster of 5 machines with 88 cores in total.The thing
> which is troubling me is that even if I have more than 2 workers in my
> spark cluster the job gets divided only to 2 workers.( executors) which
> Spark takes by default and hence it takes the same time . I know we can set
> the number of partitions manually using sc.parallelize(train_data,10)
> suppose which then divides the data in 10 partitions and all the workers
> are involved in the computation.I am using the below code:
>
>
> import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
> import org.apache.spark.mllib.util.MLUtils
> import org.apache.spark.sql.Row
>
> // Load training data
> val data = MLUtils.loadLibSVMFile(sc, "data/1_libsvm").toDF()
> // Split the data into train and test
> val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
> val train = splits(0)
> val test = splits(1)
> //val tr=sc.parallelize(train,10);
> // specify layers for the neural network:
> // input layer of size 4 (features), two intermediate of size 5 and 4 and
> output of size 3 (classes)
> val layers = Array[Int](784,160,10)
> // create the trainer and set its parameters
> val trainer = new
> MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100)
> // train the model
> val model = trainer.fit(train)
> // compute precision on the test set
> val result = model.transform(test)
> val predictionAndLabels = result.select("prediction", "label")
> val evaluator = new
> MulticlassClassificationEvaluator().setMetricName("precision")
> println("Precision:" + evaluator.evaluate(predictionAndLabels))
>
> Can you please suggest me how can I ensure that the data/task is divided
> equally to all the worker machines?
>
> Thanks and Regards,
> Disha Shrivastava
> Masters student, IIT Delhi
>


-- 
Thanks,
Mike

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
Agree, but we are talking about the build/ bit right?

I don't agree that it invalidates the release, which is probably the more
important idea. As a point of process, you would not want to modify and
republish the artifact that was already released after being voted on -
unless it was invalid in which case we spin up 1.5.1.1 or something.

But that build/ directory should go in future releases.

I think he is talking about more than this though and the other jars look
like they are part of tests, and still nothing to do with Spark binaries.
Those can and should stay.

On Mon, Oct 12, 2015, 5:35 AM Patrick Wendell  wrote:

> I think Daniel is correct here. The source artifact incorrectly includes
> jars. It is inadvertent and not part of our intended release process. This
> was something I noticed in Spark 1.5.0 and filed a JIRA and was fixed by
> updating our build scripts to fix it. However, our build environment was
> not using the most current version of the build scripts. See related links:
>
> https://issues.apache.org/jira/browse/SPARK-10511
> https://github.com/apache/spark/pull/8774/files
>
> I can update our build environment and we can repackage the Spark 1.5.1
> source tarball. To not include sources.
>
>
> - Patrick
>
> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:
>
>> Daniel: we did not vote on a tag. Please again read the VOTE email I
>> linked to you:
>>
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>>
>> among other things, it contains a link to the concrete source (and
>> binary) distribution under vote:
>>
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>
>> You can still examine it, sure.
>>
>> Dependencies are *not* bundled in the source release. You're again
>> misunderstanding what you are seeing. Read my email again.
>>
>> I am still pretty confused about what the problem is. This is entirely
>> business as usual for ASF projects. I'll follow up with you offline if
>> you have any more doubts.
>>
>> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
>> wrote:
>> > Here's my issue:
>> >
>> > How am I to audit that the dependencies you bundle are in fact what you
>> > claim they are?  How do I know they don't contain malware or - in light
>> > of recent events - emissions test rigging? ;)
>> >
>> > I am not interested in a git tag - that means nothing in the ASF voting
>> > process, you cannot vote on a tag, only on a release candidate. The VCS
>> > in use is irrelevant in this issue. If you can point me to a release
>> > candidate archive that was voted upon and does not contain binary
>> > applications, all is well.
>> >
>> > If there is no such thing, and we cannot come to an understanding, I
>> > will exercise my ASF Members' rights and bring this to the attention of
>> > the board of directors and ask for a clarification of the legality of
>> this.
>> >
>> > I find it highly irregular. Perhaps it is something some projects do in
>> > the Java community, but that doesn't make it permissible in my view.
>> >
>> > With regards,
>> > Daniel.
>> >
>> >
>> > On 10/11/2015 05:42 PM, Sean Owen wrote:
>> >> Still confused. Why are you saying we didn't vote on an archive? refer
>> >> to the email I linked, which includes both the git tag and a link to
>> >> all generated artifacts (also in my email).
>> >>
>> >> So, there are two things at play here:
>> >>
>> >> First, I am not sure what you mean that a source distro can't have
>> >> binary files. It's supposed to have the source code of Spark, and
>> >> shouldn't contain binary Spark. Nothing you listed are Spark binaries.
>> >> However, a distribution might have a lot of things in it that support
>> >> the source build, like copies of tools, test files, etc.  That
>> >> explains I think the first couple lines that you identified.
>> >>
>> >> Still, I am curious why you are saying that would invalidate a source
>> >> release? I have never heard anything like that.
>> >>
>> >> Second, I do think there are some binaries in here that aren't
>> >> supposed to be there, like the build/ directory stuff. IIRC these were
>> >> included accidentally and won't be in the next release. At least, I
>> >> don't see why they need to be bundled. These are just local copies of
>> >> third party tools though, and don't really matter. As it happens, the
>> >> licenses that get distributed with the source distro even cover all of
>> >> this stuff. I think that's not supposed to be there, but, also don't
>> >> see it's 'invalid' as a result.
>> >>
>> >>
>> >> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno 
>> wrote:
>> >>> On 10/11/2015 05:29 PM, Sean Owen wrote:
>>  Of course, but what's making you think this was a binary-only
>>  distribution?
>> >>>
>> >>> I'm not saying binary-only, I am saying your source release contains
>> >>> binary programs, which would 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
I think Daniel is correct here. The source artifact incorrectly includes
jars. It is inadvertent and not part of our intended release process. This
was something I noticed in Spark 1.5.0 and filed a JIRA and was fixed by
updating our build scripts to fix it. However, our build environment was
not using the most current version of the build scripts. See related links:

https://issues.apache.org/jira/browse/SPARK-10511
https://github.com/apache/spark/pull/8774/files

I can update our build environment and we can repackage the Spark 1.5.1
source tarball. To not include sources.

- Patrick

On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:

> Daniel: we did not vote on a tag. Please again read the VOTE email I
> linked to you:
>
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>
> among other things, it contains a link to the concrete source (and
> binary) distribution under vote:
>
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> You can still examine it, sure.
>
> Dependencies are *not* bundled in the source release. You're again
> misunderstanding what you are seeing. Read my email again.
>
> I am still pretty confused about what the problem is. This is entirely
> business as usual for ASF projects. I'll follow up with you offline if
> you have any more doubts.
>
> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
> wrote:
> > Here's my issue:
> >
> > How am I to audit that the dependencies you bundle are in fact what you
> > claim they are?  How do I know they don't contain malware or - in light
> > of recent events - emissions test rigging? ;)
> >
> > I am not interested in a git tag - that means nothing in the ASF voting
> > process, you cannot vote on a tag, only on a release candidate. The VCS
> > in use is irrelevant in this issue. If you can point me to a release
> > candidate archive that was voted upon and does not contain binary
> > applications, all is well.
> >
> > If there is no such thing, and we cannot come to an understanding, I
> > will exercise my ASF Members' rights and bring this to the attention of
> > the board of directors and ask for a clarification of the legality of
> this.
> >
> > I find it highly irregular. Perhaps it is something some projects do in
> > the Java community, but that doesn't make it permissible in my view.
> >
> > With regards,
> > Daniel.
> >
> >
> > On 10/11/2015 05:42 PM, Sean Owen wrote:
> >> Still confused. Why are you saying we didn't vote on an archive? refer
> >> to the email I linked, which includes both the git tag and a link to
> >> all generated artifacts (also in my email).
> >>
> >> So, there are two things at play here:
> >>
> >> First, I am not sure what you mean that a source distro can't have
> >> binary files. It's supposed to have the source code of Spark, and
> >> shouldn't contain binary Spark. Nothing you listed are Spark binaries.
> >> However, a distribution might have a lot of things in it that support
> >> the source build, like copies of tools, test files, etc.  That
> >> explains I think the first couple lines that you identified.
> >>
> >> Still, I am curious why you are saying that would invalidate a source
> >> release? I have never heard anything like that.
> >>
> >> Second, I do think there are some binaries in here that aren't
> >> supposed to be there, like the build/ directory stuff. IIRC these were
> >> included accidentally and won't be in the next release. At least, I
> >> don't see why they need to be bundled. These are just local copies of
> >> third party tools though, and don't really matter. As it happens, the
> >> licenses that get distributed with the source distro even cover all of
> >> this stuff. I think that's not supposed to be there, but, also don't
> >> see it's 'invalid' as a result.
> >>
> >>
> >> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno 
> wrote:
> >>> On 10/11/2015 05:29 PM, Sean Owen wrote:
>  Of course, but what's making you think this was a binary-only
>  distribution?
> >>>
> >>> I'm not saying binary-only, I am saying your source release contains
> >>> binary programs, which would invalidate a release vote. Is there a
> >>> release candidate package, that is voted on (saying you have a git tag
> >>> does not satisfy this criteria, you need to vote on an actual archive
> of
> >>> files, otherwise there is no cogent proof of the release being from
> that
> >>> specific git tag).
> >>>
> >>> Here's what I found in your source release:
> >>>
> >>> Binary application (application/jar; charset=binary) found in
> >>> spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
> >>>
> >>> Binary application (application/jar; charset=binary) found in
> >>>
> spark-1.5.1/sql/hive/src/test/resources/regression-test-SPARK-8489/test.jar
> >>>
> >>> Binary application (application/jar; charset=binary) found in
> >>> 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
*to not include binaries.

On Sun, Oct 11, 2015 at 9:35 PM, Patrick Wendell  wrote:

> I think Daniel is correct here. The source artifact incorrectly includes
> jars. It is inadvertent and not part of our intended release process. This
> was something I noticed in Spark 1.5.0 and filed a JIRA and was fixed by
> updating our build scripts to fix it. However, our build environment was
> not using the most current version of the build scripts. See related links:
>
> https://issues.apache.org/jira/browse/SPARK-10511
> https://github.com/apache/spark/pull/8774/files
>
> I can update our build environment and we can repackage the Spark 1.5.1
> source tarball. To not include sources.
>
> - Patrick
>
> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:
>
>> Daniel: we did not vote on a tag. Please again read the VOTE email I
>> linked to you:
>>
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>>
>> among other things, it contains a link to the concrete source (and
>> binary) distribution under vote:
>>
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>
>> You can still examine it, sure.
>>
>> Dependencies are *not* bundled in the source release. You're again
>> misunderstanding what you are seeing. Read my email again.
>>
>> I am still pretty confused about what the problem is. This is entirely
>> business as usual for ASF projects. I'll follow up with you offline if
>> you have any more doubts.
>>
>> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
>> wrote:
>> > Here's my issue:
>> >
>> > How am I to audit that the dependencies you bundle are in fact what you
>> > claim they are?  How do I know they don't contain malware or - in light
>> > of recent events - emissions test rigging? ;)
>> >
>> > I am not interested in a git tag - that means nothing in the ASF voting
>> > process, you cannot vote on a tag, only on a release candidate. The VCS
>> > in use is irrelevant in this issue. If you can point me to a release
>> > candidate archive that was voted upon and does not contain binary
>> > applications, all is well.
>> >
>> > If there is no such thing, and we cannot come to an understanding, I
>> > will exercise my ASF Members' rights and bring this to the attention of
>> > the board of directors and ask for a clarification of the legality of
>> this.
>> >
>> > I find it highly irregular. Perhaps it is something some projects do in
>> > the Java community, but that doesn't make it permissible in my view.
>> >
>> > With regards,
>> > Daniel.
>> >
>> >
>> > On 10/11/2015 05:42 PM, Sean Owen wrote:
>> >> Still confused. Why are you saying we didn't vote on an archive? refer
>> >> to the email I linked, which includes both the git tag and a link to
>> >> all generated artifacts (also in my email).
>> >>
>> >> So, there are two things at play here:
>> >>
>> >> First, I am not sure what you mean that a source distro can't have
>> >> binary files. It's supposed to have the source code of Spark, and
>> >> shouldn't contain binary Spark. Nothing you listed are Spark binaries.
>> >> However, a distribution might have a lot of things in it that support
>> >> the source build, like copies of tools, test files, etc.  That
>> >> explains I think the first couple lines that you identified.
>> >>
>> >> Still, I am curious why you are saying that would invalidate a source
>> >> release? I have never heard anything like that.
>> >>
>> >> Second, I do think there are some binaries in here that aren't
>> >> supposed to be there, like the build/ directory stuff. IIRC these were
>> >> included accidentally and won't be in the next release. At least, I
>> >> don't see why they need to be bundled. These are just local copies of
>> >> third party tools though, and don't really matter. As it happens, the
>> >> licenses that get distributed with the source distro even cover all of
>> >> this stuff. I think that's not supposed to be there, but, also don't
>> >> see it's 'invalid' as a result.
>> >>
>> >>
>> >> On Sun, Oct 11, 2015 at 4:33 PM, Daniel Gruno 
>> wrote:
>> >>> On 10/11/2015 05:29 PM, Sean Owen wrote:
>>  Of course, but what's making you think this was a binary-only
>>  distribution?
>> >>>
>> >>> I'm not saying binary-only, I am saying your source release contains
>> >>> binary programs, which would invalidate a release vote. Is there a
>> >>> release candidate package, that is voted on (saying you have a git tag
>> >>> does not satisfy this criteria, you need to vote on an actual archive
>> of
>> >>> files, otherwise there is no cogent proof of the release being from
>> that
>> >>> specific git tag).
>> >>>
>> >>> Here's what I found in your source release:
>> >>>
>> >>> Binary application (application/jar; charset=binary) found in
>> >>> spark-1.5.1/sql/hive/src/test/resources/data/files/TestSerDe.jar
>> >>>
>> >>> Binary application 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
Oh I see - yes it's the build/. I always thought release votes related to a
source tag rather than specific binaries. But maybe we can just fix it in
1.5.2 if there is concern about mutating binaries. It seems reasonable to
me.

For tests... in the past we've tried to avoid having jars inside of the
source tree, including some effort to generate jars on the fly which a lot
of our tests use. I am not sure whether it's a firm policy that you can't
have jars in test folders, though. If it is, we could probably do some
magic to get rid of these few ones that have crept in.
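
(For the curious, a minimal sketch of the "generate jars on the fly" idea,
using only the JDK's jar API rather than Spark's actual test helpers:

import java.io.{File, FileOutputStream}
import java.util.jar.{JarEntry, JarOutputStream}

// write the given entries (path -> bytes) into a fresh jar file
def createTestJar(dest: File, entries: Map[String, Array[Byte]]): File = {
  val jar = new JarOutputStream(new FileOutputStream(dest))
  for ((name, bytes) <- entries) {
    jar.putNextEntry(new JarEntry(name))
    jar.write(bytes)
    jar.closeEntry()
  }
  jar.close()
  dest
}

A test can then build whatever jar it needs in a temp directory instead of
checking a binary into the source tree.)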

- Patrick

On Sun, Oct 11, 2015 at 9:57 PM, Sean Owen  wrote:

> Agree, but we are talking about the build/ bit right?
>
> I don't agree that it invalidates the release, which is probably the more
> important idea. As a point of process, you would not want to modify and
> republish the artifact that was already released after being voted on -
> unless it was invalid in which case we spin up 1.5.1.1 or something.
>
> But that build/ directory should go in future releases.
>
> I think he is talking about more than this though and the other jars look
> like they are part of tests, and still nothing to do with Spark binaries.
> Those can and should stay.
>
> On Mon, Oct 12, 2015, 5:35 AM Patrick Wendell  wrote:
>
>> I think Daniel is correct here. The source artifact incorrectly includes
>> jars. It is inadvertent and not part of our intended release process. This
>> was something I noticed in Spark 1.5.0 and filed a JIRA and was fixed by
>> updating our build scripts to fix it. However, our build environment was
>> not using the most current version of the build scripts. See related links:
>>
>> https://issues.apache.org/jira/browse/SPARK-10511
>> https://github.com/apache/spark/pull/8774/files
>>
>> I can update our build environment and we can repackage the Spark 1.5.1
>> source tarball. To not include sources.
>>
>>
>> - Patrick
>>
>> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:
>>
>>> Daniel: we did not vote on a tag. Please again read the VOTE email I
>>> linked to you:
>>>
>>>
>>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>>>
>>> among other things, it contains a link to the concrete source (and
>>> binary) distribution under vote:
>>>
>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>>>
>>> You can still examine it, sure.
>>>
>>> Dependencies are *not* bundled in the source release. You're again
>>> misunderstanding what you are seeing. Read my email again.
>>>
>>> I am still pretty confused about what the problem is. This is entirely
>>> business as usual for ASF projects. I'll follow up with you offline if
>>> you have any more doubts.
>>>
>>> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
>>> wrote:
>>> > Here's my issue:
>>> >
>>> > How am I to audit that the dependencies you bundle are in fact what you
>>> > claim they are?  How do I know they don't contain malware or - in light
>>> > of recent events - emissions test rigging? ;)
>>> >
>>> > I am not interested in a git tag - that means nothing in the ASF voting
>>> > process, you cannot vote on a tag, only on a release candidate. The VCS
>>> > in use is irrelevant in this issue. If you can point me to a release
>>> > candidate archive that was voted upon and does not contain binary
>>> > applications, all is well.
>>> >
>>> > If there is no such thing, and we cannot come to an understanding, I
>>> > will exercise my ASF Members' rights and bring this to the attention of
>>> > the board of directors and ask for a clarification of the legality of
>>> this.
>>> >
>>> > I find it highly irregular. Perhaps it is something some projects do in
>>> > the Java community, but that doesn't make it permissible in my view.
>>> >
>>> > With regards,
>>> > Daniel.
>>> >
>>> >
>>> > On 10/11/2015 05:42 PM, Sean Owen wrote:
>>> >> Still confused. Why are you saying we didn't vote on an archive? Refer
>>> >> to the email I linked, which includes both the git tag and a link to
>>> >> all generated artifacts (also in my email).
>>> >>
>>> >> So, there are two things at play here:
>>> >>
>>> >> First, I am not sure what you mean that a source distro can't have
>>> >> binary files. It's supposed to have the source code of Spark, and
>>> >> shouldn't contain binary Spark. Nothing you listed is a Spark binary.
>>> >> However, a distribution might have a lot of things in it that support
>>> >> the source build, like copies of tools, test files, etc. That, I
>>> >> think, explains the first couple of lines that you identified.
>>> >>
>>> >> Still, I am curious why you say that would invalidate a source
>>> >> release; I have never heard anything like that.
>>> >>
>>> >> Second, I do think there are some binaries in here that aren't
>>> >> supposed to be there, like the build/ directory stuff. IIRC these were
>>> >> 

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Patrick Wendell
Yeah I mean I definitely think we're not violating the *spirit* of the "no
binaries" policy, in that we do not include any binary code that is used at
runtime. This is because the binaries we distribute relate only to build
and testing.

Whether we are violating the *letter* of the policy, I'm not so sure. In
the very strictest interpretation of "there cannot be any binary files in
your downloaded tarball" - we aren't honoring that. We got a lot of people
complaining about the sbt jar for instance when we were in the incubator. I
found those complaints a little pedantic, but we ended up removing it from
our source tree and adding things to download it for the user.
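
(For reference, the download-on-first-use approach is roughly the sketch
below; the URL and version are placeholders here, not the exact script we
ship:)

# Fetch sbt-launch.jar on demand instead of committing the binary to the
# source tree. Version and URL are illustrative placeholders.
SBT_VERSION=0.13.7
JAR=build/sbt-launch-${SBT_VERSION}.jar
if [ ! -f "$JAR" ]; then
  curl --fail --location --output "$JAR" \
    "https://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch.jar"
fi
exec java -jar "$JAR" "$@"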

- Patrick

On Sun, Oct 11, 2015 at 10:12 PM, Sean Owen  wrote:

> No, we are voting on the artifacts being released (too), in principle,
> although of course the artifacts should be a deterministic function of the
> source at a certain point in time.
>
> I think the concern is about putting Spark binaries or its dependencies
> into a source release. That should not happen, but it is not what has
> happened here.
>
> On Mon, Oct 12, 2015, 6:03 AM Patrick Wendell  wrote:
>
>> Oh I see - yes it's the build/. I always thought release votes related to
>> a source tag rather than specific binaries. But maybe we can just fix it in
>> 1.5.2 if there is concern about mutating binaries. It seems reasonable to
>> me.
>>
>> For tests... in the past we've tried to avoid having jars inside of the
>> source tree, including some effort to generate jars on the fly which a lot
>> of our tests use. I am not sure whether it's a firm policy that you can't
>> have jars in test folders, though. If it is, we could probably do some
>> magic to get rid of these few ones that have crept in.
>>
>> - Patrick
>>
>> On Sun, Oct 11, 2015 at 9:57 PM, Sean Owen  wrote:
>>
>>> Agree, but we are talking about the build/ bit right?
>>>
>>> I don't agree that it invalidates the release, which is probably the
>>> more important idea. As a point of process, you would not want to modify
>>> and republish the artifact that was already released after being voted on -
>>> unless it was invalid in which case we spin up 1.5.1.1 or something.
>>>
>>> But that build/ directory should go in future releases.
>>>
>>> I think he is talking about more than this though and the other jars
>>> look like they are part of tests, and still nothing to do with Spark
>>> binaries. Those can and should stay.
>>>
>>> On Mon, Oct 12, 2015, 5:35 AM Patrick Wendell 
>>> wrote:
>>>
 I think Daniel is correct here. The source artifact incorrectly
 includes jars. It is inadvertent and not part of our intended release
 process. This was something I noticed in Spark 1.5.0 and filed a JIRA for;
 it was fixed by updating our build scripts. However, our build
 environment was not using the most current version of the build scripts.
 See related links:

 https://issues.apache.org/jira/browse/SPARK-10511
 https://github.com/apache/spark/pull/8774/files

 I can update our build environment and we can repackage the Spark 1.5.1
 source tarball so that it does not include the jars.


 - Patrick

 On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:

> Daniel: we did not vote on a tag. Please again read the VOTE email I
> linked to you:
>
>
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none
>
> among other things, it contains a link to the concrete source (and
> binary) distribution under vote:
>
> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/
>
> You can still examine it, sure.
>
> Dependencies are *not* bundled in the source release. You're again
> misunderstanding what you are seeing. Read my email again.
>
> I am still pretty confused about what the problem is. This is entirely
> business as usual for ASF projects. I'll follow up with you offline if
> you have any more doubts.
>
> On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
> wrote:
> > Here's my issue:
> >
> > How am I to audit that the dependencies you bundle are in fact what you
> > claim they are? How do I know they don't contain malware or - in light
> > of recent events - emissions test rigging? ;)
> >
> > I am not interested in a git tag - that means nothing in the ASF voting
> > process, you cannot vote on a tag, only on a release candidate. The VCS
> > in use is irrelevant in this issue. If you can point me to a release
> > candidate archive that was voted upon and does not contain binary
> > applications, all is well.
> >
> > If there is no such thing, and we cannot come to an understanding, I
> > will exercise my ASF 

taking the heap dump when an executor goes OOM

2015-10-11 Thread Niranda Perera
Hi all,

Is there a way for me to get the heap dump (hprof) of an executor JVM when
it goes out of memory?

Is this currently supported, or do I have to change some configuration?

cheers

-- 
Niranda
@n1r44 
+94-71-554-8430
https://pythagoreanscript.wordpress.com/
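
(A minimal sketch of the usual approach, since this is standard HotSpot JVM
behavior rather than a Spark-specific feature: pass the heap-dump flags to
the executors via spark.executor.extraJavaOptions. The dump path, class, and
jar below are placeholders; the path must exist and be writable on every
worker node.)

# Ask each executor JVM to write an .hprof file when it hits an OOM.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/executor-dumps" \
  --class com.example.MyApp my-app.jar

The .hprof file is written on whichever worker hits the OOM, so it has to be
collected from that machine (on YARN, a relative HeapDumpPath lands in the
container's working directory).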


yarn-cluster mode throwing NullPointerException

2015-10-11 Thread Rachana Srivastava
I am trying to submit a job in yarn-cluster mode using the spark-submit
command. My code works fine when I use yarn-client mode.

Cloudera Version:
CDH-5.4.7-1.cdh5.4.7.p0.3

Command Submitted:
spark-submit --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
  --driver-java-options "-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:///etc/spark/myconf/log4j.sample.properties" \
  --num-executors 2 \
  --executor-cores 2 \
  ../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  yarn-cluster 10 "XXX:2181" "XXX:9092" groups kafkaurl 5 \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeature.properties" \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/urlFeatureContent.properties" \
  "hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/ec2-user/hdfsOutputNEWScript/OUTPUTYarn2" \
  false


Log Details:
INFO : org.apache.spark.SparkContext - Running Spark version 1.3.0
INFO : org.apache.spark.SecurityManager - Changing view acls to: ec2-user
INFO : org.apache.spark.SecurityManager - Changing modify acls to: ec2-user
INFO : org.apache.spark.SecurityManager - SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(ec2-user); users 
with modify permissions: Set(ec2-user)
INFO : akka.event.slf4j.Slf4jLogger - Slf4jLogger started
INFO : Remoting - Starting remoting
INFO : Remoting - Remoting started; listening on addresses 
:[akka.tcp://sparkdri...@ip-10-0-0-xxx.us-west-2.compute.internal:49579]
INFO : Remoting - Remoting now listens on addresses: 
[akka.tcp://sparkdri...@ip-10-0-0-xxx.us-west-2.compute.internal:49579]
INFO : org.apache.spark.util.Utils - Successfully started service 'sparkDriver' 
on port 49579.
INFO : org.apache.spark.SparkEnv - Registering MapOutputTracker
INFO : org.apache.spark.SparkEnv - Registering BlockManagerMaster
INFO : org.apache.spark.storage.DiskBlockManager - Created local directory at 
/tmp/spark-1c805495-c7c4-471d-973f-b1ae0e2c8ff9/blockmgr-fff1946f-a716-40fc-a62d-bacba5b17638
INFO : org.apache.spark.storage.MemoryStore - MemoryStore started with capacity 
265.4 MB
INFO : org.apache.spark.HttpFileServer - HTTP File server directory is 
/tmp/spark-8ed6f513-854f-4ee4-95ea-87185364eeaf/httpd-75cee1e7-af7a-4c82-a9ff-a124ce7ca7ae
INFO : org.apache.spark.HttpServer - Starting HTTP Server
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started 
SocketConnector@0.0.0.0:46671
INFO : org.apache.spark.util.Utils - Successfully started service 'HTTP file 
server' on port 46671.
INFO : org.apache.spark.SparkEnv - Registering OutputCommitCoordinator
INFO : org.spark-project.jetty.server.Server - jetty-8.y.z-SNAPSHOT
INFO : org.spark-project.jetty.server.AbstractConnector - Started 
SelectChannelConnector@0.0.0.0:4040
INFO : org.apache.spark.util.Utils - Successfully started service 'SparkUI' on 
port 4040.
INFO : org.apache.spark.ui.SparkUI - Started SparkUI at 
http://ip-10-0-0-XXX.us-west-2.compute.internal:4040
INFO : org.apache.spark.SparkContext - Added JAR 
file:/home/ec2-user/CE/correlationengine/scripts/../target/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar
 at 
http://10.0.0.XXX:46671/jars/mm-anti-fraud-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar
 with timestamp 1444620509463
INFO : org.apache.spark.scheduler.cluster.YarnClusterScheduler - Created 
YarnClusterScheduler
ERROR: org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend - 
Application ID is not set.
INFO : org.apache.spark.network.netty.NettyBlockTransferService - Server 
created on 33880
INFO : org.apache.spark.storage.BlockManagerMaster - Trying to register 
BlockManager
INFO : org.apache.spark.storage.BlockManagerMasterActor - Registering block 
manager ip-10-0-0-XXX.us-west-2.compute.internal:33880 with 265.4 MB RAM, 
BlockManagerId(<driver>, ip-10-0-0-XXX.us-west-2.compute.internal, 33880)
INFO : org.apache.spark.storage.BlockManagerMaster - Registered BlockManager
INFO : org.apache.spark.scheduler.EventLoggingListener - Logging events to 
hdfs://ip-10-0-0-XXX.us-west-2.compute.internal:8020/user/spark/applicationHistory/spark-application-1444620509497
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.deploy.yarn.ApplicationMaster$.sparkContextInitialized(ApplicationMaster.scala:580)
at org.apache.spark.scheduler.cluster.YarnClusterScheduler.postStartHook(YarnClusterScheduler.scala:32)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:541)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.markmonitor.antifraud.ce.KafkaURLStreaming.main(KafkaURLStreaming.java:91)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
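
(One hedged observation, not a confirmed diagnosis: in the command above,
"yarn-cluster" is passed as the first application argument rather than via
--master, which suggests the master is being set inside the application code.
In yarn-cluster mode the driver runs inside the YARN ApplicationMaster, so
the master normally has to be given to spark-submit itself, roughly:)

# Sketch: supply the master to spark-submit instead of hard-coding it in the
# application; other flags and arguments stay as in the command above.
spark-submit --master yarn-cluster \
  --class "com.markmonitor.antifraud.ce.KafkaURLStreaming" \
  --num-executors 2 --executor-cores 2 \
  ../target/mm-XXX-ce-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  10 "XXX:2181" "XXX:9092" groups kafkaurl 5 # ...remaining arguments as above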

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Sean Owen
No, we are voting on the artifacts being released (too), in principle,
although of course the artifacts should be a deterministic function of the
source at a certain point in time.

I think the concern is about putting Spark binaries or its dependencies
into a source release. That should not happen, but it is not what has
happened here.
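
(Concretely, anyone can check that the archive under vote is what it claims
to be; a minimal sketch, assuming the usual ASF-style signature and checksum
files sit next to the artifact - the exact file names here are assumptions:)

wget http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/spark-1.5.1.tgz
wget http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/spark-1.5.1.tgz.asc
gpg --verify spark-1.5.1.tgz.asc spark-1.5.1.tgz   # signature must trace to a release manager's key
sha512sum spark-1.5.1.tgz                          # compare with the published checksum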

On Mon, Oct 12, 2015, 6:03 AM Patrick Wendell  wrote:

> Oh I see - yes it's the build/. I always thought release votes related to
> a source tag rather than specific binaries. But maybe we can just fix it in
> 1.5.2 if there is concern about mutating binaries. It seems reasonable to
> me.
>
> For tests... in the past we've tried to avoid having jars inside of the
> source tree, including some effort to generate jars on the fly which a lot
> of our tests use. I am not sure whether it's a firm policy that you can't
> have jars in test folders, though. If it is, we could probably do some
> magic to get rid of these few ones that have crept in.
>
> - Patrick
>
> On Sun, Oct 11, 2015 at 9:57 PM, Sean Owen  wrote:
>
>> Agree, but we are talking about the build/ bit right?
>>
>> I don't agree that it invalidates the release, which is probably the more
>> important idea. As a point of process, you would not want to modify and
>> republish the artifact that was already released after being voted on -
>> unless it was invalid in which case we spin up 1.5.1.1 or something.
>>
>> But that build/ directory should go in future releases.
>>
>> I think he is talking about more than this though and the other jars look
>> like they are part of tests, and still nothing to do with Spark binaries.
>> Those can and should stay.
>>
>> On Mon, Oct 12, 2015, 5:35 AM Patrick Wendell  wrote:
>>
>>> I think Daniel is correct here. The source artifact incorrectly includes
>>> jars. It is inadvertent and not part of our intended release process. This
>>> was something I noticed in Spark 1.5.0 and filed a JIRA for; it was fixed
>>> by updating our build scripts. However, our build environment was
>>> not using the most current version of the build scripts. See related links:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-10511
>>> https://github.com/apache/spark/pull/8774/files
>>>
>>> I can update our build environment and we can repackage the Spark 1.5.1
>>> source tarball so that it does not include the jars.
>>>
>>>
>>> - Patrick
>>>
>>> On Sun, Oct 11, 2015 at 8:53 AM, Sean Owen  wrote:
>>>
 Daniel: we did not vote on a tag. Please again read the VOTE email I
 linked to you:


 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-5-1-RC1-tt14310.html#none

 among other things, it contains a link to the concrete source (and
 binary) distribution under vote:

 http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/

 You can still examine it, sure.

 Dependencies are *not* bundled in the source release. You're again
 misunderstanding what you are seeing. Read my email again.

 I am still pretty confused about what the problem is. This is entirely
 business as usual for ASF projects. I'll follow up with you offline if
 you have any more doubts.

 On Sun, Oct 11, 2015 at 4:49 PM, Daniel Gruno 
 wrote:
 > Here's my issue:
 >
 > How am I to audit that the dependencies you bundle are in fact what you
 > claim they are? How do I know they don't contain malware or - in light
 > of recent events - emissions test rigging? ;)
 >
 > I am not interested in a git tag - that means nothing in the ASF voting
 > process, you cannot vote on a tag, only on a release candidate. The VCS
 > in use is irrelevant in this issue. If you can point me to a release
 > candidate archive that was voted upon and does not contain binary
 > applications, all is well.
 >
 > If there is no such thing, and we cannot come to an understanding, I
 > will exercise my ASF Members' rights and bring this to the attention of
 > the board of directors and ask for a clarification of the legality of
 > this.
 >
 > I find it highly irregular. Perhaps it is something some projects do in
 > the Java community, but that doesn't make it permissible in my view.
 >
 > With regards,
 > Daniel.
 >
 >
 > On 10/11/2015 05:42 PM, Sean Owen wrote:
 >> Still confused. Why are you saying we didn't vote on an archive? Refer
 >> to the email I linked, which includes both the git tag and a link to
 >> all generated artifacts (also in my email).
 >>
 >> So, there are two things at play here:
 >>
 >> First, I am not sure what you mean that a source distro can't have
 >> binary files. It's supposed to have the source code of Spark, and
 >> shouldn't contain binary