[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/304#issuecomment-40335789
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14109/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1216. Add a OneHotEncoder for handling c...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/304#issuecomment-40335788
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Decision Tree documentation for MLlib programm...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/402#issuecomment-40336644
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Decision Tree documentation for MLlib programm...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/402#issuecomment-40336645
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14110/




[GitHub] spark pull request: [WIP] SPARK-1430: Support sparse data in Pytho...

2014-04-14 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/341#discussion_r11573232
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala 
---
@@ -185,4 +193,39 @@ class SparseVector(
   }
 
   private[mllib] override def toBreeze: BV[Double] = new 
BSV[Double](indices, values, size)
+
+  override def apply(pos: Int): Double = {
+// A more efficient apply() than creating a new Breeze vector
--- End diff --

Good point, I'll remove this and split() because they're no longer needed. 
They were needed when we passed vectors with the label included from Python 
instead of passing LabeledPoint.




[GitHub] spark pull request: misleading task number of groupByKey

2014-04-14 Thread CrazyJvm
GitHub user CrazyJvm opened a pull request:

https://github.com/apache/spark/pull/403

misleading task number of groupByKey

The statement "By default, this uses only 8 parallel tasks to do the grouping." is
quite misleading. Please refer to https://github.com/apache/spark/pull/389 

The details are in the following code:
<code>
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }
</code>
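
The selection order in the quoted method can be sketched without Spark. This is only an illustration: `FakeRdd`, `DefaultPartitionerSketch`, and `defaultNumPartitions` are hypothetical names, an existing partitioner is modelled simply as its partition count, and `defaultParallelism` stands in for the `spark.default.parallelism` setting.

```scala
// Hypothetical stand-in for an RDD: a partition count plus, optionally,
// the partition count of a partitioner it already carries.
case class FakeRdd(numPartitions: Int, partitioner: Option[Int] = None)

object DefaultPartitionerSketch {
  // Returns the number of partitions (and hence tasks) the grouping would use.
  def defaultNumPartitions(
      defaultParallelism: Option[Int], // models spark.default.parallelism, if set
      rdd: FakeRdd,
      others: FakeRdd*): Int = {
    // Largest input first, mirroring sortBy(_.partitions.size).reverse.
    val bySize = (Seq(rdd) ++ others).sortBy(_.numPartitions).reverse
    // 1. Reuse the partitioner of the largest input that already has one.
    val existing = bySize.flatMap(_.partitioner).headOption
    // 2. Otherwise use the configured default parallelism.
    // 3. Otherwise fall back to the largest input's partition count.
    existing.orElse(defaultParallelism).getOrElse(bySize.head.numPartitions)
  }
}
```

Under these assumptions, an input with 200 partitions and no configured parallelism yields 200 partitions rather than a fixed 8, which is the point the description is making.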

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CrazyJvm/spark patch-4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #403


commit 156833643d9ea1479222e9033164e92a9846351c
Author: Chen Chao crazy...@gmail.com
Date:   2014-04-14T07:39:50Z

misleading task number of groupByKey

The statement "By default, this uses only 8 parallel tasks to do the grouping." is
quite misleading. Please refer to https://github.com/apache/spark/pull/389 

The details are in the following code:
<code>
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }
</code>






[GitHub] spark pull request: misleading task number of groupByKey

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/403#issuecomment-40340359
  
Can one of the admins verify this patch?




[GitHub] spark pull request: SPARK-1477: Add the lifecycle interface

2014-04-14 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/379#issuecomment-40344734
  
We are currently a little swamped with Spark 1.0 stuff, but we will definitely 
take a look soon. 




[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-40362749
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-40362755
  
Merged build started. 




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/404

SPARK-1488. Resolve scalac feature warnings during build

For your consideration: scalac currently notes a number of feature warnings 
during compilation:

```
[warn] there were 65 feature warning(s); re-run with -feature for details
```

Warnings are like:

```
[warn] 
/Users/srowen/Documents/spark/core/src/main/scala/org/apache/spark/SparkContext.scala:1261:
 implicit conversion method rddToPairRDDFunctions should be enabled
[warn] by making the implicit value scala.language.implicitConversions 
visible.
[warn] This can be achieved by adding the import clause 'import 
scala.language.implicitConversions'
[warn] or by setting the compiler option -language:implicitConversions.
[warn] See the Scala docs for value scala.language.implicitConversions for 
a discussion
[warn] why the feature should be explicitly enabled.
[warn]   implicit def rddToPairRDDFunctions[K: ClassTag, V: ClassTag](rdd: 
RDD[(K, V)]) =
[warn]^
```

scalac is suggesting that it's just best practice to explicitly enable 
certain language features by importing them where used.

This PR simply adds the imports it suggests (and squashes one other Java 
warning along the way). This leaves just deprecation warnings in the build.
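
As a rough illustration of the kind of change described, the sketch below defines an implicit conversion with the suggested import in scope. The names `FeatureImportSketch`, `RichCount`, and `intToRichCount` are hypothetical and do not come from the Spark code base.

```scala
// Enables implicit conversion definitions in this file, so that compiling
// with -feature no longer warns about the implicit def below.
import scala.language.implicitConversions

object FeatureImportSketch {
  // Hypothetical wrapper type, used only for illustration.
  class RichCount(val n: Int) {
    def doubled: Int = n * 2
  }

  // An implicit conversion method: the language feature the warning is about.
  implicit def intToRichCount(n: Int): RichCount = new RichCount(n)

  def demo(): Int = {
    val count = 21
    count.doubled // resolved through intToRichCount
  }
}
```

The alternative mentioned in the warning text, the compiler option -language:implicitConversions, enables the feature for the whole build instead of per file.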

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-1488

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #404


commit 39bc83115d5a55527e4f893fd480039896b6a63f
Author: Sean Owen so...@cloudera.com
Date:   2014-04-08T11:24:28Z

Enable -feature in scalac to emit language feature warnings

commit 859898002573f24c53d458db3e61b91b3c9da841
Author: Sean Owen so...@cloudera.com
Date:   2014-04-08T12:09:45Z

Quiet scalac warnings about language features by explicitly importing 
language features.






[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/404#issuecomment-40364966
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/404#issuecomment-40364977
  
Merged build started. 




[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-40366369
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1293 [SQL] [WIP] Parquet support for nes...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/360#issuecomment-40366371
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14111/




[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...

2014-04-14 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/396#issuecomment-40369872
  
Spark shouldn't be using it directly, since it was marked as private in the 
Hadoop 2.2 release. I believe Spark was using that API before the 2.2 release, 
so it was easy to miss. 
Also, when it was changed to private, MapReduce was not updated to stop 
using it, so Hadoop is breaking its own API rules.

These functions are utility functions and could be used by many types of 
applications, so ideally a new public class with these functions would be 
created in YARN.

I think we should commit this PR (after review), since Spark on YARN can't 
currently be run against the 2.4 release; if a new YARN utility class is 
created, we can then look at using that.






[GitHub] spark pull request: SPARK-1465: Spark compilation is broken with t...

2014-04-14 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/396#issuecomment-40369924
  
Also note I filed https://issues.apache.org/jira/browse/SPARK-1472 to go 
through the rest of the YARN apis.




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/404#issuecomment-40369322
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14112/




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/404#issuecomment-40369320
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: Add Shortest-path computations to graphx.lib w...

2014-04-14 Thread andy327
Github user andy327 commented on the pull request:

https://github.com/apache/spark/pull/10#issuecomment-40374419
  
Alternatively, it can be done without the added algebird dependency, if 
that's desired.




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40386021
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40386037
  
Merged build started. 




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40388669
  
@pwendell Could you help merge this PR into both master and branch-1.0? 
Thanks!




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40390542
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1408 Modify Spark on Yarn to point to th...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/362#issuecomment-40390543
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14113/




[GitHub] spark pull request: SPARK-1310: Start adding k-fold cross validati...

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/18#issuecomment-40392687
  
@mengxr @holdenk this does not merge cleanly at the moment - there are some 
conflicts in MLUtils.




[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread tmalaska
GitHub user tmalaska opened a pull request:

https://github.com/apache/spark/pull/405

SPARK-1478

Initial Version

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmalaska/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/405.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #405


commit c433827db5dfda6f5b1b6aa11e45447525b4aac4
Author: tmalaska ted.mala...@cloudera.com
Date:   2014-04-14T17:37:01Z

SPARK-1478






[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40395599
  
Can one of the admins verify this patch?




[GitHub] spark pull request: [BUGFIX] In-memory columnar storage bug fixes

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/374#issuecomment-40403308
  
Merged build started. 




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-14 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40410340
  
I'll look at it some more tomorrow, but this needs to be rebased to current 
master -- e.g.,

diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index e637ddc..9657cbf 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -482,12 +482,19 @@ class DAGScheduler(
 
   private[scheduler] def doCancelAllJobs() {
     // Cancel all running jobs.
-    runningStages.map(_.jobId).foreach(handleJobCancellation)
+    runningStages.map(_.jobId).foreach(handleJobCancellation(_, "as part of cancellation of all jobs"))
     activeJobs.clear() // These should already be empty by this point,
     jobIdToActiveJob.clear() // but just in case we lost track of some jobs...
   }
 
   /**
+   * Cancel all jobs associated with a running or scheduled stage.
+   */
+  def cancelStage(stageId: Int) {
+    eventProcessActor ! StageCancelled(stageId)
+  }
+
+  /**
    * Resubmit any failed stages. Ordinarily called after a small amount of time has passed since
    * the last fetch failure.
    */
@@ -849,11 +856,23 @@ class DAGScheduler(
     }
   }
 
-  private[scheduler] def handleJobCancellation(jobId: Int) {
+  private[scheduler] def handleStageCancellation(stageId: Int) {
+    if (stageIdToJobIds.contains(stageId)) {
+      val jobsThatUseStage: Array[Int] = stageIdToJobIds(stageId).toArray
+      jobsThatUseStage.foreach(jobId => {
+        handleJobCancellation(jobId, "because Stage %s was cancelled".format(stageId))
+      })
+    } else {
+      logInfo("No active jobs to kill for Stage " + stageId)
+    }
+  }
+
+  private[scheduler] def handleJobCancellation(jobId: Int, reason: String = "") {
     if (!jobIdToStageIds.contains(jobId)) {
       logDebug("Trying to cancel unregistered job " + jobId)
     } else {
-      failJobAndIndependentStages(jobIdToActiveJob(jobId), s"Job $jobId cancelled", None)
+      failJobAndIndependentStages(jobIdToActiveJob(jobId),
+        s"Job $jobId cancelled $reason", None)
     }
   }
 
@@ -1060,6 +1079,9 @@ private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGSchedule
       dagScheduler.submitStage(finalStage)
     }
 
+    case StageCancelled(stageId) =>
+      dagScheduler.handleStageCancellation(stageId)
+
     case JobCancelled(jobId) =>
       dagScheduler.handleJobCancellation(jobId)
 
@@ -1069,7 +1091,7 @@ private[scheduler] class DAGSchedulerEventProcessActor(dagScheduler: DAGSchedule
       val activeInGroup = dagScheduler.activeJobs.filter(activeJob =>
         groupId == activeJob.properties.get(SparkContext.SPARK_JOB_GROUP_ID))
       val jobIds = activeInGroup.map(_.jobId)
-      jobIds.foreach(dagScheduler.handleJobCancellation)
+      jobIds.foreach(dagScheduler.handleJobCancellation(_, s"as part of cancelled job group %groupId"))
 
     case AllJobsCancelled =>
       dagScheduler.doCancelAllJobs()
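
The stage-to-job fan-out in the diff above can be sketched in isolation. This is an assumption-laden toy, not the scheduler's code: `StageCancellationSketch`, its sample `stageIdToJobIds` contents, and the two helper methods are hypothetical, and the real methods fail jobs rather than returning strings.

```scala
object StageCancellationSketch {
  // Hypothetical bookkeeping: which jobs depend on which stage.
  val stageIdToJobIds: Map[Int, Set[Int]] = Map(1 -> Set(10, 11), 2 -> Set(11))

  // Builds the failure message for one job; reason defaults to empty,
  // as in the handleJobCancellation signature shown in the diff.
  def jobCancelledMessage(jobId: Int, reason: String = ""): String =
    s"Job $jobId cancelled $reason"

  // Cancelling a stage cancels every job that uses it, each with a reason.
  def stageCancelledMessages(stageId: Int): Seq[String] =
    stageIdToJobIds.get(stageId) match {
      case Some(jobIds) =>
        val reason = "because Stage %s was cancelled".format(stageId)
        jobIds.toSeq.sorted.map(jobCancelledMessage(_, reason))
      case None =>
        Seq.empty // no active jobs use this stage
    }
}
```

One small observation from the sketch: with the default empty reason, the interpolated message ends in a trailing space.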




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/353#discussion_r11605070
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala ---
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.optimization
+
+import scala.collection.mutable.ArrayBuffer
+
+import breeze.linalg.{DenseVector => BDV, axpy}
+import breeze.optimize.{CachedDiffFunction, DiffFunction}
+
+import org.apache.spark.Logging
+import org.apache.spark.rdd.RDD
+import org.apache.spark.mllib.linalg.{Vectors, Vector}
+
+/**
+ * Class used to solve an optimization problem using Limited-memory BFGS.
+ * Reference: [[http://en.wikipedia.org/wiki/Limited-memory_BFGS]]
+ * @param gradient Gradient function to be used.
+ * @param updater Updater to be used to update weights after every 
iteration.
+ */
+class LBFGS(private var gradient: Gradient, private var updater: Updater)
+  extends Optimizer with Logging {
+
+  private var numCorrections = 10
+  private var convergenceTol = 1E-4
+  private var maxNumIterations = 100
+  private var regParam = 0.0
+  private var miniBatchFraction = 1.0
+
+  /**
+   * Set the number of corrections used in the LBFGS update. Default 10.
+   * Values of numCorrections less than 3 are not recommended; large values
+   * of numCorrections will result in excessive computing time.
+   * 3 < numCorrections < 10 is recommended.
+   * Restriction: numCorrections > 0
+   */
+  def setNumCorrections(corrections: Int): this.type = {
+    assert(corrections > 0)
+    this.numCorrections = corrections
+    this
+  }
+
+  /**
+   * Set fraction of data to be used for each L-BFGS iteration. Default 
1.0.
+   */
+  def setMiniBatchFraction(fraction: Double): this.type = {
+this.miniBatchFraction = fraction
+this
+  }
+
+  /**
+   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
+   * Smaller value will lead to higher accuracy with the cost of more 
iterations.
+   */
+  def setConvergenceTol(tolerance: Int): this.type = {
+this.convergenceTol = tolerance
+this
+  }
+
+  /**
+   * Set the maximal number of iterations for L-BFGS. Default 100.
+   */
+  def setMaxNumIterations(iters: Int): this.type = {
+this.maxNumIterations = iters
+this
+  }
+
+  /**
+   * Set the regularization parameter. Default 0.0.
+   */
+  def setRegParam(regParam: Double): this.type = {
+this.regParam = regParam
+this
+  }
+
+  /**
+   * Set the gradient function (of the loss function of one single data 
example)
+   * to be used for L-BFGS.
+   */
+  def setGradient(gradient: Gradient): this.type = {
+this.gradient = gradient
+this
+  }
+
+  /**
+   * Set the updater function to actually perform a gradient step in a 
given direction.
+   * The updater is responsible to perform the update from the 
regularization term as well,
+   * and therefore determines what kind or regularization is used, if any.
+   */
+  def setUpdater(updater: Updater): this.type = {
+this.updater = updater
+this
+  }
+
+  override def optimize(data: RDD[(Double, Vector)], initialWeights: 
Vector): Vector = {
+val (weights, _) = LBFGS.runMiniBatchLBFGS(
+  data,
+  gradient,
+  updater,
+  numCorrections,
+  convergenceTol,
+  maxNumIterations,
+  regParam,
+  miniBatchFraction,
+  initialWeights)
+weights
+  }
+
+}
+
+/**
+ * Top-level method to run LBFGS.
+ */
+object LBFGS extends Logging {
+  /**
+   * Run Limited-memory BFGS (L-BFGS) in parallel using mini batches.
+   * In each iteration, we sample a subset (fraction miniBatchFraction) of 
the total data
+   * in order to 

[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40414083
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14117/




[GitHub] spark pull request: SPARK-1235: manage the DAGScheduler EventProce...

2014-04-14 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/186#issuecomment-40414779
  
Failing RAT checks not related to this PR.  This PR runs and passes all the
tests for me locally, but I want to take another close look at it tomorrow
-- and with any luck, someone will have made Jenkins happy by then


On Mon, Apr 14, 2014 at 1:26 PM, Nan Zhu notificati...@github.com wrote:

 Eh...just rebased, but Jenkins is not happy...

 Reply to this email directly or view it on GitHub:
 https://github.com/apache/spark/pull/186#issuecomment-40413762





[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40416735
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40416619
  
Jenkins, test this please. @tmalaska mind updating the title of the PR to 
include the title of the JIRA? It makes it easier when scanning the (long) list 
of active pull requests.




[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40416751
  
Merged build started. 




[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40416779
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14118/




[GitHub] spark pull request: support leftsemijoin for sparkSQL

2014-04-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/395#issuecomment-40420386
  
ok to test




[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40420675
  
@tmalaska I did a cursory pass, and this looks good. I will do a more detailed 
pass soon. However, there is something you should know. I am in the middle of a PR 
( #300 ) that tweaks the receiver API a little bit for greater stability, so 
a bit of your code will have to change a little. This should go in pretty 
soon (couple of days, max).  The PR has the changes necessary for the current 
FlumeReceiver.





[GitHub] spark pull request: SPARK-1474: Spark on yarn assembly doesn't inc...

2014-04-14 Thread tgravescs
GitHub user tgravescs opened a pull request:

https://github.com/apache/spark/pull/406

SPARK-1474: Spark on yarn assembly doesn't include AmIpFilter

We use org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter in Spark 
on YARN but do not include it in the assembly jar.

I tested this on a YARN cluster by removing the YARN jars from the classpath, 
and Spark runs fine now.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tgravescs/spark SPARK-1474

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/406.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #406


commit 1548bf955a1d2ca410af0b447ad1bcf4840b326e
Author: Thomas Graves tgra...@apache.org
Date:   2014-04-14T17:52:20Z

SPARK-1474: Spark on yarn assembly doesn't include 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter






[GitHub] spark pull request: SPARK-1478

2014-04-14 Thread tmalaska
Github user tmalaska commented on the pull request:

https://github.com/apache/spark/pull/405#issuecomment-40421425
  
Yeah, no problem.  Thanks for taking the time to review my code.  This is my 
first time committing with Scala :)

Just let me know when ( #300 ) is done and I will check out again.  Also, when 
you have time, I would love to know how else I could help.

I was thinking of adding:
- Encryption for the Flume Stream, as in Flume 1.4.0.
- Failure-recovery support for when a Flume Stream host goes down and Spark 
starts up the Flume Stream on another node.




[GitHub] spark pull request: SPARK-1474: Spark on yarn assembly doesn't inc...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/406#issuecomment-40421597
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1474: Spark on yarn assembly doesn't inc...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/406#issuecomment-40421710
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14119/




[GitHub] spark pull request: SPARK-1474: Spark on yarn assembly doesn't inc...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/406#issuecomment-40421708
  
Merged build finished. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread tmyklebu
GitHub user tmyklebu opened a pull request:

https://github.com/apache/spark/pull/407

[SPARK-1281] Improve partitioning in ALS

ALS was using HashPartitioner and explicit uses of `%` together.  Further, 
the naked use of `%` meant that, if the number of partitions corresponded with 
the stride of arithmetic progressions appearing in user and product ids, users 
and products could be mapped into buckets in an unfair or unwise way.

This pull request:
1) Makes the Partitioner an instance variable of ALS.
2) Replaces the direct uses of `%` with calls to a Partitioner.
3) Defines an anonymous Partitioner that scrambles the bits of the object's 
hashCode before reducing to the number of present buckets.

This pull request does not make the partitioner user-configurable.

I'm not all that happy about the way I did (1).  It introduces an icky 
lifetime issue and dances around it by nulling something.  However, I don't 
know a better way to make the partitioner visible everywhere it needs to be 
visible.
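
As a rough illustration of the stride problem and of point 3 (this is not the code in the pull request), the sketch below contrasts plain modulo bucketing with bucketing after scrambling the id's bits. The names `BucketSketch`, `naiveBucket`, `scrambledBucket`, and `bucketsUsed` are hypothetical, and `scala.util.hashing.byteswap32` is only an assumed stand-in for whatever mixing function the PR actually applies.

```scala
import scala.util.hashing.byteswap32

object BucketSketch {
  /** Non-negative remainder, since `%` can return a negative value in Scala. */
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val r = x % mod
    if (r < 0) r + mod else r
  }

  /** Plain modulo bucketing: ids in an arithmetic progression can pile up. */
  def naiveBucket(id: Int, numBuckets: Int): Int =
    nonNegativeMod(id, numBuckets)

  /** Scramble the bits of the id first, then reduce to a bucket index. */
  def scrambledBucket(id: Int, numBuckets: Int): Int =
    nonNegativeMod(byteswap32(id), numBuckets)

  /** Number of distinct buckets that a set of ids actually lands in. */
  def bucketsUsed(ids: Seq[Int], bucketOf: Int => Int): Int =
    ids.map(bucketOf).distinct.size

  def main(args: Array[String]): Unit = {
    // Ids with stride 10 and 10 buckets (assumed example data).
    val ids = (0 until 1000).map(_ * 10)
    println(s"naive buckets used:     ${bucketsUsed(ids, naiveBucket(_, 10))}")
    println(s"scrambled buckets used: ${bucketsUsed(ids, scrambledBucket(_, 10))}")
  }
}
```

With ids that step by 10 and 10 buckets, `naiveBucket` sends every id to bucket 0, whereas the scrambled variant spreads the same ids over more than one bucket.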

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tmyklebu/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/407.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #407


commit c774d7d4bff91c9387d059d1189799fa0ff1f4b0
Author: Tor Myklebust tmykl...@gmail.com
Date:   2014-04-14T22:01:18Z

Make the partitioner a member variable and use it instead of modding 
directly.

commit c90b6d8e91f86cf89adf28de6f9185647c87e5c8
Author: Tor Myklebust tmykl...@gmail.com
Date:   2014-04-14T22:10:30Z

Scramble user and product ids before bucketing.






[GitHub] spark pull request: [BUGFIX] In-memory columnar storage bug fixes

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/374#issuecomment-40425590
  
Thanks, merged into master and 1.0.




[GitHub] spark pull request: Clean up and simplify Spark configuration

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/299#issuecomment-40425587
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40425686
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14120/




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread tmyklebu
Github user tmyklebu commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40425865
  
Build failure.  Looks like a config issue in Jenkins?




[GitHub] spark pull request: Clean up and simplify Spark configuration

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/299#issuecomment-40428500
  
Merged build finished. 




[GitHub] spark pull request: Clean up and simplify Spark configuration

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/299#issuecomment-40428502
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14121/




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40429267
  
 Merged build triggered. 




[GitHub] spark pull request: [BUGFIX] In-memory columnar storage bug fixes

2014-04-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/374




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40432076
  
Merged build started. 




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40432158
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40432159
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14123/




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread ahirreddy
Github user ahirreddy commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40432281
  
MIMA Checker issue because we now include Hive in the assembly jar when 
building on Jenkins. See Jira SPARK-1494 for more information.
https://issues.apache.org/jira/browse/SPARK-1494




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40433881
  
Merged build started. 




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-14 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40434113
  
+1 from me -- I've done the URL editing that Marcelo described before.


On Tue, Apr 15, 2014 at 12:54 AM, Patrick Wendell
notificati...@github.com wrote:

 This seems like a decent idea - @andrewor14 (https://github.com/andrewor14)?

 Reply to this email directly or view it on GitHub:
 https://github.com/apache/spark/pull/408#issuecomment-40431843





[GitHub] spark pull request: support leftsemijoin for sparkSQL

2014-04-14 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/395#issuecomment-40434377
  
Besides the BroadcastNestedLoopJoin, I think the left semi join may also 
need to be implemented in the HashJoin.




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai closed the pull request at:

https://github.com/apache/spark/pull/353




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40434555
  
Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
GitHub user dbtsai reopened a pull request:

https://github.com/apache/spark/pull/353

[SPARK-1157][MLlib] L-BFGS Optimizer based on Breeze's implementation.

This PR uses Breeze's L-BFGS implementation; the Breeze dependency has already 
been introduced by Xiangrui's sparse input format work in SPARK-1212. Nice 
work, @mengxr !

When used with a regularized updater, we need to compute the regVal and 
regGradient (the gradient of the regularization part of the cost function), and 
in the current updater design we can compute those two values in the following way.

Let's review how the updater works when returning newWeights given the input 
parameters.

w' = w - thisIterStepSize * (gradient + regGradient(w))

Note that regGradient is a function of w!
If we set gradient = 0 and thisIterStepSize = 1, then

regGradient(w) = w - w'

As a result, regVal can be computed by

  val regVal = updater.compute(
    weights,
    new DoubleMatrix(initialWeights.length, 1), 0, 1, regParam)._2

and regGradient can be obtained by

  val regGradient = weights.sub(
    updater.compute(weights, new DoubleMatrix(initialWeights.length, 1), 1, 1, regParam)._1)
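
To make the trick concrete, here is a minimal, self-contained sketch using plain arrays instead of the MLlib types. `l2Update` is a hypothetical toy updater that only assumes the update rule written above with an L2 penalty, so that regGradient(w) = regParam * w and regVal = 0.5 * regParam * ||w||^2; it is not the actual `Updater` API from the PR.

```scala
object RegTrickSketch {
  /**
   * Toy L2 updater (hypothetical, for illustration only).
   * Returns (newWeights, regVal), where regVal is evaluated at newWeights.
   */
  def l2Update(
      weights: Array[Double],
      gradient: Array[Double],
      stepSize: Double,
      iter: Int,
      regParam: Double): (Array[Double], Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    // w' = w - thisIterStepSize * (gradient + regGradient(w)), with regGradient(w) = regParam * w
    val newWeights = weights.indices.map { i =>
      weights(i) - thisIterStepSize * (gradient(i) + regParam * weights(i))
    }.toArray
    val regVal = 0.5 * regParam * newWeights.map(w => w * w).sum
    (newWeights, regVal)
  }

  /** Zero gradient and zero step size leave w unchanged, so _2 is regVal at w. */
  def regVal(weights: Array[Double], regParam: Double): Double = {
    val zeros = Array.fill(weights.length)(0.0)
    l2Update(weights, zeros, 0.0, 1, regParam)._2
  }

  /** Zero gradient and unit step size give w' = w - regGradient(w), so regGradient(w) = w - w'. */
  def regGradient(weights: Array[Double], regParam: Double): Array[Double] = {
    val zeros = Array.fill(weights.length)(0.0)
    val (stepped, _) = l2Update(weights, zeros, 1.0, 1, regParam)
    weights.zip(stepped).map { case (w, wNew) => w - wNew }
  }
}
```

The step size of 0 in `regVal` matters in this sketch because the toy updater evaluates the penalty at the new weights; with a zero step the new weights equal the old ones.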

The PR includes tests that compare the results with SGD, with and without 
regularization.

We did a comparison between LBFGS and SGD, and we often saw 10x fewer
steps with LBFGS while the cost per step is the same (just computing
the gradient).

The following paper by Prof. Ng at Stanford compares different
optimizers, including LBFGS and SGD. They use them in the context of
deep learning, but it is worth reading as a reference.
http://cs.stanford.edu/~jngiam/papers/LeNgiamCoatesLahiriProchnowNg2011.pdf

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dbtsai/spark dbtsai-LBFGS

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/353.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #353


commit 984b18e21396eae84656e15da3539ff3b5f3bf4a
Author: DB Tsai dbt...@alpinenow.com
Date:   2014-04-05T00:06:50Z

L-BFGS Optimizer based on Breeze's implementation. Also fixed indentation 
issue in GradientDescent optimizer.






[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40434626
  
Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40434691
  
The latest Jenkins run timed out. It seems that CI is not stable right now.




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40434890
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40434895
  
Merged build started. 




[GitHub] spark pull request: support leftsemijoin for sparkSQL

2014-04-14 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/395#issuecomment-40436922
  
I'll create a JIRA soon.




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40437823
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/404#issuecomment-40437961
  
Aha, finally! LGTM and thanks for working on this.




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40438216
  
Jenkins, retest this please.




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/407#discussion_r11617460
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -96,6 +97,7 @@ class ALS private (
 private var lambda: Double,
 private var implicitPrefs: Boolean,
 private var alpha: Double,
+private var partitioner: Partitioner = null,
--- End diff --

Do not put the partitioner in the constructor args. Use setters and make the 
hashPartitioner the default. Also, we should separate userPartitioner/numUserBlocks 
from productPartitioner/numProductBlocks.
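
One possible shape for that suggestion, as a sketch only: the class below keeps separate user and product bucketers behind chainable setters, each defaulting to hash-based bucketing. `Bucketer`, `HashBucketer`, and `Recommender` are hypothetical stand-ins so the example runs without Spark; they are not the real `Partitioner` or `ALS` classes.

```scala
object SetterSketch {
  /** Hypothetical stand-in for a partitioner: maps a key to a block index. */
  trait Bucketer {
    def numBuckets: Int
    def bucketOf(key: Any): Int
  }

  /** Default behaviour: non-negative hashCode modulo the number of blocks. */
  class HashBucketer(val numBuckets: Int) extends Bucketer {
    def bucketOf(key: Any): Int = {
      val r = key.hashCode % numBuckets
      if (r < 0) r + numBuckets else r
    }
  }

  /** Toy model class; only the configuration pattern matters here. */
  class Recommender {
    // Hash bucketing is the default for both sides (4 blocks is an arbitrary choice).
    private var userBucketer: Bucketer = new HashBucketer(4)
    private var productBucketer: Bucketer = new HashBucketer(4)

    /** Setters return this.type so calls can be chained. */
    def setUserBucketer(b: Bucketer): this.type = { userBucketer = b; this }
    def setProductBucketer(b: Bucketer): this.type = { productBucketer = b; this }

    def userBlock(userId: Int): Int = userBucketer.bucketOf(userId)
    def productBlock(productId: Int): Int = productBucketer.bucketOf(productId)
  }
}
```

A caller that is happy with the defaults writes `new SetterSketch.Recommender()`, and one that needs different block counts per side chains `setUserBucketer(...)` and `setProductBucketer(...)`.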




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40438453
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40438446
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40438518
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14128/




[GitHub] spark pull request: [SPARK-1281] Improve partitioning in ALS

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/407#issuecomment-40438517
  
Merged build finished. 




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40439180
  
Merged build started. 




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40439169
  
 Merged build triggered. 




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40439248
  
Okay, I updated the API based on a conversation with @mateiz.  I also added 
the relevant function to the Java API.  We can do Python in a follow-up PR once 
that is merged.  Once Jenkins passes, I think this is ready to merge.




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40439428
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40439431
  
Merged build started. 




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40439427
  
Regarding the longer test time, we should make sure that we aren't just 
comparing to times when the Hive tests weren't running at all.

We should definitely look into the increased verbosity of the logs (even 
though that might not have been caused by this PR, but by turning the Hive 
tests back on).  It is possible that we should just add more packages to 
`sql/hive/src/main/resources/log4j.properties`.
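
For illustration only, quieting a package in a log4j 1.x properties file takes one `log4j.logger.<package>=<LEVEL>` entry per package. The package names and levels below are assumptions chosen to show the syntax; they are not the contents of the actual file.

```
# Hypothetical entries: package names and levels are illustrative only
log4j.logger.org.apache.hadoop.hive=WARN
log4j.logger.org.example.noisy=ERROR
```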




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40439479
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14126/




[GitHub] spark pull request: [SPARK-1157][MLlib] L-BFGS Optimizer based on ...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/353#issuecomment-40439478
  
Merged build finished. 




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40439900
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14127/




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40439899
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40440452
  
@marmbrus I see; the duration issue was just that we had stopped running 
Hive tests for a bit after Aaron's build change. 




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40440628
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14130/




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40440638
  
I manually cancelled this build since we'll need to retest.




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40440626
  
Merged build finished. 




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40440870
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14129/




[GitHub] spark pull request: [SQL] SPARK-1424 Generalize insertIntoTable fu...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/354#issuecomment-40440869
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/404#issuecomment-40440919
  
Thanks - I've merged this. Good call.




[GitHub] spark pull request: Include stack trace for exceptions thrown by u...

2014-04-14 Thread marmbrus
GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/409

Include stack trace for exceptions thrown by user code.

It is very confusing when your code throws an exception, but the only stack 
trace shown is in the DAGScheduler.  This is a simple patch to include the stack 
trace for the actual failure in the error message.  Suggestions on formatting 
welcome.
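
As a rough sketch of the idea (not the code in this patch), a failure's stack trace can be folded into the message text by rendering each `StackTraceElement` on its own line. The object name `StackTraceMessage` and the method `describe` are hypothetical names used only for this illustration.

```scala
// Hypothetical helper, not taken from the PR: renders a Throwable's class,
// message, and stack frames as a single multi-line string.
object StackTraceMessage {
  def describe(e: Throwable): String = {
    val header = s"${e.getClass.getName}: ${e.getMessage}"
    // One indented line per frame, e.g. "scala.sys.package$.error(package.scala:27)"
    val frames = e.getStackTrace.map(frame => "        " + frame.toString)
    (header +: frames).mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    try {
      sys.error("Ahh!")
    } catch {
      case e: RuntimeException =>
        // The user-code frames now appear inside the reported message.
        println("Task failed, most recent failure: " + describe(e))
    }
  }
}
```

With a message built this way, the frames from user code appear next to the failure text instead of only the scheduler's frames.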

Before:
```
scala> sc.parallelize(1 :: Nil).map(_ => sys.error("Ahh!")).collect()
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:3 failed 1 times (most recent failure: Exception failure in TID 3 on host localhost: java.lang.RuntimeException: Ahh!)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1055)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1039)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1037)
    ...
```

After:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:3 failed 1 times, most recent failure: Exception failure in TID 3 on host localhost: java.lang.RuntimeException: Ahh!
        scala.sys.package$.error(package.scala:27)
        $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13)
        $iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:13)
        scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        scala.collection.Iterator$class.foreach(Iterator.scala:727)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        scala.collection.AbstractIterator.to(Iterator.scala:1157)
        scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:676)
        org.apache.spark.rdd.RDD$$anonfun$6.apply(RDD.scala:676)
        org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1048)
        org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1048)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:110)
        org.apache.spark.scheduler.Task.run(Task.scala:50)
        org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:211)
        org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:46)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:744)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1055)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1039)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1037)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1037)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:614)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:614)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:614)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:143)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 

[GitHub] spark pull request: Include stack trace for exceptions thrown by u...

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/409#issuecomment-40441434
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1488. Resolve scalac feature warnings du...

2014-04-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/404




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40444236
  
Jenkins, test this please.




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40444359
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1374: PySpark API for SparkSQL

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/363#issuecomment-40444360
  
 Merged build triggered. 




[GitHub] spark pull request: Make spark logo link refer to /.

2014-04-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/408#issuecomment-40444365
  
Merged build started. 



