[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/5645#discussion_r29026826
  
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
@@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
        logDebug(s"Read partition data of $this from block manager, block $blockId")
        iterator
      case None => // Data not found in Block Manager, grab it from write ahead log file
-       val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
-       val dataRead = reader.read(partition.segment)
-       reader.close()
+       var dataRead: ByteBuffer = null
+       var writeAheadLog: WriteAheadLog = null
+       try {
+         val dummyDirectory = FileUtils.getTempDirectoryPath()
--- End diff --

Why do we need to use `dummyDirectory` here? Since the WAL may not be file-based, I'm not sure what the point of having this is.
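For context, the pattern the diff above moves toward is roughly the following: open a pluggable `WriteAheadLog`, read the record, and close the log even if the read throws. This is only a minimal sketch with assumed names (`WriteAheadLogRecordHandle`, `readFromWal`), not the PR's actual code:

```scala
import java.nio.ByteBuffer

// Assumed stand-ins for the pluggable WAL interface discussed in the PR.
trait WriteAheadLogRecordHandle

trait WriteAheadLog {
  def read(handle: WriteAheadLogRecordHandle): ByteBuffer
  def close(): Unit
}

// Read one record from the WAL (the fallback path when the BlockManager
// misses), making sure the log is closed on both success and failure.
def readFromWal(createLog: () => WriteAheadLog,
                handle: WriteAheadLogRecordHandle): ByteBuffer = {
  var log: WriteAheadLog = null
  try {
    log = createLog()
    log.read(handle)
  } finally {
    if (log != null) log.close()
  }
}
```

The try/finally is the point of the refactoring: the old `WriteAheadLogRandomReader` code leaked the reader if `read` threw.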


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-23 Thread sryza
Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/5676#issuecomment-95822131
  
This looks like a duplicate of SPARK-6954 (PR #5536)





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95821459
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30916/
Test PASSed.





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95821427
  
  [Test build #30916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30916/consoleFull) for PR 2342 at commit [`d3c63c8`](https://github.com/apache/spark/commit/d3c63c84a56041756841dd0706d87c8c808e84d3).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class ExecutorUIData(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...

2015-04-23 Thread aniketbhatnagar
Github user aniketbhatnagar commented on the pull request:

https://github.com/apache/spark/pull/5354#issuecomment-95819955
  
+1 from my side. Having a consistent httpclient version would be so much better!





[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5680#issuecomment-95819308
  
  [Test build #30920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30920/consoleFull) for PR 5680 at commit [`28d668f`](https://github.com/apache/spark/commit/28d668faf51495e779aa1f874ceb03a64bccf410).





[GitHub] spark pull request: [SPARK-7112][Streaming] Add a DirectStreamTrac...

2015-04-23 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/5680

[SPARK-7112][Streaming] Add a DirectStreamTracker to track the direct streams



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-7111

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5680.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5680


commit 28d668faf51495e779aa1f874ceb03a64bccf410
Author: jerryshao 
Date:   2015-04-24T06:07:54Z

Add DirectStreamTracker to track the direct streams







[GitHub] spark pull request: [SPARK-7097][SQL]: Partitioned tables should o...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5668#issuecomment-95817298
  
  [Test build #30919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30919/consoleFull) for PR 5668 at commit [`b4651fd`](https://github.com/apache/spark/commit/b4651fd80a55f016093d84cf3b00ad6c91333cef).





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95813823
  
  [Test build #30918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30918/consoleFull) for PR 2342 at commit [`b09d0c5`](https://github.com/apache/spark/commit/b09d0c5f76aa1eb2912ef625c4bd0ffa2c729d64).





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95811201
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30914/
Test PASSed.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95811180
  
  [Test build #30914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30914/consoleFull) for PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5679#issuecomment-95810737
  
Can one of the admins verify this patch?





[GitHub] spark pull request: SPARK-7103: Fix crash with SparkContext.union ...

2015-04-23 Thread stevencanopy
GitHub user stevencanopy opened a pull request:

https://github.com/apache/spark/pull/5679

SPARK-7103: Fix crash with SparkContext.union when RDD has no partitioner

Added a check to the SparkContext.union method to verify that a partitioner is defined on all RDDs before instantiating a PartitionerAwareUnionRDD.
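The guard described above can be sketched as follows. This is a simplified stand-in with assumed names (`FakeRdd`, `unionStrategy`), not Spark's actual `SparkContext.union` code: the partitioner-aware path is only safe when every input RDD has a defined partitioner.

```scala
// Minimal model of an RDD for this sketch: only the partitioner matters here.
final case class FakeRdd(partitioner: Option[String])

// Choose the union implementation: PartitionerAwareUnionRDD assumes every
// input has the same defined partitioner; otherwise fall back to a plain
// UnionRDD (the pre-fix code crashed when a partitioner was None).
def unionStrategy(rdds: Seq[FakeRdd]): String = {
  val partitioners = rdds.flatMap(_.partitioner).toSet
  if (rdds.forall(_.partitioner.isDefined) && partitioners.size == 1) {
    "PartitionerAwareUnionRDD"
  } else {
    "UnionRDD"
  }
}
```

The `forall(_.partitioner.isDefined)` check is the essence of the fix: `flatMap` alone would silently drop the `None` entries and make a mixed input look homogeneous.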

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/stevencanopy/spark SPARK-7103

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5679.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5679


commit 5a3d84649b46df9fd670e951941e809e1e6d98a7
Author: Steven She 
Date:   2015-04-24T05:55:25Z

SPARK-7103: Fix crash with SparkContext.union when at least one RDD has no partitioner







[GitHub] spark pull request: [SPARK-5894][ML] Add polynomial mapper

2015-04-23 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/5245#issuecomment-95810160
  
@mengxr I ran some tests on these two versions; here is the result log. (You can see my code [here](https://github.com/yinxusen/spark/blob/PerformanceTest-5894/mllib/src/main/scala/org/apache/spark/ml/feature/PolynomialMapper.scala).)

```bash
sbt "mllib/run-main org.apache.spark.ml.feature.PolynomialMapper" 2>&1>test.log
```
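The elapsed times in the log below were presumably collected with a simple wall-clock timer around each `transform` call; a generic sketch of such a helper (assumed name `timed`, the linked PerformanceTest code may differ):

```scala
// Time a block of code and return its result together with the elapsed
// wall-clock time in milliseconds. nanoTime is monotonic, so it is the
// right clock for measuring intervals.
def timed[A](body: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = body
  val elapsedMs = (System.nanoTime() - start) / 1e6
  (result, elapsedMs)
}

// Hypothetical usage matching the log format below:
//   val (_, ms) = timed { mapper.transform(dataset) }
//   println(f"Elapsed time: $ms%.6fms")
```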

> [info] Testing number of data 1024
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 48.591317ms
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 43.113877ms
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 38.518744ms
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 36.946037ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 34.615637ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 39.327571ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 35.640954ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 38.740797ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 37.757011ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 39.291329ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 34.665687ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 37.758357ms
[info]  Testing dataset degree: 10  mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 33.307436ms
[info]  Testing dataset degree: 10  mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 37.231837ms
[info]  Testing dataset degree: 10  mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 34.794309ms
[info]  Testing dataset degree: 10  mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 37.112773ms

> [info] Testing number of data 10240
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 76.447725ms
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 98.351862ms
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 76.17611ms
[info]  Testing dataset degree: 2   mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 99.099883ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 76.661511ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 99.442798ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 76.607076ms
[info]  Testing dataset degree: 3   mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 99.722276ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 76.337466ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 99.550001ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V2   name: denseData
[info]  Elapsed time: 76.633637ms
[info]  Testing dataset degree: 5   mapper: PolynomialMapper-V2   name: sparseData
[info]  Elapsed time: 98.995122ms
[info]  Testing dataset degree: 10  mapper: PolynomialMapper-V1   name: denseData
[info]  Elapsed time: 77.281723ms
[info]  Testing dataset degree: 10  mapper: PolynomialMapper-V1   name: sparseData
[info]  Elapsed time: 100.623104ms
[in

[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95809888
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30917/
Test FAILed.





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95809882
  
  [Test build #30917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30917/consoleFull) for PR 5604 at commit [`5b96e2a`](https://github.com/apache/spark/commit/5b96e2aa3e6da2a836171e4783c8199d21daed20).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WindowExpression(child: Expression, windowSpec: WindowSpec) extends UnaryExpression `
  * `case class WindowSpec(windowPartition: WindowPartition, windowFrame: Option[WindowFrame])`
  * `case class WindowPartition(partitionBy: Seq[Expression], sortBy: Seq[SortOrder])`
  * `case class WindowFrame(frameType: FrameType, preceding: Int, following: Int)`
  * `case class WindowAggregate(`
  * `case class WindowAggregate(`
  * `  case class ComputedWindow(`

 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-1442][SQL][WIP] Window Function Support...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5604#issuecomment-95809470
  
  [Test build #30917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30917/consoleFull) for PR 5604 at commit [`5b96e2a`](https://github.com/apache/spark/commit/5b96e2aa3e6da2a836171e4783c8199d21daed20).





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/5626#issuecomment-95806853
  
LGTM except some minor inline comments.





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024766
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/GBTExample.scala ---
@@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import scala.collection.mutable
+import scala.language.reflectiveCalls
+
+import scopt.OptionParser
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.examples.mllib.AbstractParams
+import org.apache.spark.ml.{Pipeline, PipelineStage}
+import org.apache.spark.ml.classification.{GBTClassificationModel, 
GBTClassifier}
+import org.apache.spark.ml.feature.{StringIndexer, VectorIndexer}
+import org.apache.spark.ml.regression.{GBTRegressionModel, GBTRegressor}
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * An example runner for decision trees. Run with
+ * {{{
+ * ./bin/run-example ml.GBTExample [options]
+ * }}}
+ * Decision Trees and ensembles can take a large amount of memory.  If the run-example command
+ * above fails, try running via spark-submit and specifying the amount of memory as at least 1g.
+ * For local mode, run
+ * {{{
+ * ./bin/spark-submit --class org.apache.spark.examples.ml.GBTExample --driver-memory 1g
+ *   [examples JAR path] [options]
+ * }}}
+ * If you use it as a template to create your own app, please use `spark-submit` to submit your app.
+ */
+object GBTExample {
+
+  case class Params(
+  input: String = null,
+  testInput: String = "",
+  dataFormat: String = "libsvm",
+  algo: String = "classification",
+  maxDepth: Int = 5,
+  maxBins: Int = 32,
+  minInstancesPerNode: Int = 1,
+  minInfoGain: Double = 0.0,
+  maxIter: Int = 10,
+  fracTest: Double = 0.2,
+  cacheNodeIds: Boolean = false,
+  checkpointDir: Option[String] = None,
+  checkpointInterval: Int = 10) extends AbstractParams[Params]
+
+  def main(args: Array[String]) {
+val defaultParams = Params()
+
+val parser = new OptionParser[Params]("GBTExample") {
+  head("GBTExample: an example Gradient-Boosted Trees app.")
+      opt[String]("algo")
+        .text(s"algorithm (classification, regression), default: ${defaultParams.algo}")
+        .action((x, c) => c.copy(algo = x))
+      opt[Int]("maxDepth")
+        .text(s"max depth of the tree, default: ${defaultParams.maxDepth}")
+        .action((x, c) => c.copy(maxDepth = x))
+      opt[Int]("maxBins")
+        .text(s"max number of bins, default: ${defaultParams.maxBins}")
+        .action((x, c) => c.copy(maxBins = x))
+      opt[Int]("minInstancesPerNode")
+        .text(s"min number of instances required at child nodes to create the parent split," +
+          s" default: ${defaultParams.minInstancesPerNode}")
+        .action((x, c) => c.copy(minInstancesPerNode = x))
+      opt[Double]("minInfoGain")
+        .text(s"min info gain required to create a split, default: ${defaultParams.minInfoGain}")
+        .action((x, c) => c.copy(minInfoGain = x))
+      opt[Int]("maxIter")
+        .text(s"number of trees in ensemble, default: ${defaultParams.maxIter}")
+        .action((x, c) => c.copy(maxIter = x))
+      opt[Double]("fracTest")
+        .text(s"fraction of data to hold out for testing.  If given option testInput, " +
+          s"this option is ignored. default: ${defaultParams.fracTest}")
+        .action((x, c) => c.copy(fracTest = x))
+      opt[Boolean]("cacheNodeIds")
+        .text(s"whether to use node Id cache during training, " +
+          s"default: ${defaultParams.cacheNodeIds}")
+        .action((x, c) => c.copy(cacheNodeIds = x))
+      opt[String]("checkpointDir")
+        .text(s"checkpoint directory where intermediate node Id caches will be stored, " +
+          s"default: ${
 

[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024771
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/RandomForestRegressor.scala ---
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.regression
+
+import org.apache.spark.annotation.AlphaComponent
+import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor}
+import org.apache.spark.ml.impl.tree.{RandomForestParams, TreeRegressorParams}
+import org.apache.spark.ml.param.{Params, ParamMap}
+import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel}
+import org.apache.spark.ml.util.MetadataUtils
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.{RandomForest => OldRandomForest}
+import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, Strategy => OldStrategy}
+import org.apache.spark.mllib.tree.model.{RandomForestModel => OldRandomForestModel}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * [[http://en.wikipedia.org/wiki/Random_forest  Random Forest]] learning algorithm for regression.
+ * It supports both continuous and categorical features.
+ */
+@AlphaComponent
+final class RandomForestRegressor
+  extends Predictor[Vector, RandomForestRegressor, RandomForestRegressionModel]
+  with RandomForestParams with TreeRegressorParams {
+
+  // Override parameter setters from parent trait for Java API compatibility.
+
+  // Parameters from TreeRegressorParams:
+
+  override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value)
+
+  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+
+  override def setMinInstancesPerNode(value: Int): this.type =
+    super.setMinInstancesPerNode(value)
+
+  override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value)
+
+  override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value)
+
+  override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value)
+
+  override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value)
+
+  override def setImpurity(value: String): this.type = super.setImpurity(value)
+
+  // Parameters from TreeEnsembleParams:
+
+  override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value)
+
+  override def setSeed(value: Long): this.type = super.setSeed(value)
+
+  // Parameters from RandomForestParams:
+
+  override def setNumTrees(value: Int): this.type = super.setNumTrees(value)
+
+  override def setFeatureSubsetStrategy(value: String): this.type =
+    super.setFeatureSubsetStrategy(value)
+
+  override protected def train(
+      dataset: DataFrame,
+      paramMap: ParamMap): RandomForestRegressionModel = {
+    val categoricalFeatures: Map[Int, Int] =
+      MetadataUtils.getCategoricalFeatures(dataset.schema(paramMap(featuresCol)))
+    val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset, paramMap)
+    val strategy =
+      super.getOldStrategy(categoricalFeatures, numClasses = 0, OldAlgo.Regression, getOldImpurity)
+    val oldModel = OldRandomForest.trainRegressor(
+      oldDataset, strategy, getNumTrees, getFeatureSubsetStrategy, getSeed.toInt)
+    RandomForestRegressionModel.fromOld(oldModel, this, paramMap, categoricalFeatures)
+  }
+}
+
+object RandomForestRegressor {
+  /** Accessor for supported impurity settings: variance */
+  final val supportedImpurities: Array[String] = TreeRegressorParams.supportedImpurities
+
+  /** Accessor for supported featureSubsetStrategy settings: auto, all, onethird, sqrt, log2 */
+  final val suppo

[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024769
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/DecisionTreeClassifier.scala ---
@@ -85,18 +82,16 @@ final class DecisionTreeClassifier
   }
 
   /** (private[ml]) Create a Strategy instance to use with the old API. */
--- End diff --

Is it useful to mention `(private[ml])` in the JavaDoc? This seems to be 
duplicated info.





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024757
  
--- Diff: mllib/src/test/java/org/apache/spark/ml/classification/JavaGBTClassifierSuite.java ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.ml.impl.TreeTests;
+import org.apache.spark.mllib.classification.LogisticRegressionSuite;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.sql.DataFrame;
+
+
+public class JavaGBTClassifierSuite implements Serializable {
+
+  private transient JavaSparkContext sc;
+
+  @Before
+  public void setUp() {
+sc = new JavaSparkContext("local", "JavaGBTClassifierSuite");
+  }
+
+  @After
+  public void tearDown() {
+sc.stop();
+sc = null;
+  }
+
+  @Test
+  public void runDT() {
+int nPoints = 20;
+double A = 2.0;
+double B = -1.5;
+
+JavaRDD data = sc.parallelize(
+LogisticRegressionSuite.generateLogisticInputAsList(A, B, nPoints, 
42), 2).cache();
+Map categoricalFeatures = new HashMap();
+DataFrame dataFrame = TreeTests.setMetadata(data, categoricalFeatures, 
2);
+
+// This tests setters. Training with various options is tested in 
Scala.
+GBTClassifier rf = new GBTClassifier()
+.setMaxDepth(2)
+.setMaxBins(10)
+.setMinInstancesPerNode(5)
+.setMinInfoGain(0.0)
+.setMaxMemoryInMB(256)
+.setCacheNodeIds(false)
+.setCheckpointInterval(10)
+.setSubsamplingRate(1.0)
+.setSeed(1234)
+.setMaxIter(3)
+.setStepSize(0.1)
+.setMaxDepth(2); // duplicate setMaxDepth to check builder pattern
+for (int i = 0; i < GBTClassifier.supportedLossTypes().length; ++i) {
--- End diff --

~~~java
for (String lossType: GBTClassifier.supportedLossTypes()) {
 ...
}
~~~





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024748
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/impl/tree/treeParams.scala ---
@@ -296,5 +299,194 @@ private[ml] trait TreeRegressorParams extends Params {
 
 private[ml] object TreeRegressorParams {
   // These options should be lowercase.
-  val supportedImpurities: Array[String] = 
Array("variance").map(_.toLowerCase)
+  final val supportedImpurities: Array[String] = 
Array("variance").map(_.toLowerCase)
+}
+
+/**
+ * :: DeveloperApi ::
+ * Parameters for Decision Tree-based ensemble algorithms.
+ *
+ * Note: Marked as private and DeveloperApi since this may be made public 
in the future.
+ */
+@DeveloperApi
+private[ml] trait TreeEnsembleParams extends DecisionTreeParams with 
HasSeed {
+
+  /**
+   * Fraction of the training data used for learning each decision tree.
+   * (default = 1.0)
+   * @group param
+   */
+  final val subsamplingRate: DoubleParam = new DoubleParam(this, 
"subsamplingRate",
+"Fraction of the training data used for learning each decision tree.")
+
+  setDefault(subsamplingRate -> 1.0)
+
+  /** @group setParam */
+  def setSubsamplingRate(value: Double): this.type = {
+require(value > 0.0 && value <= 1.0,
+  s"Subsampling rate must be in range (0,1]. Bad rate: $value")
+set(subsamplingRate, value)
+this
+  }
+
+  /** @group getParam */
+  final def getSubsamplingRate: Double = getOrDefault(subsamplingRate)
+
+  /** @group setParam */
+  def setSeed(value: Long): this.type = {
--- End diff --

`= set(seed, value)` should be sufficient.
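
The one-line setter suggested here can be sketched standalone (the `Param`/`Params` scaffolding below is illustrative only, NOT Spark's actual Params API): when `set` itself returns `this.type`, a fluent setter can delegate to it in a single expression.

~~~scala
// Illustrative scaffolding -- NOT Spark's real Params API.
class Param[T](val name: String)

trait Params {
  private val values = scala.collection.mutable.Map.empty[String, Any]

  // Because `set` returns this.type, fluent setters can simply delegate
  // to it; no trailing `this` is needed in the setter body.
  protected def set[T](param: Param[T], value: T): this.type = {
    values(param.name) = value
    this
  }

  protected def get[T](param: Param[T]): Option[T] =
    values.get(param.name).map(_.asInstanceOf[T])
}

class EnsembleParams extends Params {
  final val seed = new Param[Long]("seed")

  // The whole setter is one expression.
  def setSeed(value: Long): this.type = set(seed, value)

  def getSeed: Option[Long] = get(seed)
}
~~~

Usage: `new EnsembleParams().setSeed(42L).getSeed` yields `Some(42L)`, and the chained call keeps the concrete `EnsembleParams` type.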





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024762
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import org.scalatest.FunSuite
+
+import org.apache.spark.ml.impl.TreeTests
+import org.apache.spark.mllib.linalg.Vectors
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.{EnsembleTestHelper, RandomForest => 
OldRandomForest}
+import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo}
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * Test suite for [[RandomForestClassifier]].
+ */
+class RandomForestClassifierSuite extends FunSuite with 
MLlibTestSparkContext {
+
+  import RandomForestClassifierSuite.compareAPIs
+
+  private var orderedLabeledPoints50_1000: RDD[LabeledPoint] = _
+  private var orderedLabeledPoints5_20: RDD[LabeledPoint] = _
+
+  override def beforeAll() {
+super.beforeAll()
+orderedLabeledPoints50_1000 =
+  
sc.parallelize(EnsembleTestHelper.generateOrderedLabeledPoints(numFeatures = 
50, 1000))
+orderedLabeledPoints5_20 =
+  
sc.parallelize(EnsembleTestHelper.generateOrderedLabeledPoints(numFeatures = 5, 
20))
+  }
+
+  /////////////////////////////////////////////////////////////////
+  // Tests calling train()
+  /////////////////////////////////////////////////////////////////
+
+  def binaryClassificationTestWithContinuousFeatures(rf: 
RandomForestClassifier) {
+val categoricalFeatures = Map.empty[Int, Int]
+val numClasses = 2
+val newRF = rf
+  .setImpurity("Gini")
+  .setMaxDepth(2)
+  .setNumTrees(1)
+  .setFeatureSubsetStrategy("auto")
+  .setSeed(123)
+compareAPIs(orderedLabeledPoints50_1000, newRF, categoricalFeatures, 
numClasses)
+  }
+
+  test("Binary classification with continuous features:" +
+" comparing DecisionTree vs. RandomForest(numTrees = 1)") {
+val rf = new RandomForestClassifier()
+binaryClassificationTestWithContinuousFeatures(rf)
+  }
+
+  test("Binary classification with continuous features and node Id cache:" 
+
+" comparing DecisionTree vs. RandomForest(numTrees = 1)") {
+val rf = new RandomForestClassifier()
+  .setCacheNodeIds(true)
+binaryClassificationTestWithContinuousFeatures(rf)
+  }
+
+  test("alternating categorical and continuous features with multiclass 
labels to test indexing") {
+val arr = new Array[LabeledPoint](4)
--- End diff --

~~~scala
val arr = Array(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0, 0.0, 3.0, 1.0)),
  ...)
~~~





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024737
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import com.github.fommil.netlib.BLAS.{getInstance => blas}
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.AlphaComponent
+import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor}
+import org.apache.spark.ml.impl.tree._
+import org.apache.spark.ml.param.{Param, Params, ParamMap}
+import org.apache.spark.ml.regression.DecisionTreeRegressionModel
+import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel}
+import org.apache.spark.ml.util.MetadataUtils
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.{GradientBoostedTrees => OldGBT}
+import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo}
+import org.apache.spark.mllib.tree.loss.{Loss => OldLoss, LogLoss => 
OldLogLoss}
+import org.apache.spark.mllib.tree.model.{GradientBoostedTreesModel => 
OldGBTModel}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * [[http://en.wikipedia.org/wiki/Gradient_boosting Gradient-Boosted Trees 
(GBTs)]]
+ * learning algorithm for classification.
+ * It supports binary labels, as well as both continuous and categorical 
features.
+ * Note: Multiclass labels are not currently supported.
+ */
+@AlphaComponent
+final class GBTClassifier
+  extends Predictor[Vector, GBTClassifier, GBTClassificationModel]
+  with GBTParams with TreeClassifierParams with Logging {
+
+  // Override parameter setters from parent trait for Java API 
compatibility.
+
+  // Parameters from TreeClassifierParams:
+
+  override def setMaxDepth(value: Int): this.type = 
super.setMaxDepth(value)
+
+  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+
+  override def setMinInstancesPerNode(value: Int): this.type =
+super.setMinInstancesPerNode(value)
+
+  override def setMinInfoGain(value: Double): this.type = 
super.setMinInfoGain(value)
+
+  override def setMaxMemoryInMB(value: Int): this.type = 
super.setMaxMemoryInMB(value)
+
+  override def setCacheNodeIds(value: Boolean): this.type = 
super.setCacheNodeIds(value)
+
+  override def setCheckpointInterval(value: Int): this.type = 
super.setCheckpointInterval(value)
+
+  /**
+   * The impurity setting is ignored for GBT models.
+   * Individual trees are built using impurity "Variance."
+   */
+  override def setImpurity(value: String): this.type = {
+logWarning("GBTClassifier.setImpurity should NOT be used")
+this
+  }
+
+  // Parameters from TreeEnsembleParams:
+
+  override def setSubsamplingRate(value: Double): this.type = 
super.setSubsamplingRate(value)
+
+  override def setSeed(value: Long): this.type = {
+logWarning("The 'seed' parameter is currently ignored by Gradient 
Boosting.")
+super.setSeed(value)
+  }
+
+  // Parameters from GBTParams:
+
+  override def setMaxIter(value: Int): this.type = super.setMaxIter(value)
+
+  override def setStepSize(value: Double): this.type = 
super.setStepSize(value)
+
+  // Parameters for GBTClassifier:
+
+  /**
+   * Loss function which GBT tries to minimize. (case-insensitive)
+   * Supported: "logistic"
+   * (default = logistic)
+   * @group param
+   */
+  val lossType: Param[String] = new Param[String](this, "lossType", "Loss 
function which GBT" +
+" tries to minimize (case-insensitive). Supported options:" +
+s" ${GBTClassifier.supportedLossTypes.mkString(", ")}")
+
+  setDefault(lossT

[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024743
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala
 ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import scala.collection.mutable
+
+import org.apache.spark.annotation.AlphaComponent
+import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor}
+import org.apache.spark.ml.impl.tree._
+import org.apache.spark.ml.param.{Params, ParamMap}
+import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel}
+import org.apache.spark.ml.util.MetadataUtils
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.{RandomForest => OldRandomForest}
+import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, 
Strategy => OldStrategy}
+import org.apache.spark.mllib.tree.model.{RandomForestModel => 
OldRandomForestModel}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * [[http://en.wikipedia.org/wiki/Random_forest  Random Forest]] learning 
algorithm for
+ * classification.
+ * It supports both binary and multiclass labels, as well as both 
continuous and categorical
+ * features.
+ */
+@AlphaComponent
+final class RandomForestClassifier
+  extends Predictor[Vector, RandomForestClassifier, 
RandomForestClassificationModel]
+  with RandomForestParams with TreeClassifierParams {
+
+  // Override parameter setters from parent trait for Java API 
compatibility.
+
+  // Parameters from TreeClassifierParams:
+
+  override def setMaxDepth(value: Int): this.type = 
super.setMaxDepth(value)
+
+  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+
+  override def setMinInstancesPerNode(value: Int): this.type =
+super.setMinInstancesPerNode(value)
+
+  override def setMinInfoGain(value: Double): this.type = 
super.setMinInfoGain(value)
+
+  override def setMaxMemoryInMB(value: Int): this.type = 
super.setMaxMemoryInMB(value)
+
+  override def setCacheNodeIds(value: Boolean): this.type = 
super.setCacheNodeIds(value)
+
+  override def setCheckpointInterval(value: Int): this.type = 
super.setCheckpointInterval(value)
+
+  override def setImpurity(value: String): this.type = 
super.setImpurity(value)
+
+  // Parameters from TreeEnsembleParams:
+
+  override def setSubsamplingRate(value: Double): this.type = 
super.setSubsamplingRate(value)
+
+  override def setSeed(value: Long): this.type = super.setSeed(value)
+
+  // Parameters from RandomForestParams:
+
+  override def setNumTrees(value: Int): this.type = 
super.setNumTrees(value)
+
+  override def setFeatureSubsetStrategy(value: String): this.type =
+super.setFeatureSubsetStrategy(value)
+
+  override protected def train(
+  dataset: DataFrame,
+  paramMap: ParamMap): RandomForestClassificationModel = {
+val categoricalFeatures: Map[Int, Int] =
+  
MetadataUtils.getCategoricalFeatures(dataset.schema(paramMap(featuresCol)))
+val numClasses: Int = 
MetadataUtils.getNumClasses(dataset.schema(paramMap(labelCol))) match {
+  case Some(n: Int) => n
+  case None => throw new 
IllegalArgumentException("RandomForestClassifier was given input" +
+s" with invalid label column, without the number of classes 
specified.")
+  // TODO: Automatically index labels.
+}
+val oldDataset: RDD[LabeledPoint] = extractLabeledPoints(dataset, 
paramMap)
+val strategy =
+  super.getOldStrategy(categoricalFeatures, numClasses, 
OldAlgo.Classification, getOldImpurity)
+val oldModel = OldRandomForest.trainClassifier(
+  oldDataset, strategy, getNumTrees, getFe

[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95806380
  
  [Test build #30913 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30913/consoleFull)
 for   PR 5645 at commit 
[`1a32a4b`](https://github.com/apache/spark/commit/1a32a4b5ce740721343915452974c9fb3f9a3910).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024756
  
--- Diff: 
mllib/src/test/java/org/apache/spark/ml/classification/JavaGBTClassifierSuite.java
 ---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.ml.impl.TreeTests;
+import org.apache.spark.mllib.classification.LogisticRegressionSuite;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.sql.DataFrame;
+
+
+public class JavaGBTClassifierSuite implements Serializable {
+
+  private transient JavaSparkContext sc;
+
+  @Before
+  public void setUp() {
+sc = new JavaSparkContext("local", "JavaGBTClassifierSuite");
+  }
+
+  @After
+  public void tearDown() {
+sc.stop();
+sc = null;
+  }
+
+  @Test
+  public void runDT() {
+int nPoints = 20;
+double A = 2.0;
+double B = -1.5;
+
+JavaRDD data = sc.parallelize(
+LogisticRegressionSuite.generateLogisticInputAsList(A, B, nPoints, 
42), 2).cache();
--- End diff --

2-space indentation





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024746
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/impl/tree/treeParams.scala ---
@@ -296,5 +299,194 @@ private[ml] trait TreeRegressorParams extends Params {
 
 private[ml] object TreeRegressorParams {
   // These options should be lowercase.
-  val supportedImpurities: Array[String] = 
Array("variance").map(_.toLowerCase)
+  final val supportedImpurities: Array[String] = 
Array("variance").map(_.toLowerCase)
+}
+
+/**
+ * :: DeveloperApi ::
+ * Parameters for Decision Tree-based ensemble algorithms.
+ *
+ * Note: Marked as private and DeveloperApi since this may be made public 
in the future.
+ */
+@DeveloperApi
+private[ml] trait TreeEnsembleParams extends DecisionTreeParams with 
HasSeed {
+
+  /**
+   * Fraction of the training data used for learning each decision tree.
+   * (default = 1.0)
+   * @group param
+   */
+  final val subsamplingRate: DoubleParam = new DoubleParam(this, 
"subsamplingRate",
+"Fraction of the training data used for learning each decision tree.")
+
+  setDefault(subsamplingRate -> 1.0)
+
+  /** @group setParam */
+  def setSubsamplingRate(value: Double): this.type = {
+require(value > 0.0 && value <= 1.0,
+  s"Subsampling rate must be in range (0,1]. Bad rate: $value")
+set(subsamplingRate, value)
+this
--- End diff --

`this` is not required.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95806388
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30913/
Test FAILed.





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024741
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/RandomForestClassifier.scala
 ---
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import scala.collection.mutable
+
+import org.apache.spark.annotation.AlphaComponent
+import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor}
+import org.apache.spark.ml.impl.tree._
+import org.apache.spark.ml.param.{Params, ParamMap}
+import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel}
+import org.apache.spark.ml.util.MetadataUtils
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.{RandomForest => OldRandomForest}
+import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, 
Strategy => OldStrategy}
+import org.apache.spark.mllib.tree.model.{RandomForestModel => 
OldRandomForestModel}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * [[http://en.wikipedia.org/wiki/Random_forest  Random Forest]] learning 
algorithm for
+ * classification.
+ * It supports both binary and multiclass labels, as well as both 
continuous and categorical
+ * features.
+ */
+@AlphaComponent
+final class RandomForestClassifier
+  extends Predictor[Vector, RandomForestClassifier, 
RandomForestClassificationModel]
+  with RandomForestParams with TreeClassifierParams {
+
+  // Override parameter setters from parent trait for Java API 
compatibility.
+
+  // Parameters from TreeClassifierParams:
+
+  override def setMaxDepth(value: Int): this.type = 
super.setMaxDepth(value)
+
+  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+
+  override def setMinInstancesPerNode(value: Int): this.type =
+super.setMinInstancesPerNode(value)
+
+  override def setMinInfoGain(value: Double): this.type = 
super.setMinInfoGain(value)
+
+  override def setMaxMemoryInMB(value: Int): this.type = 
super.setMaxMemoryInMB(value)
+
+  override def setCacheNodeIds(value: Boolean): this.type = 
super.setCacheNodeIds(value)
+
+  override def setCheckpointInterval(value: Int): this.type = 
super.setCheckpointInterval(value)
+
+  override def setImpurity(value: String): this.type = 
super.setImpurity(value)
+
+  // Parameters from TreeEnsembleParams:
+
+  override def setSubsamplingRate(value: Double): this.type = 
super.setSubsamplingRate(value)
+
+  override def setSeed(value: Long): this.type = super.setSeed(value)
+
+  // Parameters from RandomForestParams:
+
+  override def setNumTrees(value: Int): this.type = 
super.setNumTrees(value)
+
+  override def setFeatureSubsetStrategy(value: String): this.type =
+super.setFeatureSubsetStrategy(value)
+
+  override protected def train(
+  dataset: DataFrame,
+  paramMap: ParamMap): RandomForestClassificationModel = {
+val categoricalFeatures: Map[Int, Int] =
+  
MetadataUtils.getCategoricalFeatures(dataset.schema(paramMap(featuresCol)))
+val numClasses: Int = 
MetadataUtils.getNumClasses(dataset.schema(paramMap(labelCol))) match {
+  case Some(n: Int) => n
+  case None => throw new 
IllegalArgumentException("RandomForestClassifier was given input" +
+s" with invalid label column, without the number of classes 
specified.")
--- End diff --

Mention the label column name in the error message.
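
A hedged sketch of the suggested improvement (the helper name and exact wording are illustrative, not from the PR): surface the offending label column's name in the exception message so the user can tell which column lacked class metadata.

~~~scala
// Illustrative helper only: include the actual label column name
// in the IllegalArgumentException message.
def checkNumClasses(numClasses: Option[Int], labelColName: String): Int =
  numClasses.getOrElse {
    throw new IllegalArgumentException(
      s"RandomForestClassifier was given input with invalid label column " +
        s"'$labelColName', without the number of classes specified.")
  }
~~~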



[GitHub] spark pull request: [SQL] Fixed expression data type matching.

2015-04-23 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5675#discussion_r29024715
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -40,32 +40,46 @@ import org.apache.spark.util.Utils
  */
 @DeveloperApi
 abstract class DataType {
-  /** Matches any expression that evaluates to this DataType */
-  def unapply(a: Expression): Boolean = a match {
+  /**
+   * Enables matching against NumericType for expressions:
--- End diff --

ah yes - I will fix that.
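
The extractor pattern under discussion can be sketched in isolation (the class names below are simplified stand-ins, not Catalyst's real hierarchy): giving each `DataType` an `unapply` over expressions lets pattern matches select expressions by the type they evaluate to.

~~~scala
// Simplified stand-ins for Catalyst's Expression/DataType hierarchy.
abstract class Expression { def dataType: DataType }
case class Literal(value: Any, dataType: DataType) extends Expression

abstract class DataType {
  // Matches any expression that evaluates to this DataType.
  def unapply(e: Expression): Boolean = e.dataType == this
}
object IntType extends DataType
object StringType extends DataType

// `case IntType()` invokes IntType.unapply(e): Boolean.
def describe(e: Expression): String = e match {
  case IntType()    => "an integer expression"
  case StringType() => "a string expression"
  case _            => "something else"
}
~~~

This is what lets callers write `case e @ NumericType() => ...` against an expression rather than first extracting `e.dataType`.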






[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread sarutak
Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/2342#discussion_r29024688
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -17,17 +17,172 @@
 
 package org.apache.spark.ui.jobs
 
-import scala.xml.{Node, NodeSeq}
+import scala.collection.mutable.{HashMap, ListBuffer}
+import scala.xml.{Node, NodeSeq, Unparsed}
 
+import java.util.Date
 import javax.servlet.http.HttpServletRequest
 
-import org.apache.spark.ui.{WebUIPage, UIUtils}
-import org.apache.spark.ui.jobs.UIData.JobUIData
+import org.apache.spark.ui.{UIUtils, WebUIPage}
+import org.apache.spark.ui.jobs.UIData.{ExecutorUIData, JobUIData}
+import org.apache.spark.JobExecutionStatus
 
 /** Page showing list of all ongoing and recently finished jobs */
 private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
-  private val startTime: Option[Long] = parent.sc.map(_.startTime)
-  private val listener = parent.listener
+  private val JOBS_LEGEND =
+
+  
+  Succeeded Job
+  
+  Failed Job
+  
+  Running Job
+.toString.filter(_ != '\n')
+
+  private val EXECUTORS_LEGEND =
+
+  
--- End diff --

I think stroke and fill can work. I'll address it.





[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5677#issuecomment-95805563
  
  [Test build #30911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30911/consoleFull) for PR 5677 at commit [`ebadaa9`](https://github.com/apache/spark/commit/ebadaa9798752498004fc3bc53de07ed53b49f7b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5677#issuecomment-95805571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30911/





[GitHub] spark pull request: [SQL] Fixed expression data type matching.

2015-04-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/5675#discussion_r29024422
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -40,32 +40,46 @@ import org.apache.spark.util.Utils
  */
 @DeveloperApi
 abstract class DataType {
-  /** Matches any expression that evaluates to this DataType */
-  def unapply(a: Expression): Boolean = a match {
+  /**
+   * Enables matching against NumericType for expressions:
--- End diff --

typo? Seems it should be DataType.
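The `unapply` under discussion is Scala's extractor pattern: defining a boolean `unapply` on a type object lets expressions be matched by their data type. A minimal self-contained sketch with toy stand-ins (hypothetical `Expr`/`ToyType` names, not Catalyst's real classes):

```scala
// Toy stand-ins for Catalyst's Expression/DataType; hypothetical names,
// used only to illustrate the extractor pattern from the diff.
sealed trait ToyType
case object IntType extends ToyType
case object StrType extends ToyType

case class Expr(name: String, dataType: ToyType)

// A boolean unapply lets `case IntTyped()` match any expression whose
// dataType is IntType, mirroring how DataType.unapply is used.
object IntTyped {
  def unapply(e: Expr): Boolean = e.dataType == IntType
}

val exprs = Seq(Expr("a", IntType), Expr("b", StrType))
val intNames = exprs.collect { case e @ IntTyped() => e.name }
println(intNames)  // List(a)
```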





[GitHub] spark pull request: [SPARK-7103][Spark Core]Verify patitionors are...

2015-04-23 Thread vinodkc
GitHub user vinodkc opened a pull request:

https://github.com/apache/spark/pull/5678

[SPARK-7103][Spark Core] Verify partitioners are available in all RDDs used
in PartitionerAwareUnionRDD



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vinodkc/spark fix_unionRDD_without_partition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5678.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5678


commit 8f414d145569041b30766be6a9c6880297303b3c
Author: Vinod K C 
Date:   2015-04-24T07:52:16Z

Verify partitioners of RDDs in union
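The fix described above amounts to a precondition that every parent RDD has a partitioner before the union is built. A hedged sketch of that kind of check, using hypothetical toy types rather than Spark's actual `PartitionerAwareUnionRDD`:

```scala
// Hypothetical toy model of the check; Spark's real class is
// PartitionerAwareUnionRDD and its partitioner type is Partitioner.
case class ToyRdd(name: String, partitioner: Option[String])

// Fail fast when any parent RDD lacks a partitioner, instead of
// producing a union RDD that breaks later at runtime.
def partitionerAwareUnion(rdds: Seq[ToyRdd]): String = {
  require(rdds.nonEmpty, "cannot union an empty collection of RDDs")
  require(rdds.forall(_.partitioner.isDefined),
    s"all parent RDDs must have a partitioner; missing in: " +
      rdds.filter(_.partitioner.isEmpty).map(_.name).mkString(", "))
  rdds.head.partitioner.get
}

println(partitionerAwareUnion(Seq(ToyRdd("a", Some("hash")), ToyRdd("b", Some("hash")))))  // hash
```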







[GitHub] spark pull request: [SPARK-7103][Spark Core]Verify patitionors are...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5678#issuecomment-95805275
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-95805052
  
  [Test build #30912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30912/consoleFull) for PR 5647 at commit [`9903837`](https://github.com/apache/spark/commit/990383761841b444506e91f3052c2de3736d6052).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-5687][Core]TaskResultGetter needs to ca...

2015-04-23 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4474#issuecomment-95805067
  
I'd be happy to have a patch that kills the JVM when this occurs, with a 
warning message logged. I didn't realize in your original submission that this 
was actually just killing the thread but allowing the JVM to survive.

Really, once we are out of memory, we should coerce the JVM to terminate. I 
agree that is better than having a silent thread death.
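One way to realize "coerce the JVM to terminate" is a default uncaught-exception handler that halts on `OutOfMemoryError`. This is an assumed approach for illustration, not the patch's actual code:

```scala
// Sketch (assumed approach): a default uncaught-exception handler that
// halts the JVM on OutOfMemoryError instead of letting only the
// throwing thread die silently.
object OomHaltHandler extends Thread.UncaughtExceptionHandler {
  override def uncaughtException(t: Thread, e: Throwable): Unit = e match {
    case _: OutOfMemoryError =>
      System.err.println(s"Uncaught OutOfMemoryError in thread ${t.getName}; halting JVM")
      // halt rather than exit: shutdown hooks may themselves fail once memory is exhausted
      Runtime.getRuntime.halt(52)
    case other =>
      System.err.println(s"Uncaught exception in thread ${t.getName}: $other")
  }
}

Thread.setDefaultUncaughtExceptionHandler(OomHaltHandler)
println(Thread.getDefaultUncaughtExceptionHandler eq OomHaltHandler)  // true
```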





[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-95805062
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30912/





[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-04-23 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/5643#discussion_r29024298
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastLeftSemiJoinHash.scala ---
@@ -32,36 +32,69 @@ case class BroadcastLeftSemiJoinHash(
 leftKeys: Seq[Expression],
 rightKeys: Seq[Expression],
 left: SparkPlan,
-right: SparkPlan) extends BinaryNode with HashJoin {
+right: SparkPlan,
+condition: Option[Expression]) extends BinaryNode with HashJoin {
 
   override val buildSide: BuildSide = BuildRight
 
   override def output: Seq[Attribute] = left.output
 
+  @transient private lazy val boundCondition =
+    condition.map(newPredicate(_, left.output ++ right.output)).getOrElse((row: Row) => true)
--- End diff --

`newPredicate(condition.getOrElse(Literal(true)), left.output ++ right.output)`?
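The suggested rewrite folds the default into the condition before building the predicate, rather than defaulting to a constant-true function afterwards. The two shapes are equivalent; a sketch with plain functions standing in for Catalyst's `newPredicate`/`Literal` (the real API builds `Row` predicates; these names are only illustrative):

```scala
// Stand-ins: a "condition" is Option[Int => Boolean]; newPredicate just
// returns the function; a constant-true condition plays Literal(true).
def newPredicate(cond: Int => Boolean): Int => Boolean = cond
val alwaysTrue: Int => Boolean = _ => true

val condition: Option[Int => Boolean] = Some(_ > 0)

// Original shape: map over the Option, fall back to a constant-true function.
val p1 = condition.map(newPredicate).getOrElse((_: Int) => true)
// Suggested shape: default the condition first, then build the predicate once.
val p2 = newPredicate(condition.getOrElse(alwaysTrue))

println(Seq(-1, 1).map(p1))  // List(false, true)
println(Seq(-1, 1).map(p2))  // List(false, true)
```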





[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...

2015-04-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/5665#issuecomment-95804784
  
`newPredicate(condition.getOrElse(Literal(true)), left.output ++ right.output)`?






[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2342#discussion_r29024275
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -17,17 +17,172 @@
 
 package org.apache.spark.ui.jobs
 
-import scala.xml.{Node, NodeSeq}
+import scala.collection.mutable.{HashMap, ListBuffer}
+import scala.xml.{Node, NodeSeq, Unparsed}
 
+import java.util.Date
 import javax.servlet.http.HttpServletRequest
 
-import org.apache.spark.ui.{WebUIPage, UIUtils}
-import org.apache.spark.ui.jobs.UIData.JobUIData
+import org.apache.spark.ui.{UIUtils, WebUIPage}
+import org.apache.spark.ui.jobs.UIData.{ExecutorUIData, JobUIData}
+import org.apache.spark.JobExecutionStatus
 
 /** Page showing list of all ongoing and recently finished jobs */
 private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage("") {
-  private val startTime: Option[Long] = parent.sc.map(_.startTime)
-  private val listener = parent.listener
+  private val JOBS_LEGEND =
+
+  
+  Succeeded Job
+  
+  Failed Job
+  
+  Running Job
+.toString.filter(_ != '\n')
+
+  private val EXECUTORS_LEGEND =
+
+  
--- End diff --

I only really care about the colors (stroke and fill); are you sure it does 
not work for those?
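For context, the legend markup in the diff is rendered to a string and stripped of newlines (`.toString.filter(_ != '\n')`) so it can be embedded in a single-line attribute/JavaScript context. A sketch of that trick with a plain string (scala.xml omitted to stay self-contained):

```scala
// Multi-line markup, as an XML literal would render it.
val legendMarkup =
  """<div class="legend-area">
    |  <span>Succeeded Job</span>
    |</div>""".stripMargin

// Same trick as `.toString.filter(_ != '\n')` in the patch: drop newlines
// so the markup survives embedding in a one-line context.
val oneLine = legendMarkup.filter(_ != '\n')

println(oneLine.contains("\n"))        // false
println(oneLine.contains("Succeeded")) // true
```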





[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5637#issuecomment-95804369
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30906/





[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5637#issuecomment-95804360
  
  [Test build #30906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30906/consoleFull) for PR 5637 at commit [`ab38c71`](https://github.com/apache/spark/commit/ab38c71356c23d63ca9f3990c8c0f0b8e8fc7976).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-3468][WebUI] Timeline-View feature

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2342#issuecomment-95803737
  
  [Test build #30916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30916/consoleFull) for PR 2342 at commit [`d3c63c8`](https://github.com/apache/spark/commit/d3c63c84a56041756841dd0706d87c8c808e84d3).





[GitHub] spark pull request: [SPARK-6113] [ml] Tree ensembles for Pipelines...

2015-04-23 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/5626#discussion_r29024038
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala ---
@@ -0,0 +1,225 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import com.github.fommil.netlib.BLAS.{getInstance => blas}
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.AlphaComponent
+import org.apache.spark.ml.impl.estimator.{PredictionModel, Predictor}
+import org.apache.spark.ml.impl.tree._
+import org.apache.spark.ml.param.{Param, Params, ParamMap}
+import org.apache.spark.ml.regression.DecisionTreeRegressionModel
+import org.apache.spark.ml.tree.{DecisionTreeModel, TreeEnsembleModel}
+import org.apache.spark.ml.util.MetadataUtils
+import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.tree.{GradientBoostedTrees => OldGBT}
+import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo}
+import org.apache.spark.mllib.tree.loss.{Loss => OldLoss, LogLoss => OldLogLoss}
+import org.apache.spark.mllib.tree.model.{GradientBoostedTreesModel => OldGBTModel}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.DataFrame
+
+
+/**
+ * :: AlphaComponent ::
+ *
+ * [[http://en.wikipedia.org/wiki/Gradient_boosting Gradient-Boosted Trees (GBTs)]]
+ * learning algorithm for classification.
+ * It supports binary labels, as well as both continuous and categorical features.
+ * Note: Multiclass labels are not currently supported.
+ */
+@AlphaComponent
+final class GBTClassifier
+  extends Predictor[Vector, GBTClassifier, GBTClassificationModel]
+  with GBTParams with TreeClassifierParams with Logging {
+
+  // Override parameter setters from parent trait for Java API compatibility.
+
+  // Parameters from TreeClassifierParams:
+
+  override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value)
+
+  override def setMaxBins(value: Int): this.type = super.setMaxBins(value)
+
+  override def setMinInstancesPerNode(value: Int): this.type =
+    super.setMinInstancesPerNode(value)
+
+  override def setMinInfoGain(value: Double): this.type = super.setMinInfoGain(value)
+
+  override def setMaxMemoryInMB(value: Int): this.type = super.setMaxMemoryInMB(value)
+
+  override def setCacheNodeIds(value: Boolean): this.type = super.setCacheNodeIds(value)
+
+  override def setCheckpointInterval(value: Int): this.type = super.setCheckpointInterval(value)
+
+  /**
+   * The impurity setting is ignored for GBT models.
+   * Individual trees are built using impurity "Variance."
+   */
+  override def setImpurity(value: String): this.type = {
+    logWarning("GBTClassifier.setImpurity should NOT be used")
+    this
+  }
+
+  // Parameters from TreeEnsembleParams:
+
+  override def setSubsamplingRate(value: Double): this.type = super.setSubsamplingRate(value)
+
+  override def setSeed(value: Long): this.type = {
+    logWarning("The 'seed' parameter is currently ignored by Gradient Boosting.")
+    super.setSeed(value)
+  }
+
+  // Parameters from GBTParams:
+
+  override def setMaxIter(value: Int): this.type = super.setMaxIter(value)
+
+  override def setLearningRate(value: Double): this.type = super.setLearningRate(value)
+
+  // Parameters for GBTClassifier:
+
+  /**
+   * Loss function which GBT tries to minimize. (case-insensitive)
+   * Supported: "LogLoss"
+   * (default = LogLoss)
+   * @group param
+   */
+  val loss: Param[String] = new Param[String](this, "loss", "Loss function which GBT tries to" +
+    " minimize (case-insensitive). Supported options: LogLoss")
+
+  setDefault(loss -> "logloss")
+
+  /** @group setParam */
+  def
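The wall of setter overrides above exists because each setter returns `this.type`; re-overriding in the concrete class keeps the chained-setter style usable from each subclass, matching the diff's stated Java API compatibility goal. A toy sketch of the pattern (hypothetical classes, not the real spark.ml params API):

```scala
// Base trait with a chained setter returning this.type.
trait TreeParams {
  protected var maxDepth: Int = 5
  def setMaxDepth(value: Int): this.type = { maxDepth = value; this }
  def getMaxDepth: Int = maxDepth
}

// The concrete estimator re-overrides the setter, delegating to super,
// mirroring GBTClassifier's "override ... = super.setMaxDepth(value)".
class ToyGBT extends TreeParams {
  override def setMaxDepth(value: Int): this.type = super.setMaxDepth(value)
}

val gbt = new ToyGBT().setMaxDepth(7)
println(gbt.getMaxDepth)  // 7
```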

[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5643#issuecomment-95800072
  
  [Test build #30905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30905/consoleFull) for PR 5643 at commit [`d29f9a6`](https://github.com/apache/spark/commit/d29f9a640a9882fd469d995a7ecd92b230cd8a65).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5643#issuecomment-95800074
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30905/





[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5613#issuecomment-95799955
  
  [Test build #30907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30907/consoleFull) for PR 5613 at commit [`abaf02e`](https://github.com/apache/spark/commit/abaf02e611359102f3117e3fa484923155f3f314).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5613#issuecomment-95799978
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30907/





[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2015-04-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4015#issuecomment-95799659
  
@marmbrus Any more comments on this before merging? It would be greatly 
appreciated if you could merge this soon, as I have spent a lot of time rebasing 
again and again. :)





[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5354#issuecomment-95799433
  
  [Test build #30915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30915/consoleFull) for PR 5354 at commit [`0eefe4d`](https://github.com/apache/spark/commit/0eefe4d46c0a42859b8c9c0bc0ff98a0beeb440a).





[GitHub] spark pull request: [SPARK-6122][Core] Upgrade tachyon-client vers...

2015-04-23 Thread calvinjia
Github user calvinjia commented on the pull request:

https://github.com/apache/spark/pull/5354#issuecomment-95799161
  
@srowen 
I appreciate the feedback, and I've cleaned up the httpclient versions as 
you suggested. 
Do you have any other comments? Thanks.





[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...

2015-04-23 Thread chenghao-intel
Github user chenghao-intel closed the pull request at:

https://github.com/apache/spark/pull/5671





[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...

2015-04-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/5671#issuecomment-95798871
  
Thank you @rxin, closing since it's merged.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95798479
  
  [Test build #30909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30909/consoleFull) for PR 5645 at commit [`e0d19fb`](https://github.com/apache/spark/commit/e0d19fb1f0e6472d3e0ca55223c36ed506f32709).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95798495
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30909/





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95798520
  
  [Test build #30914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30914/consoleFull) for PR 5645 at commit [`d7cd15b`](https://github.com/apache/spark/commit/d7cd15b5cef64766a432918e54cca4750d13745b).





[GitHub] spark pull request: [SPARK-7084] improve saveAsTable documentation

2015-04-23 Thread phatak-dev
Github user phatak-dev commented on the pull request:

https://github.com/apache/spark/pull/5654#issuecomment-95798250
  
Added for other methods also.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95798234
  
  [Test build #30913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30913/consoleFull) for PR 5645 at commit [`1a32a4b`](https://github.com/apache/spark/commit/1a32a4b5ce740721343915452974c9fb3f9a3910).





[GitHub] spark pull request: [SPARK-6435] spark-shell --jars option does no...

2015-04-23 Thread tsudukim
Github user tsudukim commented on the pull request:

https://github.com/apache/spark/pull/5227#issuecomment-95798208
  
I was checking `SparkLauncherSuite` on Windows as per vanzin's comment, and ran 
into some trouble. It seems unrelated to this PR, but I'm not sure yet. Please 
give me a little more time. When I resolve the problem, I'll rebase this PR.





[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5676#issuecomment-95797914
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30908/
Test FAILed.





[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...

2015-04-23 Thread shroffpradyumn
Github user shroffpradyumn commented on the pull request:

https://github.com/apache/spark/pull/5547#issuecomment-95797940
  
Thank you all for your feedback, and I apologize for my late reply (it’s 
been a rough week of midterms).

@pwendell - I’ve addressed all your inline comments (memoization, 
Javascript indentation, JSON lists, etc.) in my latest commit. As for the load 
time of the graph, it has improved a bit after moving from string 
representations to JSON arrays, but only by a small factor.

When you say you’re skeptical about the graph scalability, what is the 
maximum number of tasks you want displayed on the graph? I’m thinking of 
keeping it to 1000 (at the most), and having the users select a task range if 
they want to view a different region of tasks (say tasks 1200-2000 for example).

My reason for the above is that the task stages become too cluttered above 
a certain number, so it’s better to keep a limit, or alternatively, increase 
the max height of the graph (which would involve a lot more scrolling though).

@andrewor14  - The visualization doesn’t currently support zooming, and 
it will definitely be pretty challenging to implement it on top of D3.js. 
However, the task-range functionality I mentioned above can serve as a 
pseudo-zoom feature since a user can select a task range and hence zoom into 
the graph.

Also, breaking down the task times along the vertical axis shouldn’t be 
that difficult so we can definitely add that later on if required (provided 
this patch gets accepted haha).

@punya - I haven’t looked into using Amber yet, and I’ll definitely 
check out plottable.js.





[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5676#issuecomment-95797907
  
  [Test build #30908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30908/consoleFull)
 for   PR 5676 at commit 
[`1693b54`](https://github.com/apache/spark/commit/1693b54f209a17ebb6bed449f81840737f97366a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SQL] Fixed expression data type matching.

2015-04-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5675





[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5667#issuecomment-95797532
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30903/
Test PASSed.





[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5667#issuecomment-95797525
  
  [Test build #30903 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30903/consoleFull)
 for   PR 5667 at commit 
[`9d2295e`](https://github.com/apache/spark/commit/9d2295e73046fca9e0134876a19f9638336d7023).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SQL] Fixed expression data type matching.

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5675#issuecomment-95797394
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30902/
Test PASSed.





[GitHub] spark pull request: [SQL] Fixed expression data type matching.

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5675#issuecomment-95797389
  
  [Test build #30902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30902/consoleFull)
 for   PR 5675 at commit 
[`0f31856`](https://github.com/apache/spark/commit/0f31856d170102ec4a7d19e9da488726c2a37bb5).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7092] Update spark scala version to 2.1...

2015-04-23 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/5662#discussion_r29022546
  
--- Diff: 
repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkIMain.scala ---
@@ -1129,7 +1129,7 @@ class SparkIMain(@BeanProperty val factory: 
ScriptEngineFactory, initialSettings
 
 def apply(line: String): Result = debugging(s"""parse("$line")""")  {
   var isIncomplete = false
-  currentRun.reporting.withIncompleteHandler((_, _) => isIncomplete = true) {
+  currentRun.parsing.withIncompleteHandler((_, _) => isIncomplete = true) {
--- End diff --

This was the change, it corresponds to 
https://github.com/scala/scala/commit/64ebac245d58221814f9c9375927e3f2e7a2d4f0






[GitHub] spark pull request: [SPARK-7093][SQL] Using newPredicate in Nested...

2015-04-23 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/5665#issuecomment-95792030
  
/cc @rxin @liancheng





[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-95791867
  
  [Test build #30912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30912/consoleFull)
 for   PR 5647 at commit 
[`9903837`](https://github.com/apache/spark/commit/990383761841b444506e91f3052c2de3736d6052).





[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-23 Thread FlytxtRnD
Github user FlytxtRnD commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-95791587
  
Jenkins, retest this please





[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...

2015-04-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5671#issuecomment-95791377
  
Can you close the PR? Since it was not merged into master, github won't 
close this automatically.






[GitHub] spark pull request: Update sql-programming-guide.md

2015-04-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5674





[GitHub] spark pull request: Update sql-programming-guide.md

2015-04-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5674#issuecomment-95791355
  
Thanks. I've merged this in master & branch-1.3.





[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5677#issuecomment-95790916
  
  [Test build #30911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30911/consoleFull)
 for   PR 5677 at commit 
[`ebadaa9`](https://github.com/apache/spark/commit/ebadaa9798752498004fc3bc53de07ed53b49f7b).





[GitHub] spark pull request: [SPARK-7033][SPARKR] Clean usage of split. Use...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5628#issuecomment-95789973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30899/
Test PASSed.





[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...

2015-04-23 Thread scwf
Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/5677#issuecomment-95789809
  
I will try to add a test case for this.





[GitHub] spark pull request: [SPARK-7033][SPARKR] Clean usage of split. Use...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5628#issuecomment-95789945
  
  [Test build #30899 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30899/consoleFull)
 for   PR 5628 at commit 
[`046bc9e`](https://github.com/apache/spark/commit/046bc9e4e664ad36903d7e6bcf832912c53c53f8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7109] [SQL] Push down left side filter ...

2015-04-23 Thread scwf
GitHub user scwf opened a pull request:

https://github.com/apache/spark/pull/5677

[SPARK-7109] [SQL] Push down left side filter for left semi join

Currently the Spark SQL optimizer only pushes down the right-side filter for a 
left semi join; we can actually push down the left-side filter as well.
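The intuition can be sketched on plain Scala collections. This is an illustrative example only, not the Catalyst optimizer rule itself; `leftSemiJoin` and all data here are made up for the demonstration. Because a left semi join only filters left rows (it never adds right-side columns), a predicate that references only the left side yields the same rows whether it runs above or below the join:

```scala
// Illustrative sketch: a predicate on the left side commutes with a left semi join.
object SemiJoinPushdownSketch {
  // A left semi join keeps each left row that has at least one match on the right.
  def leftSemiJoin(left: Seq[(String, Int)], rightKeys: Set[String]): Seq[(String, Int)] =
    left.filter { case (k, _) => rightKeys.contains(k) }

  def main(args: Array[String]): Unit = {
    val left      = Seq(("a", 1), ("b", 5), ("c", 9))
    val rightKeys = Set("a", "c")
    val pred      = (row: (String, Int)) => row._2 > 2 // references the left side only

    val filterAboveJoin  = leftSemiJoin(left, rightKeys).filter(pred)
    val filterPushedDown = leftSemiJoin(left.filter(pred), rightKeys)

    // Both plans produce the same rows, so the filter is safe to push down.
    assert(filterAboveJoin == filterPushedDown)
    println(filterAboveJoin) // List((c,9))
  }
}
```

Pushing the filter below the join shrinks the left input before the (expensive) join runs, which is the point of the optimization.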

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scwf/spark leftsemi

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5677


commit ebadaa9798752498004fc3bc53de07ed53b49f7b
Author: wangfei 
Date:   2015-04-24T03:33:01Z

left filter push down for left semi join







[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5547#issuecomment-95789435
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30910/
Test FAILed.





[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5547#issuecomment-95789429
  
  [Test build #30910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30910/consoleFull)
 for   PR 5547 at commit 
[`5c3a2a6`](https://github.com/apache/spark/commit/5c3a2a697fca83d6de843850e786cf3406c4bd5a).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6418] Add simple per-stage visualizatio...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5547#issuecomment-95789315
  
  [Test build #30910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30910/consoleFull)
 for   PR 5547 at commit 
[`5c3a2a6`](https://github.com/apache/spark/commit/5c3a2a697fca83d6de843850e786cf3406c4bd5a).





[GitHub] spark pull request: [SPARK-6924][YARN] Fix driver hangs in yarn-cl...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5663#issuecomment-95789182
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30898/
Test PASSed.





[GitHub] spark pull request: [SPARK-6924][YARN] Fix driver hangs in yarn-cl...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5663#issuecomment-95789176
  
  [Test build #30898 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30898/consoleFull)
 for   PR 5663 at commit 
[`cf80049`](https://github.com/apache/spark/commit/cf8004938e6078bf370fcbe22ad39ea05913ec66).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5542#issuecomment-95788877
  
  [Test build #30901 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30901/consoleFull)
 for   PR 5542 at commit 
[`71f1bd5`](https://github.com/apache/spark/commit/71f1bd538b3e0befead2d1d592ce12990cb9b417).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-4233] [SQL] [WIP] UDAF Interface Refact...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/5542#issuecomment-9573
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30901/
Test FAILed.





[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5645#issuecomment-95788717
  
  [Test build #30909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30909/consoleFull)
 for   PR 5645 at commit 
[`e0d19fb`](https://github.com/apache/spark/commit/e0d19fb1f0e6472d3e0ca55223c36ed506f32709).





[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5676#issuecomment-95787694
  
  [Test build #30908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30908/consoleFull)
 for   PR 5676 at commit 
[`1693b54`](https://github.com/apache/spark/commit/1693b54f209a17ebb6bed449f81840737f97366a).





[GitHub] spark pull request: [SPARK-7031][ThriftServer]let thrift server ta...

2015-04-23 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/5609#issuecomment-95787561
  
BTW I have tested on my cluster with setting 
>export SPARK_DAEMON_MEMORY=m
export SPARK_DAEMON_JAVA_OPTS=" -Dx=y "

in spark-env.sh.
Before this patch the jinfo shows:
>
VM Flags:
-Xms512m -Xmx512m -XX:MaxPermSize=128m

After:
>VM Flags:
-Dx=y -Xmsm -Xmxm -XX:MaxPermSize=128m





[GitHub] spark pull request: [SPARK-6891] Fix the bug that ExecutorAllocati...

2015-04-23 Thread ArcherShao
GitHub user ArcherShao opened a pull request:

https://github.com/apache/spark/pull/5676

[SPARK-6891] Fix the bug that ExecutorAllocationManager will request a 
negative number of executors

In ExecutorAllocationManager, executor allocation is scheduled at a fixed 
rate (100ms); it first calls the method 'addOrCancelExecutorRequests' and then 
removes expired executors.
Suppose at time T no task is running or pending, and there are 5 executors 
running, but all have expired.
1. The method 'addOrCancelExecutorRequests' will be called, and the value 
of 'ExecutorAllocationManager.numExecutorsPending' will be updated to -5.
2. The 5 expired executors are removed.
Suppose there is still no task running or pending at T+1; the method 
'targetNumExecutors' will return -5, and the method 'addExecutors' will be 
called:

private def addExecutors(maxNumExecutorsNeeded: Int): Int = {
  val currentTarget = targetNumExecutors
  val actualMaxNumExecutors = math.min(maxNumExecutors, maxNumExecutorsNeeded)
  val newTotalExecutors = math.min(currentTarget + numExecutorsToAdd, actualMaxNumExecutors)
  val addRequestAcknowledged = testing || client.requestTotalExecutors(newTotalExecutors)
  // ...
}

newTotalExecutors will be a negative number, and when 
client.requestTotalExecutors(newTotalExecutors) is called, it will throw an 
exception.

Make the method 'targetNumExecutors' return a value not less than 
minNumExecutors; then newTotalExecutors will never be negative.

Keeping targetNumExecutors not less than minNumExecutors also makes sense on 
its own.
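The proposed clamp can be sketched in isolation. This is a simplified, standalone illustration with made-up names (`demand`, `targetNumExecutors` here is a free function, and the config values are assumed), not the actual ExecutorAllocationManager code:

```scala
// Sketch of the fix: clamp the computed executor target at minNumExecutors so
// the scheduler never asks the cluster manager for a negative total.
object ExecutorTargetSketch {
  val minNumExecutors = 0  // assumed config value
  val maxNumExecutors = 10 // assumed config value

  // demand: executors needed for running/pending tasks;
  // numExecutorsPending: outstanding requests, which can go negative once
  // expired executors have been removed (the scenario described above).
  def targetNumExecutors(demand: Int, numExecutorsPending: Int): Int =
    math.max(minNumExecutors, math.min(demand + numExecutorsPending, maxNumExecutors))

  def main(args: Array[String]): Unit = {
    assert(targetNumExecutors(0, -5) == 0)  // would be -5 without the lower clamp
    assert(targetNumExecutors(8, 0) == 8)   // normal demand passes through
    assert(targetNumExecutors(20, 0) == 10) // still bounded above by the max
    println("ok")
  }
}
```

With the lower bound in place, `currentTarget + numExecutorsToAdd` in `addExecutors` can no longer start from a negative base.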



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ArcherShao/spark SPARK-6891

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5676.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5676


commit 1693b54f209a17ebb6bed449f81840737f97366a
Author: ArcherShao 
Date:   2015-04-24T00:59:59Z

[SPARK-6891] Fix the bug that ExecutorAllocationManager will request 
negative number executors







[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5613#issuecomment-95786756
  
  [Test build #30907 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30907/consoleFull)
 for   PR 5613 at commit 
[`abaf02e`](https://github.com/apache/spark/commit/abaf02e611359102f3117e3fa484923155f3f314).





[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5637#issuecomment-95786710
  
  [Test build #30906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30906/consoleFull)
 for   PR 5637 at commit 
[`ab38c71`](https://github.com/apache/spark/commit/ab38c71356c23d63ca9f3990c8c0f0b8e8fc7976).





[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2015-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4015#issuecomment-95786382
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30897/
Test PASSed.





[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2015-04-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4015#issuecomment-95786341
  
  [Test build #30897 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30897/consoleFull)
 for   PR 4015 at commit 
[`81a731f`](https://github.com/apache/spark/commit/81a731f9f9a4eb828deb8d5bcc344bd28221a763).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7044][SQL] Fix the deadlock in ScriptTr...

2015-04-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5671#issuecomment-95785614
  
Thanks. I've merged this.





[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-04-23 Thread zhzhan
Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/5637#issuecomment-95784598
  
Jenkins, retest this please.




