[GitHub] spark pull request: [SPARK-14345][SQL] Decouple deserializer expre...

2016-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12131


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14345][SQL] Decouple deserializer expre...

2016-04-05 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/12131#issuecomment-205918635
  
LGTM, thanks for improving the comments.  It's much clearer to me what is 
happening now!

Merging to master.





[GitHub] spark pull request: [SPARK-14397][WEBUI] and tags ar...

2016-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12170





[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...

2016-04-05 Thread pravingadakh
Github user pravingadakh commented on the pull request:

https://github.com/apache/spark/pull/12176#issuecomment-205918384
  
I could document the returned values, but frankly I have no idea what those 
values are. I can see the documentation of the first returned value, `gammad`, in the 
comments, but none for the others. 





[GitHub] spark pull request: [SPARK-14397][WEBUI] and tags ar...

2016-04-05 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/12170#issuecomment-205917914
  
LGTM. Merging to master





[GitHub] spark pull request: [SPARK-13538][ML] Add GaussianMixture to ML

2016-04-05 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/11419#issuecomment-205916377
  
Haha OK thanks.  I just sent a PR to update this PR: 
[https://github.com/zhengruifeng/spark/pull/1]





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12087#discussion_r58582999
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.function.MapFunction
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark for Dataset typed operations.
+ */
+object DatasetBenchmark {
+
+  case class Data(i: Int, s: String)
+
+  def main(args: Array[String]): Unit = {
+    val sparkContext = new SparkContext("local[*]", "benchmark")
+    val sqlContext = new SQLContext(sparkContext)
+
+    import sqlContext.implicits._
+
+    val numRows = 1000
+    val ds = sqlContext.range(numRows).map(l => Data(l.toInt, l.toString))
+    ds.cache()
+    ds.collect() // make sure data are cached
+
+    val benchmark = new Benchmark("Dataset.map", numRows)
+
+    val scalaFunc = (d: Data) => Data(d.i + 1, d.s)
+    benchmark.addCase("scala function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(scalaFunc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
+    }
+
+    val javaFunc = new MapFunction[Data, Data] {
+      override def call(d: Data): Data = Data(d.i + 1, d.s)
+    }
+    val enc = implicitly[Encoder[Data]]
+    benchmark.addCase("java function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(javaFunc, enc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
--- End diff --

Okay... it's harder than I thought to run it on master :(

At least base the benchmark on these tests: 
https://github.com/databricks/spark-sql-perf/blob/master/src/main/scala/com/databricks/spark/sql/perf/DatasetPerformance.scala

(back to back maps, compare with RDDs)





[GitHub] spark pull request: [SPARK-14284][ML] KMeansSummary deprecating si...

2016-04-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12084





[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12166#issuecomment-205914706
  
**[Test build #55001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55001/consoleFull)**
 for PR 12166 at commit 
[`59904c4`](https://github.com/apache/spark/commit/59904c441a57a22465e3a2b338f1867ad97f5bdd).





[GitHub] spark pull request: [SPARK-14290][CORE][Network] avoid significant...

2016-04-05 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/12083#issuecomment-205914342
  
@liyezhang556520 I like the idea of eagerly copying into a direct buffer, 
but understand that might be a lot of code for not much gain. I still think we 
should reduce that limit though - maybe 256k?
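
For illustration only, here is a minimal sketch of what capping the size of each channel write could look like. It assumes "that limit" means the maximum number of bytes handed to a single `write` call; `ChunkedWrite`, `WriteLimit`, and `writeChunked` are hypothetical names that do not come from the pull request, and the 256 KiB value is just the figure floated above.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.{Channels, WritableByteChannel}

object ChunkedWrite {
  // Hypothetical cap per write call: the 256k figure suggested in the comment.
  val WriteLimit: Int = 256 * 1024

  /**
   * Writes `buf` to `target`, handing over at most WriteLimit bytes per call.
   * Assumes a blocking channel; a non-blocking channel that keeps returning 0
   * would need to wait for writability instead of looping.
   */
  def writeChunked(buf: ByteBuffer, target: WritableByteChannel): Long = {
    var total = 0L
    while (buf.hasRemaining) {
      val len = math.min(buf.remaining(), WriteLimit)
      // A duplicate shares the bytes but has its own position and limit.
      val slice = buf.duplicate()
      slice.limit(slice.position() + len)
      val written = target.write(slice)
      // Advance the original buffer by what the channel actually accepted.
      buf.position(buf.position() + written)
      total += written
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val channel = Channels.newChannel(new ByteArrayOutputStream())
    val data = ByteBuffer.wrap(new Array[Byte](1024 * 1024))
    println(s"wrote ${writeChunked(data, channel)} bytes")
  }
}
```

The idea behind a cap like this is that any temporary copy made for a single write stays bounded by the cap rather than by the size of the whole heap buffer.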





[GitHub] spark pull request: [SPARK-14284][ML] KMeansSummary deprecating si...

2016-04-05 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12084#issuecomment-205914146
  
LGTM
Merging with master
Thanks!





[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

2016-04-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12166#discussion_r58582117
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -619,6 +651,31 @@ class DistributedLDAModel private[ml] (
   @Since("1.6.0")
   lazy val logPrior: Double = oldDistributedModel.logPrior
 
+  private var _checkpointFiles: Array[String] = 
oldDistributedModel.checkpointFiles
+
+  /**
+   * If using checkpointing and [[LDA.keepLastCheckpoint]] is set to true, 
then there may be
+   * saved checkpoint files.  This method is provided so that users can 
manage those files.
+   * Note that removing the checkpoints can cause failures if a partition 
is lost and is needed
+   * by certain [[DistributedLDAModel]] methods.
+   *
+   * @return  Checkpoint files from training
+   */
+  @Since("2.0.0")
+  def getCheckpointFiles: Array[String] = _checkpointFiles
--- End diff --

I'd like to give users a way to clean up manually, but I'll mark it as 
DeveloperApi.





[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

2016-04-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12166#discussion_r58582132
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -758,6 +816,10 @@ class LDA @Since("1.6.0") (
   @Since("1.6.0")
   def setOptimizeDocConcentration(value: Boolean): this.type = 
set(optimizeDocConcentration, value)
 
+  /** @group expertSetParam */
+  @Since("2.0.0")
+  def setKeepLastCheckpoint(value: Boolean): this.type = 
set(keepLastCheckpoint, value)
+
--- End diff --

I don't think so.  Once there is a model, the decision about keeping the 
last checkpoint has already been made.  Users can manage the checkpoint via the 
deletion method though.





[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

2016-04-05 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12166#issuecomment-205913961
  
Updated!





[GitHub] spark pull request: [SPARK-13048][ML][MLLIB] keepLastCheckpoint op...

2016-04-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12166#discussion_r58582125
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -258,7 +265,30 @@ private[clustering] trait LDAParams extends Params 
with HasFeaturesCol with HasM
   def getOptimizeDocConcentration: Boolean = $(optimizeDocConcentration)
 
   /**
+   * For EM optimizer, if using checkpointing, this indicates whether to 
keep the last
+   * checkpoint. If false, then the checkpoint will be deleted. Deleting 
the checkpoint can
+   * cause failures if a data partition is lost, so set this bit with care.
+   *
+   * See [[DistributedLDAModel.getCheckpointFiles]] for getting remaining 
checkpoints and
+   * [[DistributedLDAModel.deleteCheckpointFiles]] for removing remaining 
checkpoints.
+   *
--- End diff --

Sounds good.





[GitHub] spark pull request: [SPARK-14402][SQL] initcap UDF doesn't match H...

2016-04-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12175#issuecomment-205913234
  
Thank you, @srowen !





[GitHub] spark pull request: [SPARK-14402][SQL] initcap UDF doesn't match H...

2016-04-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12175#issuecomment-205910764
  
I think that's pretty reasonable as a minimally invasive fix.





[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12049#issuecomment-205910352
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12049#issuecomment-205910356
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54994/
Test PASSed.





[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12049#issuecomment-205909806
  
**[Test build #54994 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54994/consoleFull)**
 for PR 12049 at commit 
[`48d760e`](https://github.com/apache/spark/commit/48d760eed41a3d559ad8aa6363b6000d4b9ed54d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class StreamingRelation(dataSource: DataSource, sourceName: 
String, output: Seq[Attribute])`





[GitHub] spark pull request: [WIP][SPARK-14402][CORE] initcap UDF doesn't m...

2016-04-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12175#issuecomment-205902962
  
Hi, @srowen . I minimized the change on master.
* Undo the changes on `common` module.
* Implement `initCap` by the following changes.
```
   override def nullSafeEval(string: Any): Any = {
-string.asInstanceOf[UTF8String].toTitleCase
+string.asInstanceOf[UTF8String].toLowerCase.toTitleCase
   }
   override def genCode(ctx: CodegenContext, ev: ExprCode): String = {
-defineCodeGen(ctx, ev, str => s"$str.toTitleCase()")
+defineCodeGen(ctx, ev, str => s"$str.toLowerCase().toTitleCase()")
   }
```
I think this is enough for the `initCap` function as a small fix for now. What do 
you think about this?
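
To make the effect of the change concrete, here is a minimal plain-Scala sketch using `String` instead of `UTF8String`. It assumes that `toTitleCase` upper-cases the first character of each space-separated word and leaves the other characters untouched; the `InitCapSketch` object and its method names are hypothetical stand-ins, not Spark code.

```scala
import java.util.Locale

object InitCapSketch {
  // Stand-in for the assumed toTitleCase behavior: upper-case the first
  // character of each space-separated word and keep the rest as-is.
  def toTitleCase(s: String): String = {
    val sb = new StringBuilder(s.length)
    var startOfWord = true
    for (c <- s) {
      sb.append(if (startOfWord) c.toUpper else c)
      startOfWord = c == ' '
    }
    sb.toString
  }

  // Before the change: title-case only, so existing capitals survive.
  def initCapBefore(s: String): String = toTitleCase(s)

  // After the change: lower-case first, then title-case.
  def initCapAfter(s: String): String = toTitleCase(s.toLowerCase(Locale.ROOT))

  def main(args: Array[String]): Unit = {
    println(initCapBefore("sPark sQL")) // SPark SQL
    println(initCapAfter("sPark sQL"))  // Spark Sql
  }
}
```

Under that assumption, lower-casing first is what resets the capitals inside each word, which is the behavior the two-line change aims for.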





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/12087#discussion_r58577527
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.function.MapFunction
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark for Dataset typed operations.
+ */
+object DatasetBenchmark {
+
+  case class Data(i: Int, s: String)
+
+  def main(args: Array[String]): Unit = {
+    val sparkContext = new SparkContext("local[*]", "benchmark")
+    val sqlContext = new SQLContext(sparkContext)
+
+    import sqlContext.implicits._
+
+    val numRows = 1000
+    val ds = sqlContext.range(numRows).map(l => Data(l.toInt, l.toString))
+    ds.cache()
+    ds.collect() // make sure data are cached
+
+    val benchmark = new Benchmark("Dataset.map", numRows)
+
+    val scalaFunc = (d: Data) => Data(d.i + 1, d.s)
+    benchmark.addCase("scala function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(scalaFunc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
+    }
+
+    val javaFunc = new MapFunction[Data, Data] {
+      override def call(d: Data): Data = Data(d.i + 1, d.s)
+    }
+    val enc = implicitly[Encoder[Data]]
+    benchmark.addCase("java function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(javaFunc, enc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
--- End diff --

This is doing the expensive conversion to external rows.  Could you try to 
run the existing benchmark which avoids this and also compares against RDDs?





[GitHub] spark pull request: [WIP][SPARK-14402][CORE] initcap UDF doesn't m...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12175#issuecomment-205902930
  
**[Test build #55000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55000/consoleFull)**
 for PR 12175 at commit 
[`69c6e1c`](https://github.com/apache/spark/commit/69c6e1c5e02ef40ca8c1ba9c04cfa01e43386c5f).





[GitHub] spark pull request: [SPARK-14407][SQL] Hides HadoopFsRelation rela...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12178#issuecomment-205900906
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14129][SQL] Alter table DDL commands

2016-04-05 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12121#issuecomment-205901169
  
I think this PR also resolves another JIRA: 
https://issues.apache.org/jira/browse/SPARK-14128 Thanks!





[GitHub] spark pull request: [SPARK-14407][SQL] Hides HadoopFsRelation rela...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12178#issuecomment-205900843
  
**[Test build #54997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54997/consoleFull)**
 for PR 12178 at commit 
[`64b7cf4`](https://github.com/apache/spark/commit/64b7cf487c59aee3217aae37733ff9879dd79c2c).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class OutputWriterFactory extends Serializable `
  * `abstract class OutputWriter `
  * `case class HadoopFsRelation(`
  * `trait FileFormat `
  * `case class Partition(values: InternalRow, files: Seq[FileStatus])`
  * `trait FileCatalog `
  * `class HDFSFileCatalog(`
  * `  case class FakeFileStatus(`





[GitHub] spark pull request: [SPARK-14407][SQL] Hides HadoopFsRelation rela...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12178#issuecomment-205900916
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54997/
Test FAILed.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12087#discussion_r58576642
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.function.MapFunction
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark for Dataset typed operations.
+ */
+object DatasetBenchmark {
+
+  case class Data(i: Int, s: String)
+
+  def main(args: Array[String]): Unit = {
+    val sparkContext = new SparkContext("local[*]", "benchmark")
+    val sqlContext = new SQLContext(sparkContext)
+
+    import sqlContext.implicits._
+
+    val numRows = 1000
+    val ds = sqlContext.range(numRows).map(l => Data(l.toInt, l.toString))
+    ds.cache()
+    ds.collect() // make sure data are cached
+
+    val benchmark = new Benchmark("Dataset.map", numRows)
+
+    val scalaFunc = (d: Data) => Data(d.i + 1, d.s)
+    benchmark.addCase("scala function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(scalaFunc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
+    }
+
+    val javaFunc = new MapFunction[Data, Data] {
+      override def call(d: Data): Data = Data(d.i + 1, d.s)
+    }
+    val enc = implicitly[Encoder[Data]]
+    benchmark.addCase("java function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(javaFunc, enc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
+    }
+
+    /*
+    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11.4
+    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+    Dataset.map:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+    -----------------------------------------------------------------------------
+    scala function              1029 / 1080          9.7         102.9       1.0X
+    java function                965 /  999         10.4          96.5       1.1X
--- End diff --

Could you add one more test case to show chained functions?





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205897396
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205897404
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54993/
Test PASSed.





[GitHub] spark pull request: [SPARK-14124] [SQL] [FOLLOWUP] Implement Datab...

2016-04-05 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12081#issuecomment-205893057
  
cc @andrewor14 @yhuai 





[GitHub] spark pull request: [SPARK-14353] Dataset Time Window `window` API...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12136#issuecomment-205896645
  
**[Test build #54999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54999/consoleFull)**
 for PR 12136 at commit 
[`1bd7563`](https://github.com/apache/spark/commit/1bd7563ced8dca52f4156339d86b4d01535fdf58).





[GitHub] spark pull request: [SPARK-14369][SQL][test-hadoop2.2] Locality su...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12153#issuecomment-205896642
  
**[Test build #54998 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54998/consoleFull)**
 for PR 12153 at commit 
[`5fef611`](https://github.com/apache/spark/commit/5fef61170f5b63845f8b5ee72fb3413e1ce4477d).





[GitHub] spark pull request: [SPARK-12569][PySpark][ML]:DecisionTreeRegress...

2016-04-05 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request:

https://github.com/apache/spark/pull/12116#issuecomment-205896732
  
@holdenk Thanks for your comments! I will make changes accordingly.





[GitHub] spark pull request: [SPARK-14407][SQL] Hides HadoopFsRelation rela...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12178#issuecomment-205896647
  
**[Test build #54997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54997/consoleFull)**
 for PR 12178 at commit 
[`64b7cf4`](https://github.com/apache/spark/commit/64b7cf487c59aee3217aae37733ff9879dd79c2c).





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205896549
  
**[Test build #54993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54993/consoleFull)**
 for PR 12138 at commit 
[`8a18acd`](https://github.com/apache/spark/commit/8a18acd6c14c346e409f7da709e7f2d33f6e4662).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TimestampFromLong(child: Expression) extends UnaryExpression with ExpectsInputTypes`





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12087#discussion_r58575785
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.function.MapFunction
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark for Dataset typed operations.
+ */
+object DatasetBenchmark {
+
+  case class Data(i: Int, s: String)
+
+  def main(args: Array[String]): Unit = {
+    val sparkContext = new SparkContext("local[*]", "benchmark")
+    val sqlContext = new SQLContext(sparkContext)
+
+    import sqlContext.implicits._
+
+    val numRows = 1000
+    val ds = sqlContext.range(numRows).map(l => Data(l.toInt, l.toString))
+    ds.cache()
+    ds.collect() // make sure data are cached
+
+    val benchmark = new Benchmark("Dataset.map", numRows)
+
+    val scalaFunc = (d: Data) => Data(d.i + 1, d.s)
+    benchmark.addCase("scala function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(scalaFunc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
+    }
+
+    val javaFunc = new MapFunction[Data, Data] {
+      override def call(d: Data): Data = Data(d.i + 1, d.s)
+    }
+    val enc = implicitly[Encoder[Data]]
+    benchmark.addCase("java function") { iter =>
+      var res = ds
+      var i = 0
+      while (i < 10) {
+        res = res.map(javaFunc, enc)
+        i += 1
+      }
+      res.queryExecution.toRdd.count()
+    }
+
+    /*
+    Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11.4
+    Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
+    Dataset.map:            Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+    -------------------------------------------------------------------------------
+    scala function                1029 / 1080          9.7         102.9       1.0X
+    java function                  965 /  999         10.4          96.5       1.1X
--- End diff --

This is very slow; range/filter/aggregate take just a few nanoseconds.





[GitHub] spark pull request: [SPARK-14407][SQL] Hides HadoopFsRelation rela...

2016-04-05 Thread liancheng
GitHub user liancheng opened a pull request:

https://github.com/apache/spark/pull/12178

[SPARK-14407][SQL] Hides HadoopFsRelation related data source API into 
execution package

## What changes were proposed in this pull request?

This PR moves the `HadoopFsRelation`-related data source API into the 
execution package.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/liancheng/spark spark-14407-hide-file-scan-api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12178.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12178


commit 64b7cf487c59aee3217aae37733ff9879dd79c2c
Author: Cheng Lian 
Date:   2016-04-05T16:58:52Z

Hides HadoopFsRelation related data source API







[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/12087#discussion_r58575623
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.SparkContext
+import org.apache.spark.api.java.function.MapFunction
+import org.apache.spark.util.Benchmark
+
+/**
+ * Benchmark for Dataset typed operations.
+ */
+object DatasetBenchmark {
+
+  case class Data(i: Int, s: String)
+
+  def main(args: Array[String]): Unit = {
+    val sparkContext = new SparkContext("local[*]", "benchmark")
+    val sqlContext = new SQLContext(sparkContext)
+
+    import sqlContext.implicits._
+
+    val numRows = 1000
+    val ds = sqlContext.range(numRows).map(l => Data(l.toInt, l.toString))
+    ds.cache()
--- End diff --

Why cache here? An uncached range() should be faster than a cached one.





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11989#issuecomment-205893715
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14124] [SQL] [FOLLOWUP] Implement Datab...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12081#issuecomment-205893818
  
**[Test build #54996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54996/consoleFull)**
 for PR 12081 at commit 
[`16ac0b1`](https://github.com/apache/spark/commit/16ac0b1a548fedf7f602097ebb4aa1e7ed285515).





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11989#issuecomment-205893720
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54995/
Test PASSed.





[GitHub] spark pull request: [SPARK-14124] [SQL] [FOLLOWUP] Implement Datab...

2016-04-05 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12081#issuecomment-205893108
  
retest this please





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11989#issuecomment-205893381
  
**[Test build #54995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54995/consoleFull)**
 for PR 11989 at commit 
[`bebd544`](https://github.com/apache/spark/commit/bebd544bf411717ac22899f79627b0811b1da8c5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14396] [SQL] Throw Exceptions for DDLs ...

2016-04-05 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12169#issuecomment-205892839
  
Thanks for the review! @hvanhovell 

Also cc @yhuai @andrewor14 





[GitHub] spark pull request: [SPARK-6429] Implement hashCode and equals tog...

2016-04-05 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12157#discussion_r58572140
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -53,14 +53,22 @@ import org.apache.spark.util.{NextIterator, SerializableConfiguration, ShutdownH
 /**
  * A Spark split class that wraps around a Hadoop InputSplit.
  */
-private[spark] class HadoopPartition(rddId: Int, idx: Int, s: InputSplit)
+private[spark] class HadoopPartition(rddId: Int, override val index: Int, s: InputSplit)
   extends Partition {
 
   val inputSplit = new SerializableWritable[InputSplit](s)
 
-  override def hashCode(): Int = 41 * (41 + rddId) + idx
+  override def hashCode(): Int = 41 * (41 + rddId) + index
 
-  override val index: Int = idx
+  def canEqual(other: Any): Boolean = other.isInstanceOf[HadoopPartition]
+
+  override def equals(other: Any): Boolean = other match {
+    case that: HadoopPartition =>
+      super.equals(that) &&
+        (that canEqual this) &&
+        index == that.index
--- End diff --

It becomes a field if you use it outside the constructor, or it should. It 
should stay private, yes.

This caused me to think about the partition changes a bit more. Defining 
`hashCode` without `equals` is technically correct. By default no two distinct 
objects are equal, so they can't violate the contract that equal objects have 
the same hash code no matter what the hash code is. 

Here it's not clear that two partitions with the same index are 
semantically equivalent, since they can be from different RDDs. So the RDD's ID 
matters, but maybe it's not right to implement a notion of equality here 
either. It does raise the question: when do partitions get hashed, and why is 
a non-default implementation important then? It could be vestigial. Maybe best 
not to add equals though, and unless we know for sure the hash code isn't 
important, leave that.

So maybe we end up leaving the partition classes alone and weakening the 
condition to require hashCode if equals exists but not vice versa?
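
To make the contract argument above concrete, here is a small sketch of a class that overrides `hashCode` but not `equals`. `SketchPartition` is a hypothetical stand-in, not Spark's `HadoopPartition`; it only reuses the hash formula from the diff.

```scala
// Hypothetical stand-in for a partition class; not Spark's HadoopPartition.
class SketchPartition(val rddId: Int, val index: Int) {
  // Custom hash code with no equals override, so equality stays reference-based.
  override def hashCode(): Int = 41 * (41 + rddId) + index
}

object HashCodeWithoutEquals {
  def main(args: Array[String]): Unit = {
    val a = new SketchPartition(1, 0)
    val b = new SketchPartition(1, 0)

    // Same fields, so the same hash code.
    assert(a.hashCode() == b.hashCode())
    // Still unequal, because equals is inherited from AnyRef.
    assert(a != b)
    // The contract only constrains equal objects, and a equals only itself.
    assert(a == a)
    // A set therefore keeps both instances despite the hash collision.
    assert(Set(a, b).size == 2)
  }
}
```

The reverse direction is the dangerous one: overriding `equals` without `hashCode` can make two equal objects land in different hash buckets, which is why the weaker rule (require `hashCode` whenever `equals` exists) would still catch the real bug.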





[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...

2016-04-05 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12176#issuecomment-205882669
  
A-ha, understood about `.values`. This looks pretty reasonable. My only 
question is, does it make sense conceptually that this method also returns a 
list of IDs? It doesn't hurt much in practice, and it seems like there's a 
reasonable argument for it logically. We have one case where the caller needs 
it, after all. Maybe finish this by documenting the three things returned from 
this method.





[GitHub] spark pull request: [SPARK-14245] [Web UI] Display the user in the...

2016-04-05 Thread ajbozarth
Github user ajbozarth commented on the pull request:

https://github.com/apache/spark/pull/12123#issuecomment-205882481
  
I'll add it to the history server later today then. I didn't re-test on the 
history server after the change, so I didn't notice.






[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-04-05 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r58569079
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
     assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
   }
 
+  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
+    sc = new SparkContext("local", "test")
+    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
+    val taskSet = FakeTask.createTaskSet(4)
+    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
+      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+    }
+    // Offer resources for 4 tasks to start
+    for ((k, v) <- List(
+        "exec1" -> "host1",
+        "exec1" -> "host1",
+        "exec2" -> "host2",
+        "exec2" -> "host2")) {
+      val taskOption = manager.resourceOffer(k, v, NO_PREF)
+      assert(taskOption.isDefined)
+      val task = taskOption.get
+      assert(task.executorId === k)
+    }
+    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+    // Complete the 3 tasks and leave 1 task in running
+    for (id <- Set(0, 1, 2)) {
+      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
+      assert(sched.endedTasks(id) === Success)
+    }
+
+    // Wait for the threshold time to start speculative attempt for the running task
+    Thread.sleep(100)
--- End diff --

Ah, you are right. Sorry, I looked at the wrong config.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205881415
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205881421
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54990/
Test PASSed.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205880601
  
**[Test build #54990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54990/consoleFull)**
 for PR 12087 at commit 
[`5a96ae4`](https://github.com/apache/spark/commit/5a96ae46b1a9f697b9541ae4abc408069b747315).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14369][SQL][test-hadoop2.2] Locality su...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12153#issuecomment-205878760
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11989#issuecomment-205878822
  
**[Test build #54995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54995/consoleFull)**
 for PR 11989 at commit 
[`bebd544`](https://github.com/apache/spark/commit/bebd544bf411717ac22899f79627b0811b1da8c5).





[GitHub] spark pull request: [SPARK-14123] [SPARK-14384] [SQL] Handle Creat...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-205878605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54987/
Test PASSed.





[GitHub] spark pull request: [SPARK-14369][SQL][test-hadoop2.2] Locality su...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12153#issuecomment-205878765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54988/
Test PASSed.





[GitHub] spark pull request: [SPARK-14123] [SPARK-14384] [SQL] Handle Creat...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-205878599
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14369][SQL][test-hadoop2.2] Locality su...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12153#issuecomment-205878364
  
**[Test build #54988 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54988/consoleFull)**
 for PR 12153 at commit 
[`a1f527a`](https://github.com/apache/spark/commit/a1f527aa2c91776085818c360f9ea3d0a8d5d616).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14123] [SPARK-14384] [SQL] Handle Creat...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12117#issuecomment-205878276
  
**[Test build #54987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54987/consoleFull)**
 for PR 12117 at commit 
[`3938766`](https://github.com/apache/spark/commit/39387666fe24a2e65a267eabc15c7816c8738449).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread yongtang
Github user yongtang commented on the pull request:

https://github.com/apache/spark/pull/11989#issuecomment-205877757
  
@sethah Thanks. The import has been removed.





[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12049#issuecomment-205871057
  
**[Test build #54994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54994/consoleFull)**
 for PR 12049 at commit 
[`48d760e`](https://github.com/apache/spark/commit/48d760eed41a3d559ad8aa6363b6000d4b9ed54d).





[GitHub] spark pull request: [SPARK-14257][SQL]Allow multiple continuous qu...

2016-04-05 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/12049#issuecomment-205869083
  
retest this please





[GitHub] spark pull request: [SPARK-14335][SQL] Describe function command r...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12128#issuecomment-205868721
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14335][SQL] Describe function command r...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12128#issuecomment-205868723
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54986/
Test PASSed.





[GitHub] spark pull request: [SPARK-14335][SQL] Describe function command r...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12128#issuecomment-205868179
  
**[Test build #54986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54986/consoleFull)**
 for PR 12128 at commit 
[`927272c`](https://github.com/apache/spark/commit/927272cdc79b7fd8908faea80e8d842d40c2b468).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14362] [SPARK-14406] [SQL] [WIP] DDL Na...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12146#issuecomment-205866052
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54985/
Test PASSed.





[GitHub] spark pull request: [SPARK-14362] [SPARK-14406] [SQL] [WIP] DDL Na...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12146#issuecomment-205866050
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

2016-04-05 Thread devaraj-kavali
Github user devaraj-kavali commented on a diff in the pull request:

https://github.com/apache/spark/pull/11996#discussion_r58563479
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
@@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with 
LocalSparkContext with Logg
 assert(TaskLocation("executor_host1_3") === 
ExecutorCacheTaskLocation("host1", "3"))
   }
 
+  test("Kill other task attempts when one attempt belonging to the same 
task succeeds") {
+sc = new SparkContext("local", "test")
+val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", 
"host2"))
+val taskSet = FakeTask.createTaskSet(4)
+val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
+val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = 
taskSet.tasks.map { task =>
+  task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
+}
+// Offer resources for 4 tasks to start
+for ((k, v) <- List(
+"exec1" -> "host1",
+"exec1" -> "host1",
+"exec2" -> "host2",
+"exec2" -> "host2")) {
+  val taskOption = manager.resourceOffer(k, v, NO_PREF)
+  assert(taskOption.isDefined)
+  val task = taskOption.get
+  assert(task.executorId === k)
+}
+assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
+// Complete the 3 tasks and leave 1 task in running
+for (id <- Set(0, 1, 2)) {
+  manager.handleSuccessfulTask(id, createTaskResult(id, 
accumUpdatesByTask(id)))
+  assert(sched.endedTasks(id) === Success)
+}
+
+// Wait for the threshold time to start speculative attempt for the 
running task
+Thread.sleep(100)
--- End diff --

Thanks @tgravescs for your quick response.

Here Thread.sleep(100) matches the threshold value used in 
TaskSetManager.checkSpeculatableTasks(). A task must run for at least this long 
before it becomes eligible for a speculative attempt. I don't see any way to 
change this default value.

> val medianDuration = durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1))
> val threshold = max(SPECULATION_MULTIPLIER * medianDuration, 100)

I don't think this threshold value is related to the config 
‘spark.speculation.interval’ here.
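
As a rough illustration of why the floor of 100 matters in this test, here is a minimal standalone sketch of the two quoted lines. It is not the TaskSetManager code itself: the object name `SpeculationThresholdSketch`, the `minRuntimeMs` name, and the multiplier value of 1.5 are assumptions made for the example.

```scala
// Illustrative sketch only; names and the multiplier value are assumptions.
object SpeculationThresholdSketch {
  val speculationMultiplier = 1.5 // assumed value of SPECULATION_MULTIPLIER
  val minRuntimeMs = 100.0        // the hard-coded floor from the quoted code

  // Mirrors the quoted computation over the durations of successful tasks.
  def threshold(successfulDurationsMs: Seq[Long]): Double = {
    require(successfulDurationsMs.nonEmpty, "need at least one successful task")
    val durations = successfulDurationsMs.sorted
    val tasksSuccessful = durations.length
    val medianDuration =
      durations(math.min((0.5 * tasksSuccessful).round.toInt, durations.length - 1))
    math.max(speculationMultiplier * medianDuration, minRuntimeMs)
  }
}
```

When the finished tasks complete almost instantly, as they would in a unit test, the median term is tiny and the result is the 100 floor, which would explain the sleep of 100 ms before checking for speculatable tasks.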






[GitHub] spark pull request: [SPARK-14362] [SPARK-14406] [SQL] [WIP] DDL Na...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12146#issuecomment-205865589
  
**[Test build #54985 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54985/consoleFull)**
 for PR 12146 at commit 
[`5393174`](https://github.com/apache/spark/commit/5393174badbccc1794243ac6e18c424f8f062cf7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14362] [SPARK-14406] [SQL] [WIP] DDL Na...

2016-04-05 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/12146#issuecomment-205861283
  
@yhuai Sure, will do it in this PR. Thanks! 





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread sethah
Github user sethah commented on the pull request:

https://github.com/apache/spark/pull/11989#issuecomment-205857272
  
This LGTM other than one small comment about imports. @MLnick could you 
make a final pass?





[GitHub] spark pull request: [SPARK-14399] Remove unnecessary excludes from...

2016-04-05 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/12171#issuecomment-205858120
  
It looks like Hive might actually need Joda time:

```
- analyze MetastoreRelations *** FAILED ***
  org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/joda/time/ReadWritableInstant
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:455)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:440)
  at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:226)
  at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:173)
  at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:172)
  at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:215)
  at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:440)
  at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:430)
  at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:351)
  at org.apache.spark.sql.hive.test.TestHiveContext.runSqlHive(TestHive.scala:183)
  ...
01:36:40.566 ERROR hive.ql.exec.DDLTask: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyPrimitiveObjectInspectorFactory
```


https://stackoverflow.com/questions/26259717/facing-java-lang-noclassdeffounderror-org-joda-time-readableinstant-error-even
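
One quick way to confirm whether a class such as the one named in the trace is visible on a given classpath is a small probe like the sketch below. The object and method names (`ClasspathCheckSketch`, `classAvailable`) are hypothetical and not part of Spark or Hive; the sketch only uses the standard `Class.forName` call without initializing the class.

```scala
// Hypothetical helper for probing the classpath; not part of Spark or Hive.
object ClasspathCheckSketch {
  // Returns true if the named class can be located by this object's class loader.
  // The class is not initialized, so static initializers do not run.
  def classAvailable(name: String): Boolean =
    try {
      Class.forName(name, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }
}
```

For example, `ClasspathCheckSketch.classAvailable("org.joda.time.ReadWritableInstant")` could be evaluated in a test shell started with the same dependency set to see whether the exclude removed Joda-Time.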





[GitHub] spark pull request: [SPARK-3724][ML] RandomForest: More options fo...

2016-04-05 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/11989#discussion_r58558671
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/RandomForestSuite.scala ---
@@ -27,6 +27,7 @@ import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.tree.{DecisionTreeSuite => OldDTSuite, 
EnsembleTestHelper}
 import org.apache.spark.mllib.tree.configuration.{Algo => OldAlgo, 
QuantileStrategy, Strategy => OldStrategy}
 import org.apache.spark.mllib.tree.impurity.{Entropy, Gini, GiniCalculator}
+import org.apache.spark.mllib.tree.model.RandomForestModel
--- End diff --

Not sure why this import was added. It can be removed.





[GitHub] spark pull request: [SPARK-14396] [SQL] Throw Exceptions for DDLs ...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12169#issuecomment-205855786
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14396] [SQL] Throw Exceptions for DDLs ...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12169#issuecomment-205855791
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54983/
Test PASSed.





[GitHub] spark pull request: [SPARK-14396] [SQL] Throw Exceptions for DDLs ...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12169#issuecomment-205855383
  
**[Test build #54983 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54983/consoleFull)**
 for PR 12169 at commit 
[`140f859`](https://github.com/apache/spark/commit/140f85998953f1d945df4f318ac0a88d197583cd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205853906
  
**[Test build #54993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54993/consoleFull)**
 for PR 12138 at commit 
[`8a18acd`](https://github.com/apache/spark/commit/8a18acd6c14c346e409f7da709e7f2d33f6e4662).





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205851532
  
retest this please.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205850589
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205850592
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54992/
Test FAILed.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205850563
  
**[Test build #54992 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54992/consoleFull)**
 for PR 12087 at commit 
[`a5b0d57`](https://github.com/apache/spark/commit/a5b0d57bb7bce985771ede00b0f116a4cce82900).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205848733
  
Benchmark added; the result is included in the benchmark code. I also ran 
this benchmark against the master branch, and the result is:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Mac OS X 10.11.4
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz
Dataset.map:          Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------
scala function              2471 / 2531          4.0         247.1       1.0X
java function               2416 / 2478          4.1         241.6       1.0X
```

So with whole-stage codegen, we get about a 2.5x speedup!





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205848612
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205848622
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54991/
Test FAILed.





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205848558
  
**[Test build #54991 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54991/consoleFull)**
 for PR 12138 at commit 
[`8a18acd`](https://github.com/apache/spark/commit/8a18acd6c14c346e409f7da709e7f2d33f6e4662).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TimestampFromLong(child: Expression) extends 
UnaryExpression with ExpectsInputTypes `





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205847599
  
**[Test build #54992 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54992/consoleFull)**
 for PR 12087 at commit 
[`a5b0d57`](https://github.com/apache/spark/commit/a5b0d57bb7bce985771ede00b0f116a4cce82900).





[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...

2016-04-05 Thread pravingadakh
Github user pravingadakh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12176#discussion_r58553918
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -542,10 +539,11 @@ private[clustering] object OnlineLDAOptimizer {
   expElogbeta: BDM[Double],
   alpha: breeze.linalg.Vector[Double],
   gammaShape: Double,
-  k: Int): (BDV[Double], BDM[Double]) = {
-val (ids: List[Int], cts: Array[Double]) = termCounts match {
-  case v: DenseVector => ((0 until v.size).toList, v.values)
-  case v: SparseVector => (v.indices.toList, v.values)
+  k: Int,
+  ids: List[Int]): (BDV[Double], BDM[Double]) = {
+val cts: Array[Double] = termCounts match {
--- End diff --

Yes, it looks redundant, but `values` is not a member of the parent trait 
`Vector`; it is defined on each implementation (i.e. `DenseVector` 
and `SparseVector`).
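
To make the point concrete, here is a minimal sketch using hypothetical stand-in types (`Vec`, `DenseVec`, `SparseVec`) rather than the real MLlib classes. It assumes, as described above, that the parent trait exposes no `values` member, so the accessor has to be reached through a pattern match on the concrete type.

```scala
// Hypothetical stand-ins for the MLlib vector types; only the shape matters.
sealed trait Vec { def size: Int } // note: no `values` member on the trait

final case class DenseVec(values: Array[Double]) extends Vec {
  def size: Int = values.length
}

final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])
  extends Vec

object ValuesSketch {
  // `v.values` would not compile for a plain Vec, so match on the subtype.
  def valuesOf(v: Vec): Array[Double] = v match {
    case d: DenseVec  => d.values
    case s: SparseVec => s.values
  }
}
```

Both branches look identical, which is why the match reads as redundant, but each one resolves `values` on a different concrete class.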





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/12138#discussion_r58553204
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1659,11 +1665,12 @@ object TimeWindowing extends Rule[LogicalPlan] {
   val windowEnd = windowStart + window.windowDuration
 
   CreateNamedStruct(
-Literal(WINDOW_START) :: windowStart ::
--- End diff --

Previously we manually set the output of Expand here as `TimestampType` 
(`windowAttr`). Because `windowStart` and `windowEnd` produce long values, 
inferring the output from Expand's projections yields `LongType` instead 
of `TimestampType`. So we need to explicitly convert the `LongType` to 
`TimestampType`.
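
The effect can be sketched with a toy expression model. Everything below is a hypothetical stand-in written for illustration (the `DataType`, `Expr`, `LongLiteral`, `Add`, and `ToTimestamp` names are not Catalyst classes); it only assumes that output types are inferred from the projection expressions.

```scala
// Toy model; these types are hypothetical stand-ins, not Catalyst classes.
sealed trait DataType
case object LongType extends DataType
case object TimestampType extends DataType

sealed trait Expr { def dataType: DataType }
final case class LongLiteral(value: Long) extends Expr {
  val dataType: DataType = LongType
}
final case class Add(left: Expr, right: Expr) extends Expr {
  def dataType: DataType = left.dataType // arithmetic keeps the operand type
}
final case class ToTimestamp(child: Expr) extends Expr {
  val dataType: DataType = TimestampType // explicit conversion changes the type
}

object ExpandOutputSketch {
  // Output types are derived purely from the projection expressions.
  def inferOutputTypes(projection: Seq[Expr]): Seq[DataType] =
    projection.map(_.dataType)
}
```

In this model, a projection of plain long arithmetic infers `LongType` for every column, while wrapping each expression in the conversion node yields `TimestampType`, which mirrors the explicit conversion described above.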





[GitHub] spark pull request: [SPARK-14370][MLLIB]removed duplicate generati...

2016-04-05 Thread pravingadakh
Github user pravingadakh commented on a diff in the pull request:

https://github.com/apache/spark/pull/12176#discussion_r58553056
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
@@ -440,12 +440,9 @@ final class OnlineLDAOptimizer extends LDAOptimizer {
   val stat = BDM.zeros[Double](k, vocabSize)
   var gammaPart = List[BDV[Double]]()
   nonEmptyDocs.foreach { case (_, termCounts: Vector) =>
-val ids: List[Int] = termCounts match {
-  case v: DenseVector => (0 until v.size).toList
-  case v: SparseVector => v.indices.toList
-}
+val ids: List[Int] = LDAUtils.vectorAsList(termCounts)
--- End diff --

Yes, I agree. It is better to simply return `ids` rather than having callers 
compute it. I will make that change.





[GitHub] spark pull request: [SPARK-14354][SQL] Let Expand take name expres...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12138#issuecomment-205845316
  
**[Test build #54991 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54991/consoleFull)**
 for PR 12138 at commit 
[`8a18acd`](https://github.com/apache/spark/commit/8a18acd6c14c346e409f7da709e7f2d33f6e4662).





[GitHub] spark pull request: [SPARK-12566] [ML] [WIP] GLM model family, lin...

2016-04-05 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/11549#issuecomment-205844978
  
@jkbradley @hhbyyh I can work on this PR.





[GitHub] spark pull request: [SPARK-14245] [Web UI] Display the user in the...

2016-04-05 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/12123#issuecomment-205844392
  
Yeah, we can see the user name in the history page, and it is retrieved using 
the REST API.





[GitHub] spark pull request: [SPARK-14298] [ML] [MLlib] LDA should support ...

2016-04-05 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/12089#issuecomment-205843565
  
@jkbradley Agree, I will update the PR soon.





[GitHub] spark pull request: [SPARK-14369][SQL][WIP][test-hadoop2.2] Locali...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12153#issuecomment-205834944
  
**[Test build #54988 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54988/consoleFull)**
 for PR 12153 at commit 
[`a1f527a`](https://github.com/apache/spark/commit/a1f527aa2c91776085818c360f9ea3d0a8d5d616).





[GitHub] spark pull request: [SPARK-14404][SQL][TESTS] HDFSMetadataLogSuite...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12177#issuecomment-205837805
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14204] [SQL] register driverClass rathe...

2016-04-05 Thread mchalek
Github user mchalek commented on the pull request:

https://github.com/apache/spark/pull/12000#issuecomment-205840895
  
Bump.  Would be nice to get closure on this.  I doubt that we were in the 
minority in being affected by this (although admittedly, complaints of other 
failures seem not to have yet surfaced).





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205840164
  
**[Test build #54990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54990/consoleFull)**
 for PR 12087 at commit 
[`5a96ae4`](https://github.com/apache/spark/commit/5a96ae46b1a9f697b9541ae4abc408069b747315).





[GitHub] spark pull request: [SPARK-14362] [SQL] DDL Native Support: Drop V...

2016-04-05 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12146#issuecomment-205839654
  
@gatorsmile Thank you for working on it. I feel the changes to the DropTable 
command should also make this command natively supported by Spark (no 
runSqlHive)? If so, we can also resolve 
https://issues.apache.org/jira/browse/SPARK-14406?





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205839287
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14296][SQL] whole stage codegen support...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12087#issuecomment-205839240
  
**[Test build #54989 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54989/consoleFull)**
 for PR 12087 at commit 
[`1707c21`](https://github.com/apache/spark/commit/1707c21ac7b25824c180ee9142372cefdc87fa5c).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




