date:20170119

[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71699/
Test PASSed.



[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11867
  
**[Test build #71699 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71699/testReport)**
 for PR 11867 at commit 
[`38ebece`](https://github.com/apache/spark/commit/38ebece49f0313c7fa9553309da85b67af4398ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16593
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71701/
Test PASSed.



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16593
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71701 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71701/testReport)**
 for PR 16593 at commit 
[`acca991`](https://github.com/apache/spark/commit/acca991d3d92116ce3a88918b3798d14d32849f8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ReorderHivePartitionedTableSchema(sparkSession: 
SparkSession)`



[GitHub] spark issue #16656: [SPARK-18116][DStream] Report stream input information a...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16656
  
**[Test build #71708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71708/testReport)**
 for PR 16656 at commit 
[`547ecb3`](https://github.com/apache/spark/commit/547ecb338fa086deb86edf93b091ea6fdf2836f2).



[GitHub] spark pull request #16656: [SPARK-18116][DStream] Report stream input inform...

2017-01-19 Thread uncleGen

GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/16656

[SPARK-18116][DStream] Report stream input information after recover from 
checkpoint

## What changes were proposed in this pull request?

Run a streaming application that reads from Kafka. If many batches are 
queued in the job list when the application is stopped, and the application 
is then restarted from the checkpoint file, the Spark UI shows a size of 0 
for the queued batches that were stored in the checkpoint file.


## How was this patch tested?

update unit test


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark SPARK-18116

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16656.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16656


commit b8d09457448ca0f47238c8d5afb3db3d3e0cb3dc
Author: uncleGen 
Date:   2017-01-20T02:55:54Z

Report stream input information after recover from checkpoint

commit 547ecb338fa086deb86edf93b091ea6fdf2836f2
Author: uncleGen 
Date:   2017-01-20T07:29:36Z

add unit test





[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16654
  
**[Test build #71707 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71707/testReport)**
 for PR 16654 at commit 
[`29bda3f`](https://github.com/apache/spark/commit/29bda3f136b9766decf87d8452f30dc40871441d).



[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16654
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16654
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71698/
Test FAILed.



[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16654
  
**[Test build #71698 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71698/testReport)**
 for PR 16654 at commit 
[`bb01219`](https://github.com/apache/spark/commit/bb01219acb8195c56bd76a25daec8952fba7631a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16586: [SPARK-19117][SPARK-18922][TESTS] Fix the rest of flaky,...

2017-01-19 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16586
  
Hi @srowen, I think it is ready for a second look. In short, the current 
status is:

- there are some test failures 
(https://github.com/apache/spark/pull/16586#issuecomment-273437565) when 
running tests at the package level, and they look possibly flaky
- these failures were tested individually with `test-only` and passed 
(https://github.com/apache/spark/pull/16586#issuecomment-273952379)
- `local metrics` still seems flaky, but less so in individual tests, 
judging from the build results in 
https://github.com/apache/spark/pull/16586#discussion_r97022356



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16645
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71696/
Test PASSed.



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16645
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16645
  
**[Test build #71696 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71696/testReport)**
 for PR 16645 at commit 
[`b1028ad`](https://github.com/apache/spark/commit/b1028ad573301ae4d351678a6e6b3b66392e32d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #16586: [WIP][SPARK-19117][SPARK-18922][TESTS] Fix the re...

2017-01-19 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/16586#discussion_r97022356
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ---
@@ -229,7 +229,7 @@ class SparkListenerSuite extends SparkFunSuite with 
LocalSparkContext with Match
 }
 
 val numSlices = 16
-val d = sc.parallelize(0 to 1e3.toInt, numSlices).map(w)
+val d = sc.parallelize(0 to 1, numSlices).map(w)
--- End diff --

I am pretty sure the deserialization time test is less flaky now, judging 
from the individual test runs below:

**Before** - 9 failures out of 10.


[1 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/546-windows-complete/job/ktdmdxkdi4ni4ier)
[2 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/549-windows-complete/job/b4mqgyt72g6he7e7)
[3 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/551-windows-complete/job/j0ywrgv8d733yqb4)
[4 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/553-windows-complete/job/yqoapee3og5x46wk)
[5 (passed)](https://ci.appveyor.com/project/spark-test/spark/build/554-windows-complete/job/g3hhdl5s8odu9ir0)
[6 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/555-windows-complete/job/9utyo2glowuf3ulc)
[7 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/541-windows-test/job/4gtm26hcm5327aa1)
[8 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/542-windows-test/job/166i4xiljy7iof8l)
[9 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/540-windows-test/job/39v7nwuq598p3rtm)
[10 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/539-windows-test/job/how9cbsj5i5cykeh)

**After** - 1 failure out of 7.
 
[1 (passed)](https://ci.appveyor.com/project/spark-test/spark/build/576-windows-complete/job/9sfx150cp38ofttn)
[2 (passed)](https://ci.appveyor.com/project/spark-test/spark/build/577-windows-complete/job/nrjgs7emtlnj6y5f)
[3 (passed)](https://ci.appveyor.com/project/spark-test/spark/build/578-windows-complete/job/qwgsuc5uas8mk0o7)
[4 (passed)](https://ci.appveyor.com/project/spark-test/spark/build/579-windows-complete/job/sf1sspisb4ai4j7r)
[5 (failed)](https://ci.appveyor.com/project/spark-test/spark/build/580-windows-complete/job/808c08fvnm26w3uh)
[6 (passed)](https://ci.appveyor.com/project/spark-test/spark/build/581-windows-complete/job/y7o97qq18my44dvo)
[7 (passed)](https://ci.appveyor.com/project/spark-test/spark/branch/68031366-45EE-45B4-867A-40A4D9B1AD07)



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16593
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16593
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71702/
Test PASSed.



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71702 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71702/testReport)**
 for PR 16593 at commit 
[`21f113a`](https://github.com/apache/spark/commit/21f113a85ae2df46c93dd57384a01955f394188b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15770: [SPARK-15784][ML]:Add Power Iteration Clustering ...

2017-01-19 Thread zhengruifeng

Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15770#discussion_r97021451
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/PowerIterationClustering.scala
 ---
@@ -0,0 +1,182 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.clustering
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.linalg.{Vector}
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.clustering.{PowerIterationClustering => 
MLlibPowerIterationClustering}
+import 
org.apache.spark.mllib.clustering.PowerIterationClustering.Assignment
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{DataFrame, Dataset, Row}
+import org.apache.spark.sql.functions.{col}
+import org.apache.spark.sql.types.{IntegerType, LongType, StructField, 
StructType}
+
+/**
+ * Common params for PowerIterationClustering
+ */
+private[clustering] trait PowerIterationClusteringParams extends Params 
with HasMaxIter
+  with HasFeaturesCol with HasPredictionCol {
+
+  /**
+   * The number of clusters to create (k). Must be > 1. Default: 2.
+   * @group param
+   */
+  @Since("2.2.0")
+  final val k = new IntParam(this, "k", "The number of clusters to create. 
" +
+"Must be > 1.", ParamValidators.gt(1))
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getK: Int = $(k)
+
+  /**
+   * Param for the initialization algorithm. This can be either "random" 
to use a random vector
+   * as vertex properties, or "degree" to use normalized sum similarities. 
Default: random.
+   */
+  @Since("2.2.0")
+  final val initMode = new Param[String](this, "initMode", "The 
initialization algorithm. " +
+"Supported options: 'random' and 'degree'.",
+(value: String) => validateInitMode(value))
--- End diff --

What about using the validator `ParamValidators.inArray[String](...)` instead?
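
A minimal sketch of what this suggestion could look like, assuming Spark ML is on the classpath. `ParamValidators.inArray` builds a `T => Boolean` predicate from an array of allowed values; the object and value names below are hypothetical and are not taken from the PR.

```scala
import org.apache.spark.ml.param.ParamValidators

object InitModeValidatorSketch {
  // Hypothetical name; the two values come from the param doc in the diff above.
  val supportedInitModes: Array[String] = Array("random", "degree")

  // inArray returns a predicate that accepts only the listed values.
  val isValidInitMode: String => Boolean =
    ParamValidators.inArray[String](supportedInitModes)

  def main(args: Array[String]): Unit = {
    println(isValidInitMode("random"))
    println(isValidInitMode("kmeans"))
  }
}
```

A predicate like this could presumably be passed as the validator argument of `new Param[String](...)` in place of the hand-written `validateInitMode` lambda, which would keep the list of supported options in one place.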



[GitHub] spark issue #16586: [WIP][SPARK-19117][SPARK-18922][TESTS] Fix the rest of f...

2017-01-19 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16586
  
They all pass in individual tests with `test-only` (please check the logs 
above).

```
org.apache.spark.scheduler.SparkListenerSuite:
 - local metrics (8 seconds, 656 milliseconds)

org.apache.spark.sql.hive.execution.HiveQuerySuite:
 - constant null testing (531 milliseconds)

org.apache.spark.sql.hive.execution.AggregationQuerySuite:
 - udaf with all data types (4 seconds, 285 milliseconds)

org.apache.spark.sql.hive.StatisticsSuite:
 - verify serialized column stats after analyzing columns (2 seconds, 844 
milliseconds)

org.apache.spark.sql.hive.execution.SQLQuerySuite:
- dynamic partition value test (1 second, 407 milliseconds)
- SPARK-6785: HiveQuerySuite - Date cast (188 milliseconds)
```

Although I still wonder how and why those tests seem more flaky (judging 
from observations in the builds), I think it is fair to say that, at least, 
Spark tests (the way I run them) are able to pass on Windows.

If this sounds reasonable, let me remove `[WIP]` and try to make the tests 
stable on Windows in the future.



[GitHub] spark issue #16619: [WIP][SPARK-19257][SQL]CatalogStorageFormat.locationUri ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16619
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71700/
Test FAILed.



[GitHub] spark issue #16619: [WIP][SPARK-19257][SQL]CatalogStorageFormat.locationUri ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16619
  
**[Test build #71700 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71700/testReport)**
 for PR 16619 at commit 
[`66dc4de`](https://github.com/apache/spark/commit/66dc4de3cd466e1fc6897b5034967a8c01bc8867).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16619: [WIP][SPARK-19257][SQL]CatalogStorageFormat.locationUri ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16619
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97019304
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends 
MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = 
pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
--- End diff --

They are the same when the pipeline has 1 stage.
I prefer `stages.last` because if we later add a stage to transform the 
input data, `stages(1)` will break.
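
The difference can be sketched without Spark. In the diff above the pipeline is built from `Array(rFormulaModel, gm)`, so index 1 and `.last` pick the same element today. The stage objects below are hypothetical stand-ins, and the extra preprocessing stage is only an assumed future change.

```scala
object StagesLastSketch {
  // Hypothetical stand-ins for pipeline stages; real code would hold fitted models.
  sealed trait Stage
  case object RFormulaStage extends Stage
  case object GaussianMixtureStage extends Stage
  case object ExtraPreprocessingStage extends Stage

  // Layout matching the diff: formula model first, mixture model second.
  val current: Array[Stage] = Array(RFormulaStage, GaussianMixtureStage)

  // Assumed future layout with one more stage in front.
  val extended: Array[Stage] =
    Array(ExtraPreprocessingStage, RFormulaStage, GaussianMixtureStage)

  def main(args: Array[String]): Unit = {
    println(current(1) == current.last) // both pick the mixture model
    println(extended(1))                // now picks the formula stage
    println(extended.last)              // still picks the mixture model
  }
}
```

With a fixed index, the later `asInstanceOf[GaussianMixtureModel]` cast would fail at runtime once the layout changes, whereas `.last` keeps working as long as the model stays the final stage.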



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97019071
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends 
MLReadable[GaussianMixtureWrapp
   val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
   val rMetadata = parse(rMetadataStr)
   val dim = (rMetadata \ "dim").extract[Int]
-  new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
+  val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
+  new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = 
true)
--- End diff --

It may not be a big deal right now, since spark.gmm is relatively new,
but I think we should come up with a plan for model persistence compatibility, 
not only between R and the JVM but also across versions of Spark.
It might also be useful to link this JIRA to SPARK-18864.



[GitHub] spark pull request #16582: [SPARK-19220][UI] Make redirection to HTTPS apply...

2017-01-19 Thread sarutak

Github user sarutak commented on a diff in the pull request:

https://github.com/apache/spark/pull/16582#discussion_r97018717
  
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -306,23 +311,31 @@ private[spark] object JettyUtils extends Logging {
   httpConnector.setPort(currentPort)
   connectors += httpConnector
 
-  sslOptions.createJettySslContextFactory().foreach { factory =>
-// If the new port wraps around, do not try a privileged port.
-val securePort =
-  if (currentPort != 0) {
-(currentPort + 400 - 1024) % (65536 - 1024) + 1024
-  } else {
-0
-  }
-val scheme = "https"
-// Create a connector on port securePort to listen for HTTPS 
requests
-val connector = new ServerConnector(server, factory)
-connector.setPort(securePort)
-
-connectors += connector
-
-// redirect the HTTP requests to HTTPS port
-collection.addHandler(createRedirectHttpsHandler(securePort, 
scheme))
+  val httpsConnector = sslOptions.createJettySslContextFactory() match 
{
+case Some(factory) =>
+  // If the new port wraps around, do not try a privileged port.
+  val securePort =
+if (currentPort != 0) {
+  (currentPort + 400 - 1024) % (65536 - 1024) + 1024
+} else {
+  0
+}
+  val scheme = "https"
+  // Create a connector on port securePort to listen for HTTPS 
requests
+  val connector = new ServerConnector(server, factory)
+  connector.setPort(securePort)
+  connector.setName(SPARK_CONNECTOR_NAME)
+  connectors += connector
+
+  // redirect the HTTP requests to HTTPS port
+  httpConnector.setName(REDIRECT_CONNECTOR_NAME)
+  collection.addHandler(createRedirectHttpsHandler(securePort, 
scheme))
--- End diff --

I noticed one thing.
If a port is already in use, `collection.addHandler` will be called 
repeatedly, so the redirection doesn't work properly.
Of course, it's not your fault. It would be good if you fixed it in this PR 
as well, but it's a separate issue, so otherwise I'll fix it in another PR.
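
To make the port arithmetic in the diff above concrete, here is a minimal standalone sketch of the `securePort` computation. The names `SecurePortSketch` and `securePort` are made up for illustration and are not Spark API; only the formula is taken from the diff.

```scala
// Standalone sketch of the HTTPS port selection shown in the diff.
// `SecurePortSketch` and `securePort` are illustrative names, not Spark API.
object SecurePortSketch {
  // Offsets the HTTP port by 400. A port that would run past 65535 wraps
  // around and lands at 1024 or above instead of in the privileged range.
  // Port 0 ("pick any free port") is passed through unchanged.
  def securePort(currentPort: Int): Int =
    if (currentPort != 0) (currentPort + 400 - 1024) % (65536 - 1024) + 1024
    else 0
}
```

With this sketch, `SecurePortSketch.securePort(4040)` evaluates to 4440, while 65535 wraps around to 1423.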



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16631



[GitHub] spark issue #16631: [SPARK-19271] [SQL] Change non-cbo estimation of aggrega...

2017-01-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16631
  
Thanks! Merging to master



[GitHub] spark issue #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper in Spar...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16566
  
**[Test build #71706 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71706/testReport)**
 for PR 16566 at commit 
[`b25fc83`](https://github.com/apache/spark/commit/b25fc832c79714db20e2c79e95253919a36714f1).



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread wangmiao1981

Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97018334
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends 
MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = 
pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
--- End diff --

I have a question: some wrappers use `pipeline.stages.last` and others 
use `pipeline.stages(1)`. What is the difference between the two? 
I tried using them interchangeably and the tests still pass. 
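
As a plain-Scala illustration of the question (a sketch only, with strings standing in for the real fitted stages): when the pipeline has exactly two stages, as in `setStages(Array(rFormulaModel, gm))` above, index `1` and `.last` pick the same element. They diverge only once a further stage is appended; the third stage name below is hypothetical.

```scala
// Strings stand in for fitted pipeline stages; all names are placeholders.
object StagesSketch {
  val twoStages: Array[String] = Array("rFormulaModel", "gmm")
  // Hypothetical pipeline with one extra trailing stage.
  val threeStages: Array[String] = Array("rFormulaModel", "gmm", "indexToString")

  // Picks the stage at the fixed position 1.
  def byIndex(stages: Array[String]): String = stages(1)

  // Picks whatever stage happens to come last.
  def byLast(stages: Array[String]): String = stages.last
}
```

For `twoStages` both helpers return `"gmm"`, which would explain why the tests pass either way; for `threeStages` only `byIndex` still returns the model.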



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97017980
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not 
estimated here.
--- End diff --

Sure. Please submit the PR to fix the other cases. Thanks!



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97017902
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not 
estimated here.
--- End diff --

If we remove this, the estimation result of aggregate still has the wrong 
rowCount and attributeStats.
Shall we merge this? I'll then test the other unary nodes and fix them if 
something still goes wrong.
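
The concern can be illustrated with a simplified stand-in for the statistics class. The `Stats` case class below is hypothetical and much smaller than Spark's real one. The point is that `copy` carries over every field that is not explicitly overridden, so the child's `rowCount` and `attributeStats` leak into the parent unless a fresh value is built.

```scala
object StatsSketch {
  // Hypothetical, simplified stand-in for a plan's statistics.
  final case class Stats(
      sizeInBytes: BigInt,
      rowCount: Option[BigInt] = None,
      attributeStats: Map[String, Long] = Map.empty)

  // Copying the child's stats keeps its rowCount and attributeStats.
  def propagateAll(child: Stats, sizeInBytes: BigInt): Stats =
    child.copy(sizeInBytes = sizeInBytes)

  // Building a fresh value keeps only the size that was estimated here.
  def sizeOnly(sizeInBytes: BigInt): Stats =
    Stats(sizeInBytes = sizeInBytes)
}
```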




[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-19 Thread wangmiao1981

Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97017872
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, 
\code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to 
save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. 
Currently only a few formula
+#'operators are supported, including '~', '.', ':', '+', 
and '-'.
+#'Note that the response variable of formula is empty in 
spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'  The actual number could be smaller if there are no divisible 
leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater 
than or equal to 1.0)
+#'or the minimum proportion of points (if 
less than 1.0) of a divisible cluster.
+#'Note that it is an advanced. The default 
value should be enough
--- End diff --

In Scala, the doc comment uses `@group expertParam` and the API documentation 
shows `(expert-only) Parameters`. I will change it to `it is an expert 
parameter`.
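
For readers of the parameter description, here is a small sketch of how the two regimes of `minDivisibleClusterSize` could be read. This is one interpretation of the doc text rather than a copy of Spark's implementation; rounding up is an assumption, and the object and method names are illustrative.

```scala
object MinDivisibleSketch {
  // Returns the minimum number of points a cluster needs to be divisible.
  // Values >= 1.0 are read as an absolute count, values < 1.0 as a
  // proportion of all `n` points. Rounding up is an assumption.
  def minPoints(minDivisibleClusterSize: Double, n: Long): Long =
    if (minDivisibleClusterSize >= 1.0) math.ceil(minDivisibleClusterSize).toLong
    else math.ceil(minDivisibleClusterSize * n).toLong
}
```

Under this reading, a setting of `5.0` means five points regardless of the data size, while `0.25` on 1000 points means 250 points.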



[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16655
  
**[Test build #71705 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71705/testReport)**
 for PR 16655 at commit 
[`9ec7d36`](https://github.com/apache/spark/commit/9ec7d36a198560441e3c3e96fa59789bdd36751b).



[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...

2017-01-19 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16593#discussion_r97017743
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala
 ---
@@ -45,6 +46,18 @@ case class CreateHiveTableAsSelectCommand(
   override def innerChildren: Seq[LogicalPlan] = Seq(query)
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
+// when create a partitioned table, we should reorder the columns
+// to put the partition columns at the end
+val partitionAttrs = tableDesc.partitionColumnNames.map { p =>
+  query.output.find(_.name == p).getOrElse(
+new AnalysisException(s"Partition column[$p] does not exist " +
+  s"in query output partition").asInstanceOf[NamedExpression]
+  )
+}
+val partitionSet = AttributeSet(partitionAttrs)
+val dataAttrs = query.output.filterNot(partitionSet.contains)
+val reorderedOutputQuery = Project(dataAttrs ++ partitionAttrs, query)
--- End diff --

we can revert this after https://github.com/apache/spark/pull/16655



[GitHub] spark issue #16655: [SPARK-19305][SQL] partitioned table should always put p...

2017-01-19 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16655
  
cc @yhuai @gatorsmile @windpiger 



[GitHub] spark pull request #16655: [SPARK-19305][SQL] partitioned table should alway...

2017-01-19 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/16655

[SPARK-19305][SQL] partitioned table should always put partition columns at 
the end of table schema

## What changes were proposed in this pull request?

For data source tables, we will always reorder the specified table schema, 
or the query in CTAS, to put partition columns at the end. e.g. `CREATE TABLE 
t(a int, b int, c int, d int) USING parquet PARTITIONED BY (d, b)` will create 
a table with schema ``

Hive serde tables didn't have this problem before, because their CREATE TABLE 
syntax specifies the data schema and the partition schema separately.

However, after we unified the CREATE TABLE syntax, Hive serde tables also 
need the reorder. This PR puts the reorder logic in an analyzer rule, 
which works with both data source tables and Hive serde tables.

## How was this patch tested?

new regression test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark schema

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16655.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16655


commit 9ec7d36a198560441e3c3e96fa59789bdd36751b
Author: Wenchen Fan 
Date:   2017-01-20T06:10:36Z

partitioned table should always put partition columns at the end of table 
schema





[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-19 Thread junegunn

Github user junegunn commented on the issue:

https://github.com/apache/spark/pull/16347
  
Rebased to current master. The patch is simpler thanks to the refactoring 
made in [SPARK-18243](https://issues.apache.org/jira/browse/SPARK-18243).

Anyway, I can understand your rationale for wanting an explicit API on 
the writer side, but then make sure that the sort specification from 
`sortWithinPartitions` is automatically propagated to the writer. Otherwise the 
method is no longer compatible with `SORT BY` in Hive and [the 
documentation](https://github.com/apache/spark/blob/v2.1.0/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L990)
 should be corrected accordingly. Care should be taken with the `INSERT OVERWRITE 
TABLE ... DISTRIBUTE BY ... SORT BY ...` statement in Spark SQL so that it stays 
compatible with the same Hive SQL.
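
To pin down the semantics under discussion, here is a plain-collections sketch with no Spark involved. Modelling partitions as nested sequences is an assumption made purely for illustration. It contrasts sorting within each partition, which is what `SORT BY` means in Hive, with a total sort.

```scala
object SortSketch {
  // Each inner Seq models the rows of one partition.
  type Partitions = Seq[Seq[Int]]

  // Sorts rows inside every partition; partition boundaries stay put.
  def sortWithin(parts: Partitions): Partitions = parts.map(_.sorted)

  // Total ordering across all rows, collapsing the partitions.
  def sortGlobally(parts: Partitions): Seq[Int] = parts.flatten.sorted
}
```

If the writer re-sorts or regroups rows on its own, the per-partition order produced by the first helper is exactly what would be lost.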



[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16642
  
**[Test build #71704 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71704/testReport)**
 for PR 16642 at commit 
[`3a5ebd7`](https://github.com/apache/spark/commit/3a5ebd7ee5ead531bc9a778703faebc4807b8611).



[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16642
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71703/
Test FAILed.



[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16642
  
**[Test build #71703 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71703/testReport)**
 for PR 16642 at commit 
[`095d421`](https://github.com/apache/spark/commit/095d421a05f985785964c2fae0e7c4f84fc1752a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16642
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97016294
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends 
MLReadable[GaussianMixtureWrapp
   val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
   val rMetadata = parse(rMetadataStr)
   val dim = (rMetadata \ "dim").extract[Int]
-  new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
+  val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
+  new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = 
true)
--- End diff --

Yeah, it will break existing persisted models, but I don't think we 
guarantee model persistence compatibility between different versions for SparkR. 
We are planning to make model persistence consistent between SparkR and MLlib, 
so that there is no SparkR-specific handling and MLlib handles all 
model persistence issues.
However, if we want model persistence compatibility for SparkR 
now, I can add code to handle different versions here, but it will make 
maintenance more complicated. What's your opinion? 
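
If version handling were added, one possible shape is sketched below. It uses a plain `Map` in place of the parsed JSON metadata, and the `Loaded` case class and `load` helper are hypothetical names for illustration; the real wrapper reads the fields through the JSON extraction shown in the diff.

```scala
object MetadataSketch {
  // Hypothetical, simplified view of the state restored for the wrapper.
  final case class Loaded(dim: Int, logLikelihood: Option[Double])

  // Reads the fields from already-parsed metadata. Models saved before the
  // `logLikelihood` field existed yield None instead of failing on load.
  def load(metadata: Map[String, String]): Loaded =
    Loaded(
      dim = metadata("dim").toInt,
      logLikelihood = metadata.get("logLikelihood").map(_.toDouble))
}
```

The trade-off mentioned above shows up even in this small sketch: every consumer of `logLikelihood` now has to deal with the missing case.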



[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16642
  
**[Test build #71703 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71703/testReport)**
 for PR 16642 at commit 
[`095d421`](https://github.com/apache/spark/commit/095d421a05f985785964c2fae0e7c4f84fc1752a).



[GitHub] spark issue #16643: [SPARK-17724][Streaming][WebUI] Unevaluated new lines in...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16643
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16643: [SPARK-17724][Streaming][WebUI] Unevaluated new lines in...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16643
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71695/
Test PASSed.



[GitHub] spark issue #16643: [SPARK-17724][Streaming][WebUI] Unevaluated new lines in...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16643
  
**[Test build #71695 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71695/testReport)**
 for PR 16643 at commit 
[`d1c16e2`](https://github.com/apache/spark/commit/d1c16e2f17190e6d227a9d062a54ffb75687ce68).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16653: [SPARK-19302][DOC][MINOR] Fix the wrong item format in s...

2017-01-19 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16653
  
LGTM



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread windpiger

Github user windpiger commented on the issue:

https://github.com/apache/spark/pull/16593
  
Thanks all, let me summarize:
1. no CTAS
`
create table t(a int, b int, c string, d string)
using $provider
partitioned by(d, c)
`
the schema order of the table in the catalog should be `a, b, d, c`
a) for a datasource table 
this is already ensured by `DataSource.getOrInferFileFormatSchema`:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala#L182

b) for a Hive table
  as @lins05 commented, we currently don't handle this case; as 
suggested, we should
 add a new rule for it.

2. CTAS
`
create table t
using $provider
partitioned by(d, c)
select 1 as b, 2 as a, 'x' as c, 'y' as d
`
the schema order of the table in the catalog should be `b, a, d, c`
a) for a datasource table 
this is already ensured by creating the table with the updated schema:

https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala#L159

b) for a Hive table
  this PR puts this logic in `CreateHiveTableAsSelectCommand`; if we add a 
new rule, we can merge this logic with the non-CTAS Hive case.

Above all, to ensure the schema order in the catalog is what we expect, we need 
to add a new rule for Hive tables. This test branch implements the new rule: 
https://github.com/windpiger/spark/commit/acca991d3d92116ce3a88918b3798d14d32849f8#diff-73bd90660f41c12a87ee9fe8d35d856aR463

But before implementing the new rule, we should first merge PR #16642, 
so that we get a `tableDesc` with a non-empty schema, which we can then use 
here: 
https://github.com/windpiger/spark/commit/acca991d3d92116ce3a88918b3798d14d32849f8#diff-73bd90660f41c12a87ee9fe8d35d856aR470
 

@cloud-fan @lins05 is this ok?
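
The expected column order in both cases above can be sketched on plain column names. This is a simplified illustration working on strings rather than attributes, and `ReorderSketch.reorder` is a made-up helper, not the analyzer rule itself.

```scala
object ReorderSketch {
  // Moves the partition columns to the end, in the order given by the
  // PARTITIONED BY clause, and keeps the other columns in schema order.
  def reorder(schema: Seq[String], partitionCols: Seq[String]): Seq[String] = {
    val missing = partitionCols.filterNot(schema.contains)
    require(missing.isEmpty,
      s"Partition column(s) not in schema: ${missing.mkString(", ")}")
    schema.filterNot(partitionCols.contains) ++ partitionCols
  }
}
```

For the non-CTAS case, `reorder(Seq("a", "b", "c", "d"), Seq("d", "c"))` gives `a, b, d, c`; for the CTAS case, the query output `b, a, c, d` becomes `b, a, d, c`.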



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97015657
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends 
MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = 
pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
+val logLikelihood: Double = gmm.summary.logLikelihood
--- End diff --

Both are OK; giving the type explicitly makes it clearer to developers 
what it means.



[GitHub] spark pull request #15353: [SPARK-17724][WebUI][Streaming] Unevaluated new l...

2017-01-19 Thread keypointt

Github user keypointt closed the pull request at:

https://github.com/apache/spark/pull/15353



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71702/testReport)**
 for PR 16593 at commit 
[`21f113a`](https://github.com/apache/spark/commit/21f113a85ae2df46c93dd57384a01955f394188b).



[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11867
  
**[Test build #71699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71699/testReport)**
 for PR 11867 at commit 
[`38ebece`](https://github.com/apache/spark/commit/38ebece49f0313c7fa9553309da85b67af4398ec).



[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97014380
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'          The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'                                or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#'                                Note that it is an advanced. The default value should be enough
--- End diff --

As far as I recall, the term used in the spark.ml doc is "expert parameter";
you might want to check how it is explained there.
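
To make the two readings of `minDivisibleClusterSize` in the doc comment above concrete, here is a minimal Python sketch. It is not Spark's implementation: the helper names `min_divisible_size` and `is_divisible` are hypothetical, and rounding up with `ceil` is an assumption made for illustration.

```python
import math


def min_divisible_size(min_divisible_cluster_size: float, num_points: int) -> int:
    """Resolve the parameter into an absolute point count (hypothetical helper)."""
    if min_divisible_cluster_size <= 0:
        raise ValueError("minDivisibleClusterSize must be positive")
    if min_divisible_cluster_size >= 1.0:
        # Values >= 1.0 are read as an absolute number of points.
        return math.ceil(min_divisible_cluster_size)
    # Values < 1.0 are read as a proportion of all points (rounding up is assumed).
    return math.ceil(min_divisible_cluster_size * num_points)


def is_divisible(cluster_size: int, min_divisible_cluster_size: float, num_points: int) -> bool:
    """A cluster may be split only if it holds at least the resolved minimum."""
    return cluster_size >= min_divisible_size(min_divisible_cluster_size, num_points)
```

With 1000 points, a value of 5.0 would mean "at least 5 points" and a value of 0.25 would mean "at least 250 points".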



[GitHub] spark pull request #16566: [SPARK-18821][SparkR]: Bisecting k-means wrapper ...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16566#discussion_r97014257
  
--- Diff: R/pkg/R/mllib_clustering.R ---
@@ -38,6 +45,149 @@ setClass("KMeansModel", representation(jobj = "jobj"))
 #' @note LDAModel since 2.1.0
 setClass("LDAModel", representation(jobj = "jobj"))
 
+#' Bisecting K-Means Clustering Model
+#'
+#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
+#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
+#'
+#' @param data a SparkDataFrame for training.
+#' @param formula a symbolic description of the model to be fitted. Currently only a few formula
+#'                operators are supported, including '~', '.', ':', '+', and '-'.
+#'                Note that the response variable of formula is empty in spark.bisectingKmeans.
+#' @param k the desired number of leaf clusters. Must be > 1.
+#'          The actual number could be smaller if there are no divisible leaf clusters.
+#' @param maxIter maximum iteration number.
+#' @param seed the random seed.
+#' @param minDivisibleClusterSize The minimum number of points (if greater than or equal to 1.0)
+#'                                or the minimum proportion of points (if less than 1.0) of a divisible cluster.
+#'                                Note that it is an advanced. The default value should be enough
--- End diff --

`Note that it is an advanced. `
do you mean to say `Note that it is an advanced option.`? 



[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16593
  
**[Test build #71701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71701/testReport)** for PR 16593 at commit [`acca991`](https://github.com/apache/spark/commit/acca991d3d92116ce3a88918b3798d14d32849f8).



[GitHub] spark issue #16619: [WIP][SPARK-19257][SQL]CatalogStorageFormat.locationUri ...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16619
  
**[Test build #71700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71700/testReport)** for PR 16619 at commit [`66dc4de`](https://github.com/apache/spark/commit/66dc4de3cd466e1fc6897b5034967a8c01bc8867).



[GitHub] spark issue #16653: [SPARK-19302][DOC][MINOR] Fix the wrong item format in s...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16653
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71697/
Test PASSed.



[GitHub] spark pull request #16457: [SPARK-19057][ML] Instances' weight must be non-n...

2017-01-19 Thread zhengruifeng

Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/16457



[GitHub] spark issue #16457: [SPARK-19057][ML] Instances' weight must be non-negative

2017-01-19 Thread zhengruifeng

Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/16457
  
I think it is better to discuss this in the JIRA. Once we come to an agreement, I
will reopen this PR.



[GitHub] spark issue #16653: [SPARK-19302][DOC][MINOR] Fix the wrong item format in s...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16653
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16653: [SPARK-19302][DOC][MINOR] Fix the wrong item format in s...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16653
  
**[Test build #71697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71697/testReport)** for PR 16653 at commit [`f337dc3`](https://github.com/apache/spark/commit/f337dc33be85374296b43b1f25435521be63b782).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16654
  
**[Test build #71698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71698/testReport)** for PR 16654 at commit [`bb01219`](https://github.com/apache/spark/commit/bb01219acb8195c56bd76a25daec8952fba7631a).



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97013889
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

How about removing this and fixing all the similar issues in a separate PR? 



[GitHub] spark issue #16631: [SPARK-19271] [SQL] Change non-cbo estimation of aggrega...

2017-01-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16631
  
LGTM except one comment



[GitHub] spark pull request #16654: [SPARK-19303][ML][WIP] Add evaluate method in clu...

2017-01-19 Thread zhengruifeng

GitHub user zhengruifeng opened a pull request:

https://github.com/apache/spark/pull/16654

[SPARK-19303][ML][WIP] Add evaluate method in clustering models

## What changes were proposed in this pull request?
1. Add an evaluation metric to the summary
2. Add an evaluate() method which returns a summary

## How was this patch tested?
Added tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zhengruifeng/spark clustering_model_evaluate

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16654.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16654


commit bb01219acb8195c56bd76a25daec8952fba7631a
Author: Zheng RuiFeng 
Date:   2017-01-20T05:12:29Z

create pr





[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...

2017-01-19 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/16646
  
Looks good. I just have some questions that are not specific to this change.



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97013742
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -124,7 +129,8 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
   val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
   val rMetadata = parse(rMetadataStr)
   val dim = (rMetadata \ "dim").extract[Int]
-  new GaussianMixtureWrapper(pipeline, dim, isLoaded = true)
+  val logLikelihood = (rMetadata \ "logLikelihood").extract[Double]
+  new GaussianMixtureWrapper(pipeline, dim, logLikelihood, isLoaded = true)
--- End diff --

Would this break loading of any existing persisted model (one that is missing
the `logLikelihood` double here)?
Is there a way to mitigate that?
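
One possible mitigation, sketched here in Python rather than in the wrapper's Scala, is to treat the field as optional when reading the metadata and fall back to a default. The function name `read_log_likelihood` and the NaN default are assumptions for illustration only; nothing here is taken from the PR.

```python
import json


def read_log_likelihood(metadata_json: str, default: float = float("nan")) -> float:
    """Read logLikelihood from persisted metadata, tolerating older files.

    Metadata written before the field existed has no "logLikelihood" key,
    so a plain lookup would fail; here the caller gets `default` instead.
    """
    metadata = json.loads(metadata_json)
    value = metadata.get("logLikelihood")
    return default if value is None else float(value)
```

A model saved before the change would then still load, with the missing value reported as NaN (or whatever default the caller passes) instead of raising an error.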




[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97013542
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
+val logLikelihood: Double = gmm.summary.logLikelihood
--- End diff --

For this line and the one above it, why do we need to give an explicit type
(i.e., `Double` or `GaussianMixtureModel`)?



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97013459
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
--- End diff --

hmm, I see what you are saying



[GitHub] spark issue #16653: [SPARK-19302][DOC][MINOR] Fix the wrong item format in s...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16653
  
**[Test build #71697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71697/testReport)** for PR 16653 at commit [`f337dc3`](https://github.com/apache/spark/commit/f337dc33be85374296b43b1f25435521be63b782).



[GitHub] spark pull request #16653: [SPARK-19302][DOC][MINOR] Fix the wrong item form...

2017-01-19 Thread sarutak

GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/16653

[SPARK-19302][DOC][MINOR] Fix the wrong item format in security.md

## What changes were proposed in this pull request?

In docs/security.md, there is a description as follows.

```
 steps to configure the key-stores and the trust-store for the standalone 
deployment mode is as
 follows:
 * Generate a keys pair for each node
 * Export the public key of the key pair to a file on each node
 * Import all exported public keys into a single trust-store
```

According to the Markdown format, a blank line should precede the first list item.

## How was this patch tested?

Manually tested.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-19302

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16653.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16653


commit f337dc33be85374296b43b1f25435521be63b782
Author: sarutak 
Date:   2017-01-20T04:52:38Z

Fixed item format in security.md to abide by the Markdown format





[GitHub] spark issue #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should fail f...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16652
  
Can one of the admins verify this patch?



[GitHub] spark pull request #16652: [SPARK-19234][MLLib] AFTSurvivalRegression should...

2017-01-19 Thread admackin

GitHub user admackin opened a pull request:

https://github.com/apache/spark/pull/16652

[SPARK-19234][MLLib] AFTSurvivalRegression should fail fast when any labels are zero

## What changes were proposed in this pull request?

If any labels of 0.0 (which are invalid) are supplied,
AFTSurvivalRegression now raises an error straight away rather than producing
hard-to-interpret warnings and zero-valued coefficients in the output.
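
The fail-fast idea can be sketched in a few lines of Python. This is an illustration only, not the code in this patch: the function name `validate_survival_labels` is hypothetical. The check is motivated by the fact that an AFT model works with the logarithm of the survival time, so a label of 0.0 has no finite log.

```python
def validate_survival_labels(labels):
    """Return the labels as a list, or raise on the first non-positive one.

    Checking up front turns a confusing downstream symptom (warnings,
    zero-valued coefficients) into an immediate, explicit error.
    """
    checked = []
    for index, label in enumerate(labels):
        # `not (label > 0.0)` also rejects NaN, which compares false to everything.
        if not (label > 0.0):
            raise ValueError(
                f"label at index {index} is {label!r}; survival times must be > 0"
            )
        checked.append(label)
    return checked
```

Calling it on `[1.0, 0.0]` raises a `ValueError` that names the offending index, while `[1.0, 2.5]` passes through unchanged.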

## How was this patch tested?

Verified against the current test suite. (One test needed to be updated because it
was providing labels of zero and so was failing after this patch.)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/admackin/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16652.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16652


commit ab6d4148c4aa721898733b14eec5068652ca1085
Author: Andy MacKinlay 
Date:   2017-01-20T01:56:45Z

Addresses SPARK-19234 - make sure label is positive

commit b07c281c378d68d86b81498ca247c7346719973e
Author: Andy MacKinlay 
Date:   2017-01-20T04:02:54Z

Addresses SPARK-19234 - fix test suite to ensure no zero-labels get passed 
in test cases as they now throw errors





[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16645
  
**[Test build #71696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71696/testReport)** for PR 16645 at commit [`b1028ad`](https://github.com/apache/spark/commit/b1028ad573301ae4d351678a6e6b3b66392e32d3).



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16645
  
retest this please



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16645
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71693/
Test FAILed.



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16645
  
Merged build finished. Test FAILed.



[GitHub] spark issue #16645: [SPARK-19290][SQL] add a new extending interface in Anal...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16645
  
**[Test build #71693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71693/testReport)** for PR 16645 at commit [`b1028ad`](https://github.com/apache/spark/commit/b1028ad573301ae4d351678a6e6b3b66392e32d3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread wzhfy

Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16633
  
Hi @viirya, the main concern of @scwf is that we can't afford a performance
regression in any customer scenario. I think you can understand that :)

I went through the discussion above. It seems we already have some solution for
both cases you mentioned
[here](https://github.com/apache/spark/pull/16633#issuecomment-273963150), so the
remaining talking points are the following two:
1. How to decide the threshold between the two cases;
2. The RDD chain is broken.

Let's wait for @rxin's comment on the second point.

Here I'm just interested in the first one.
One possible way to get the number is to modify the map output statistics, as
suggested by @scwf.
For CBO, if the computing logic before the limit is complex, it's hard to get
an accurate estimate, e.g. for joins over filtered tables, where the join keys and
filter keys are probably different (that would need column correlation info).
You mentioned that we can get an estimated number and a confidence; can you
describe how?




[GitHub] spark pull request #16647: [SPARK-19292][SQL] filter with partition columns ...

2017-01-19 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16647



[GitHub] spark issue #16647: [SPARK-19292][SQL] filter with partition columns should ...

2017-01-19 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16647
  
Thanks! Merging to master



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread wzhfy

Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97009203
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

Yes.



[GitHub] spark pull request #16631: [SPARK-19271] [SQL] Change non-cbo estimation of ...

2017-01-19 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16631#discussion_r97008849
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -344,7 +344,8 @@ abstract class UnaryNode extends LogicalPlan {
   sizeInBytes = 1
 }
 
-child.stats(conf).copy(sizeInBytes = sizeInBytes)
+// Don't propagate rowCount and attributeStats, since they are not estimated here.
--- End diff --

This sounds like a general bug. We have multiple `UnaryNode`s doing the 
same thing. Is my understanding right?
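
The concern in this hunk can be illustrated with a minimal, self-contained sketch. `Stats`, `propagateAll`, and `propagateSizeOnly` are hypothetical stand-ins, not Spark's actual classes or methods, and the replacement line itself is not shown in the quoted hunk, so the second function only assumes that fresh statistics are built from `sizeInBytes` alone.

```scala
// Hypothetical stand-in for a plan statistics record; not Spark's actual class.
final case class Stats(
    sizeInBytes: BigInt,
    rowCount: Option[BigInt] = None,
    attributeStats: Map[String, Long] = Map.empty)

object StatsSketch {
  // Pattern on the removed line: copy the child's stats and override only the size.
  // rowCount and attributeStats ride along even though nothing re-estimated them.
  def propagateAll(child: Stats, newSize: BigInt): Stats =
    child.copy(sizeInBytes = newSize)

  // Assumed intent of the added comment: keep only what was estimated here,
  // so the remaining fields fall back to their empty defaults.
  def propagateSizeOnly(child: Stats, newSize: BigInt): Stats =
    Stats(sizeInBytes = newSize)
}
```

With `copy`, a stale `rowCount` from the child would still be reported by the parent node, which is the behaviour the comment in the diff warns against.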



[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71694/
Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-19 Thread yanboliang

Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16630
  
Jenkins add to whitelist



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-19 Thread yanboliang

Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/16630
  
Jenkins test this please



[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16028
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16028: [SPARK-18518][ML] HasSolver supports override

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16028
  
**[Test build #71694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71694/testReport)** for PR 16028 at commit [`a95`](https://github.com/apache/spark/commit/a959a0b9a98dda2f45ce4843ed8595024e58).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15219
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...

2017-01-19 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15219
  
**[Test build #71692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71692/testReport)** for PR 15219 at commit [`b15d9d5`](https://github.com/apache/spark/commit/b15d9d5724936f5946d99acc40b75754e8583aa6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15219
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71692/
Test FAILed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-01-19 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/16630
  
jenkins test this please



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97007956
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
--- End diff --

Here we need to get ```logLikelihood``` explicitly and make it a member of 
the wrapper: ```summary``` is not saved in the pipeline model, so we can't 
get it (after L40) from a persisted R Gaussian mixture model.



[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...

2017-01-19 Thread yanboliang

Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/16646#discussion_r97007694
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala ---
@@ -91,7 +92,10 @@ private[r] object GaussianMixtureWrapper extends MLReadable[GaussianMixtureWrapp
   .setStages(Array(rFormulaModel, gm))
   .fit(data)
 
-new GaussianMixtureWrapper(pipeline, dim)
+val gmm: GaussianMixtureModel = pipeline.stages(1).asInstanceOf[GaussianMixtureModel]
+val logLikelihood: Double = gmm.summary.logLikelihood
+
+new GaussianMixtureWrapper(pipeline, dim, logLikelihood)
--- End diff --

We can't, since ```summary``` was not saved in the pipeline model, so we 
need to save it into the wrapper explicitly.
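
The reasoning above can be sketched outside Spark. All names below (`TrainingSummary`, `FittedModel`, `ModelWrapper`, `WrapperSketch`) are hypothetical stand-ins rather than Spark APIs, and `reload` only simulates persistence by dropping the summary; the point is that a value living only in a transient summary has to be copied into the wrapper at fit time.

```scala
// Hypothetical stand-ins; these are not Spark's classes.
final class TrainingSummary(val logLikelihood: Double)

// A fitted model whose summary exists only right after training.
final class FittedModel(val weights: Vector[Double], val summary: Option[TrainingSummary])

// The wrapper stores the metric as its own field, so it can be saved with the wrapper.
final case class ModelWrapper(weights: Vector[Double], logLikelihood: Double)

object WrapperSketch {
  // Capture the metric while the summary is still available, i.e. at fit time.
  def wrap(model: FittedModel): ModelWrapper = {
    val ll = model.summary
      .map(_.logLikelihood)
      .getOrElse(throw new IllegalStateException("no training summary available"))
    ModelWrapper(model.weights, ll)
  }

  // Simulated save/load of the model alone: the summary is not carried over.
  def reload(model: FittedModel): FittedModel =
    new FittedModel(model.weights, None)
}
```

Calling `wrap` on a reloaded model would throw in this sketch, which mirrors why the value is read once during fitting and then kept as a wrapper member.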



[GitHub] spark issue #16633: [SPARK-19274][SQL] Make GlobalLimit without shuffling da...

2017-01-19 Thread scwf

Github user scwf commented on the issue:

https://github.com/apache/spark/pull/16633
  
@viirya I suggest fixing case 2 in this PR; let's wait for comments on case 1. /cc 
@rxin and @wzhfy, who may comment on the first case.



[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71690/
Test FAILed.



[GitHub] spark issue #11867: [SPARK-14049] [CORE] Add functionality in spark history ...

2017-01-19 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11867
  
Merged build finished. Test FAILed.


