[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22812
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4562/


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22812
  
Merged build finished. Test PASSed.



[GitHub] spark pull request #22309: [SPARK-20384][SQL] Support value class in schema ...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22309#discussion_r228730753
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala ---
@@ -358,4 +368,20 @@ class ScalaReflectionSuite extends SparkFunSuite {
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Double, Int)]) == 1)
     assert(numberOfCheckedArguments(deserializerFor[(java.lang.Integer, java.lang.Integer)]) == 0)
   }
+
+  test("schema for case class that is a value class") {
+    val schema = schemaFor[TestingValueClass.IntWrapper]
+    assert(schema === Schema(IntegerType, nullable = false))
+  }
+
+  test("schema for case class that contains value class fields") {
+    val schema = schemaFor[TestingValueClass.ValueClassData]
+    assert(schema === Schema(
+      StructType(Seq(
+        StructField("intField", IntegerType, nullable = false),
+        StructField("wrappedInt", IntegerType, nullable = false),
--- End diff --

To confirm: a Scala value class wrapping a primitive type can't be null?



[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22812
  
**[Test build #98147 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98147/testReport)**
 for PR 22812 at commit 
[`517bebf`](https://github.com/apache/spark/commit/517bebfb1e49f2315019696a50b657dcf715778c).



[GitHub] spark issue #22812: [SPARK-25817][SQL] Dataset encoder should support combin...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22812
  
retest this please



[GitHub] spark issue #22309: [SPARK-20384][SQL] Support value class in schema of Data...

2018-10-27 Thread mt40
Github user mt40 commented on the issue:

https://github.com/apache/spark/pull/22309
  
@cloud-fan It works now. Actually, top-level value classes have been supported since
[SPARK-17368](https://issues.apache.org/jira/browse/SPARK-17368). In this patch I try to
keep that behavior and add support for nested value classes.



[GitHub] spark issue #22858: [SPARK-24709][SQL][2.4] use str instead of basestring in...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22858
  
@HyukjinKwon thanks for the information! Shall we replace `str` with 
`basestring` in `functions.py` for master branch?



[GitHub] spark pull request #22858: [SPARK-24709][SQL][2.4] use str instead of basest...

2018-10-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22858#discussion_r228730582
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2326,7 +2326,7 @@ def schema_of_json(json):
     >>> df.select(schema_of_json('{"a": 0}').alias("json")).collect()
     [Row(json=u'struct<a:bigint>')]
     """
-    if isinstance(json, basestring):
+    if isinstance(json, str):
--- End diff --

Shall we apply it to 2.4? I'm not aware of the background: why did we not put
```
if sys.version >= '3':
    basestring = str
```
in 2.4?
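For context, a minimal sketch of how the `basestring` shim behaves, using only the standard library. `is_string_literal` is a hypothetical helper standing in for the `isinstance` check in `schema_of_json`, and the sketch tests `sys.version_info` rather than the `sys.version >= '3'` string comparison (both separate Python 2 from Python 3).

```python
import sys

# Python 2/3 compatibility shim: Python 3 has no `basestring`,
# so alias it to `str`; on Python 2 the builtin `basestring` is used as-is.
if sys.version_info[0] >= 3:
    basestring = str


def is_string_literal(json):
    """Hypothetical helper mirroring the isinstance check in schema_of_json:
    True for a plain string literal, False for anything else (e.g. a Column)."""
    return isinstance(json, basestring)


print(is_string_literal('{"a": 0}'))  # True
print(is_string_literal(42))          # False
```

With the shim, the same `isinstance(json, basestring)` line works on both interpreters, whereas `isinstance(json, str)` on Python 2 rejects `unicode` literals such as `u'{"a": 0}'`.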



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98145 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98145/testReport)**
 for PR 22784 at commit 
[`18af032`](https://github.com/apache/spark/commit/18af0325e95552a00983983224795e71f2e66204).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98145/



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98144/



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98144 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98144/testReport)**
 for PR 22784 at commit 
[`094594b`](https://github.com/apache/spark/commit/094594bf63a22be65bac7b31932d5d870f1142d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22809
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22809
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98139/



[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22809
  
**[Test build #98139 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98139/testReport)**
 for PR 22809 at commit 
[`07205de`](https://github.com/apache/spark/commit/07205dea343539cb812622205fd0534b77f183d0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #98146 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98146/testReport)**
 for PR 21732 at commit 
[`fec1cac`](https://github.com/apache/spark/commit/fec1cac2c5f8fa5226001820c24fe5fc8304fe3f).



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4561/



[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22817: [SPARK-25816][SQL] Fix attribute resolution in nested ex...

2018-10-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22817
  
The fix looks fine to me. cc @cloud-fan @hvanhovell 



[GitHub] spark pull request #22817: [SPARK-25816][SQL] Fix attribute resolution in ne...

2018-10-27 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22817#discussion_r228729920
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
 Row ("abc", 1))
 }
   }
+
+  test("SPARK-25816 ResolveReferences works with nested extractors") {
+    val df0 = Seq((1, Map(1 -> "a")), (2, Map(2 -> "b"))).toDF("1", "2")
+    val df1 = df0.select($"1".as("2"), $"2".as("1"))
+    val df2 = df1.filter($"1"(map_keys($"1")(0)) > "a")
--- End diff --

We are unable to resolve the expressions in `extraction` of 
`UnresolvedExtractValue`. We can simplify the expression in the `extraction`. 
For example, `df1.filter($"1"($"2") > "a")`.



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98145 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98145/testReport)**
 for PR 22784 at commit 
[`18af032`](https://github.com/apache/spark/commit/18af0325e95552a00983983224795e71f2e66204).



[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228729214
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,21 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
     // check overflowing
     assert(PCAUtil.memoryCost(4, 6) > Int.MaxValue)
   }
+
+  test("number of features more than 65535") {
+    val rows = 10
+    val columns = 10
+    val k = 5
+    val randomRDD = RandomRDDs.normalVectorRDD(sc, rows, columns, 0, 0)
+    val pca = new PCA(k).fit(randomRDD)
+    assert(pca.explainedVariance.size === 5)
+    assert(pca.pc.numRows === 10 && pca.pc.numCols === 5)
+    // Eigen values should not be negative
+    assert(!pca.explainedVariance.values.exists(_ < 0))
+
+    // Norm of the principle component should be 1.0
--- End diff --

Done.



[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228729215
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,21 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
     // check overflowing
     assert(PCAUtil.memoryCost(4, 6) > Int.MaxValue)
   }
+
+  test("number of features more than 65535") {
+    val rows = 10
+    val columns = 10
+    val k = 5
+    val randomRDD = RandomRDDs.normalVectorRDD(sc, rows, columns, 0, 0)
+    val pca = new PCA(k).fit(randomRDD)
+    assert(pca.explainedVariance.size === 5)
+    assert(pca.pc.numRows === 10 && pca.pc.numCols === 5)
+    // Eigen values should not be negative
+    assert(!pca.explainedVariance.values.exists(_ < 0))
--- End diff --

Done.



[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228729208
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -49,7 +50,16 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
   "Try reducing the parameter k for PCA, or reduce the input feature " +
   "vector dimension to make this tractable.")
 
-    val mat = new RowMatrix(sources)
+    val mat = if (numFeatures > 65535) {
+      val meanVector = Statistics.colStats(sources).mean
--- End diff --

I have modified it. Thanks.
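A quick check of where the 65535 threshold in this PR's title comes from, assuming (as I understand `RowMatrix.computeGramianMatrix` to work) that the upper triangle of the n x n Gramian, diagonal included, is packed into a single `Int`-indexed JVM array of n(n+1)/2 doubles. This sketch only does the arithmetic; it is not Spark code.

```python
# Scala/Java Int.MaxValue; JVM arrays are Int-indexed, so this bounds one array's length.
INT_MAX = 2**31 - 1


def packed_upper_triangle_size(n):
    """Entries in the upper triangle (including the diagonal) of an n x n matrix."""
    return n * (n + 1) // 2


def fits_in_one_array(n):
    return packed_upper_triangle_size(n) <= INT_MAX


for n in (65535, 65536):
    print(n, packed_upper_triangle_size(n), fits_in_one_array(n))
# 65535 2147450880 True
# 65536 2147516416 False
```

So 65535 is the largest column count whose packed Gramian still fits, which is why more columns need a code path that avoids building that matrix.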



[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228729201
  
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
     // check overflowing
     assert(PCAUtil.memoryCost(4, 6) > Int.MaxValue)
   }
+
+  test("number of features more than 65500") {
+    val rows = 10
+    val columns = 10
+    val k = 5
+    val randomRDD = RandomRDDs.normalVectorRDD(sc, rows, columns, 0, 0)
+    val pca = new PCA(k).fit(randomRDD)
+    assert(pca.explainedVariance.size === 5)
+    assert(pca.pc.numRows === 10 && pca.pc.numCols === 5)
--- End diff --

Thanks. I have updated the test case.



[GitHub] spark issue #21950: [SPARK-24914][SQL] Add configuration to avoid OOM during...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21950
  
Merged build finished. Test PASSed.



[GitHub] spark issue #21950: [SPARK-24914][SQL] Add configuration to avoid OOM during...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98137/



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98144 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98144/testReport)**
 for PR 22784 at commit 
[`094594b`](https://github.com/apache/spark/commit/094594bf63a22be65bac7b31932d5d870f1142d3).



[GitHub] spark issue #21950: [SPARK-24914][SQL] Add configuration to avoid OOM during...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21950
  
**[Test build #98137 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98137/testReport)**
 for PR 21950 at commit 
[`ddfe945`](https://github.com/apache/spark/commit/ddfe945ef161e59fc2bbc1a12bf40563d2bdd400).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

2018-10-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19601
  
Thanks!



[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/22863
  
Thanks @felixcheung 



[GitHub] spark pull request #22863: [SPARK-25859][ML]add scala/java/python example an...

2018-10-27 Thread huaxingao
Github user huaxingao closed the pull request at:

https://github.com/apache/spark/pull/22863



[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22863
  
please close this PR. thx



[GitHub] spark pull request #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22843



[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22863
  
merged to 2.4



[GitHub] spark pull request #22863: [SPARK-25859][ML]add scala/java/python example an...

2018-10-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/22863#discussion_r228727912
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/PrefixSpanExample.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+// scalastyle:off println
+
+// $example on$
+import org.apache.spark.ml.fpm.PrefixSpan
+// $example off$
+import org.apache.spark.sql.SparkSession
+
+/**
+ * An example demonstrating PrefixSpan.
+ * Run with
+ * {{{
+ * bin/run-example ml.PrefixSpanExample
+ * }}}
+ */
+object PrefixSpanExample {
+
+  def main(args: Array[String]): Unit = {
+    val spark = SparkSession
+      .builder
+      .appName(s"${this.getClass.getSimpleName}")
+      .getOrCreate()
+    import spark.implicits._
+
+    // $example on$
+    val smallTestData = Seq(
+      Seq(Seq(1, 2), Seq(3)),
+      Seq(Seq(1), Seq(3, 2), Seq(1, 2)),
+      Seq(Seq(1, 2), Seq(5)),
+      Seq(Seq(6)))
+
+    val df = smallTestData.toDF("sequence")
+    val result = new PrefixSpan()
+      .setMinSupport(0.5)
+      .setMaxPatternLength(5)
+      .setMaxLocalProjDBSize(3200)
+      .findFrequentSequentialPatterns(df)
+      .show()
+    // $example off$
+
+    spark.stop()
+  }
+}
+// scalastyle:on println
--- End diff --

nit: looks like println is not used in example here
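To make `setMinSupport(0.5)` in the example concrete, here is a brute-force support count over its four input sequences. This is a sketch, not Spark's PrefixSpan algorithm (which grows patterns over projected databases rather than testing candidates one by one); it assumes a pattern is frequent when it occurs in at least ceil(minSupport * number of sequences) inputs, and the helper names are hypothetical.

```python
import math

# The example's input: each sequence is an ordered list of itemsets.
SEQUENCES = [
    [{1, 2}, {3}],
    [{1}, {3, 2}, {1, 2}],
    [{1, 2}, {5}],
    [{6}],
]


def contains(sequence, pattern):
    """True if each itemset of `pattern` is a subset of a distinct itemset
    of `sequence`, in the same order (gaps allowed)."""
    pos = 0
    for itemset in pattern:
        # Greedily advance to the earliest itemset that contains this element.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support_count(pattern, sequences=SEQUENCES):
    return sum(1 for seq in sequences if contains(seq, pattern))


def is_frequent(pattern, min_support=0.5, sequences=SEQUENCES):
    # Assumed threshold: count >= ceil(min_support * number of sequences).
    min_count = math.ceil(min_support * len(sequences))
    return support_count(pattern, sequences) >= min_count


print(support_count([{1}, {3}]))  # 2
print(is_frequent([{1}, {3}]))    # True
print(is_frequent([{6}]))         # False
```

With four sequences and a minimum support of 0.5, a pattern must appear in at least two of them: `<{1}, {3}>` (sequences 1 and 2) qualifies, while `<{6}>` does not.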



[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22865
  
**[Test build #98143 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98143/testReport)**
 for PR 22865 at commit 
[`af8a85a`](https://github.com/apache/spark/commit/af8a85ae4a1e477801bf104af6d4909cd822ba01).



[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22865
  
ok to test



[GitHub] spark issue #22843: [SPARK-16693][SPARKR] Remove methods deprecated

2018-10-27 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/22843
  
merged to master



[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22865
  
**[Test build #98142 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98142/testReport)**
 for PR 22865 at commit 
[`af8a85a`](https://github.com/apache/spark/commit/af8a85ae4a1e477801bf104af6d4909cd822ba01).



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98141/



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.



[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22865
  
Can one of the admins verify this patch?



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98141 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98141/testReport)**
 for PR 22784 at commit 
[`3cbe017`](https://github.com/apache/spark/commit/3cbe017c640764db0fe95bcc2a820917bbc5fb3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFilter.en...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22865
  
Can one of the admins verify this patch?



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread bersprockets
GitHub user bersprockets opened a pull request:

https://github.com/apache/spark/pull/22865

[DOC] Fix doc for spark.sql.parquet.recordLevelFilter.enabled

## What changes were proposed in this pull request?

Updated the doc string value for 
spark.sql.parquet.recordLevelFilter.enabled to indicate that 
spark.sql.parquet.enableVectorizedReader must be disabled.

The code in ParquetFileFormat uses spark.sql.parquet.recordLevelFilter.enabled
only after falling back to parquet-mr (see the else branch of this if statement):
https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L412

https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L427-L430

Tests also bear this out.
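A toy model of the condition the updated doc string describes. It assumes the usual defaults (`enableVectorizedReader` and `filterPushdown` on, `recordLevelFilter.enabled` off), treats "vectorized reader enabled" as "vectorized reader used" (ignoring schema-dependent fallbacks), and also requires filter pushdown, since record-level filtering can only apply filters that were pushed down. `record_level_filter_takes_effect` is a hypothetical helper over a plain dict of conf values, not Spark's ParquetFileFormat logic.

```python
# Assumed defaults for the three SQL confs involved (strings, like SQLConf values).
DEFAULTS = {
    "spark.sql.parquet.enableVectorizedReader": "true",
    "spark.sql.parquet.filterPushdown": "true",
    "spark.sql.parquet.recordLevelFilter.enabled": "false",
}


def _flag(conf, key):
    """Read a boolean conf value, falling back to the assumed default."""
    return str(conf.get(key, DEFAULTS[key])).strip().lower() == "true"


def record_level_filter_takes_effect(conf):
    """Hypothetical toy model: record-level filtering happens only on the
    parquet-mr path (vectorized reader off) and only with pushed-down filters."""
    return (
        _flag(conf, "spark.sql.parquet.recordLevelFilter.enabled")
        and _flag(conf, "spark.sql.parquet.filterPushdown")
        and not _flag(conf, "spark.sql.parquet.enableVectorizedReader")
    )


print(record_level_filter_takes_effect(
    {"spark.sql.parquet.recordLevelFilter.enabled": "true"}))  # False
print(record_level_filter_takes_effect({
    "spark.sql.parquet.recordLevelFilter.enabled": "true",
    "spark.sql.parquet.enableVectorizedReader": "false",
}))  # True
```

Enabling the record-level filter alone is not enough under these assumptions; the vectorized reader must also be turned off.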

## How was this patch tested?

This is just a doc string fix: I built Spark and ran a single test.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bersprockets/spark confdocfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22865


commit af8a85ae4a1e477801bf104af6d4909cd822ba01
Author: Bruce Robbins 
Date:   2018-10-27T21:47:50Z

update doc string for spark.sql.parquet.recordLevelFilter.enabled





[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98140 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98140/testReport)**
 for PR 22784 at commit 
[`5674e17`](https://github.com/apache/spark/commit/5674e177b7894d61904c6748dbf7721359163938).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98140/



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.



[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21157
  
I meant to use

https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L583

instead of

https://github.com/apache/spark/blob/a97001d21757ae214c86371141bd78a376200f66/python/pyspark/serializers.py#L561



---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98141 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98141/testReport)**
 for PR 22784 at commit 
[`3cbe017`](https://github.com/apache/spark/commit/3cbe017c640764db0fe95bcc2a820917bbc5fb3e).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98140 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98140/testReport)**
 for PR 22784 at commit 
[`5674e17`](https://github.com/apache/spark/commit/5674e177b7894d61904c6748dbf7721359163938).


---




[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22809
  
LGTM


---




[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22809
  
**[Test build #98139 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98139/testReport)**
 for PR 22809 at commit 
[`07205de`](https://github.com/apache/spark/commit/07205dea343539cb812622205fd0534b77f183d0).


---




[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22809
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4560/


---




[GitHub] spark issue #22809: [SPARK-19851][SQL] Add support for EVERY and ANY (SOME) ...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22809
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22809: [SPARK-19851][SQL] Add support for EVERY and ANY ...

2018-10-27 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/22809#discussion_r228725645
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/UnevaluableAggs.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+abstract class UnevaluableBooleanAggBase(arg: Expression)
--- End diff --

@cloud-fan @mgaido91 Thank you. I have added a TODO for now.
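A minimal Python sketch of the intended semantics, for readers following this aggregate: in standard SQL, `EVERY(x)` is true when every non-null input is true, `ANY(x)`/`SOME(x)` is true when at least one non-null input is true, and both return NULL when there are no non-null inputs. Because `False < True`, these coincide with `MIN`/`MAX` over booleans, which is one plausible reason to keep the expressions unevaluable and rewrite them. The function names are hypothetical and this is not Spark's implementation.

```python
from typing import Iterable, Optional


def sql_every(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """EVERY: true only if all non-null inputs are true; None (NULL) if there are none."""
    present = [v for v in values if v is not None]
    # min() over booleans: a single False makes the result False.
    return min(present) if present else None


def sql_any(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """ANY/SOME: true if at least one non-null input is true; None (NULL) if there are none."""
    present = [v for v in values if v is not None]
    # max() over booleans: a single True makes the result True.
    return max(present) if present else None


print(sql_every([True, None, True]), sql_any([False, None]), sql_every([]))
```

Nulls are skipped rather than propagated, matching how SQL aggregates generally treat NULL inputs.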


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
> I think people do define NamedTuples in Notebooks, so I'm going to stick 
with -1.

@holdenk I understand your point, but there is still something we can do 
without breaking existing code relying on namedtuple serialization. Option 1: 
switch to cloudpickle as suggested by @HyukjinKwon. Option 2: #21180. What 
would be your choice between the two? 
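Background for the options above, as a stdlib-only Python sketch: `pickle` stores a namedtuple instance by reference to its class (module plus qualified name), so the receiving side must be able to import that class, and a class that is not importable under its name cannot round-trip. Here the failure shows up at pickling time in a single process; with a class defined in a notebook's `__main__`, the analogous failure would appear when a worker tries to unpickle. `make_point` and `round_trips` are hypothetical helpers, and exception types and messages vary across Python versions.

```python
import pickle
from collections import namedtuple

# A module-level namedtuple can be found again by module + name on unpickling.
Point = namedtuple("Point", ["x", "y"])


def make_point(x, y):
    # Hypothetical helper: this class exists only inside the call, so pickle
    # cannot look it up by name (similar in spirit to a notebook-defined class
    # that workers cannot import).
    LocalPoint = namedtuple("LocalPoint", ["x", "y"])
    return LocalPoint(x, y)


def round_trips(obj):
    """Return True if obj survives pickle.dumps/pickle.loads unchanged."""
    try:
        return pickle.loads(pickle.dumps(obj)) == obj
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(round_trips(Point(1, 2)))
print(round_trips(make_point(1, 2)))
```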


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98138/


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22863
  
**[Test build #98138 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98138/testReport)**
 for PR 22863 at commit 
[`ddcab50`](https://github.com/apache/spark/commit/ddcab50d458dbfad843f74d55aedc51da5c3b6d0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228724594
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,21 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
 // check overflowing
 assert(PCAUtil.memoryCost(4, 6) > Int.MaxValue)
   }
+
+  test("number of features more than 65535") {
+val rows = 10
+val columns = 10
+val k = 5
+val randomRDD = RandomRDDs.normalVectorRDD(sc, rows, columns, 0, 0)
+val pca = new PCA(k).fit(randomRDD)
+assert(pca.explainedVariance.size === 5)
+assert(pca.pc.numRows === 10 && pca.pc.numCols === 5)
+// Eigen values should not be negative
+assert(!pca.explainedVariance.values.exists(_ < 0))
+
+// Norm of the principle component should be 1.0
--- End diff --

Nit: principle -> principal


---




[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228724541
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala 
---
@@ -384,18 +384,28 @@ class RowMatrix @Since("1.0.0") (
 val n = numCols().toInt
 require(k > 0 && k <= n, s"k = $k out of range (0, n = $n]")
 
-val Cov = computeCovariance().asBreeze.asInstanceOf[BDM[Double]]
+if (n > 65535) {
+  val svd = computeSVD(k)
+  val s = svd.s.toArray.map(eigValue => eigValue * eigValue / (n - 1))
--- End diff --

Right, makes sense.
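Two pieces of arithmetic behind this branch, as a hedged Python sketch. First, assuming the 65535 cutoff comes from storing the Gramian as a packed upper triangle of n(n+1)/2 doubles in a single JVM array indexed by a 32-bit `Int`, 65535 is the largest n for which that count still fits. Second, squaring singular values and dividing by a constant gives variances only up to that constant; if the values are then normalized into explained-variance ratios, any positive divisor cancels out. The function names are hypothetical.

```python
INT_MAX = 2**31 - 1  # largest value of a 32-bit JVM Int


def packed_gramian_size(n: int) -> int:
    """Entries in the upper triangle (including the diagonal) of an n x n matrix."""
    return n * (n + 1) // 2


def explained_variance_ratio(singular_values, divisor):
    """Square singular values, scale by a positive constant, normalize to ratios."""
    variances = [s * s / divisor for s in singular_values]
    total = sum(variances)
    return [v / total for v in variances]


# The packed Gramian still fits in an Int-indexed array at n = 65535, not at 65536.
print(packed_gramian_size(65535) <= INT_MAX)  # True
print(packed_gramian_size(65536) <= INT_MAX)  # False

# The divisor cancels once the values are normalized.
print(explained_variance_ratio([3.0, 2.0, 1.0], divisor=9))
print(explained_variance_ratio([3.0, 2.0, 1.0], divisor=99))
```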


---




[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22861
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228724515
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -49,7 +50,16 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
   "Try reducing the parameter k for PCA, or reduce the input feature " 
+
   "vector dimension to make this tractable.")
 
-val mat = new RowMatrix(sources)
+val mat = if (numFeatures > 65535) {
+  val meanVector = Statistics.colStats(sources).mean
--- End diff --

Rather than calling `.toArray` and `.zipped` below, could this be written as 
Vector - Vector in the loop below? It might be more efficient.


---




[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228724667
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,14 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
 // check overflowing
 assert(PCAUtil.memoryCost(4, 6) > Int.MaxValue)
   }
+
+  test("number of features more than 65500") {
+val rows = 10
+val columns = 10
+val k = 5
+val randomRDD = RandomRDDs.normalVectorRDD(sc, rows, columns, 0, 0)
+val pca = new PCA(k).fit(randomRDD)
+assert(pca.explainedVariance.size === 5)
+assert(pca.pc.numRows === 10 && pca.pc.numCols === 5)
--- End diff --

Is there an easy dummy test case we can write where we know what the first 
PC should be? For example, if you generate a bunch of vectors like (a +/- epsilon, 
a +/- epsilon, ...) for many values of a, the principal component should be 
nearly proportional to (1, 1, 1, ...), right? Is that easy enough to add as a 
trivial test of the actual analysis? I think that would really prove it, though 
your manual test suggests it's working.
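A rough sketch of the suggested test idea, written in plain Python rather than against Spark's `PCA` so that it runs standalone: generate rows near the diagonal, compute their sample covariance, and find its leading eigenvector by power iteration (a swapped-in technique chosen for brevity; Spark is not claimed to compute it this way). The helper names and parameters (200 rows, 5 columns, epsilon 0.1, seed 0) are arbitrary assumptions. Up to sign, the result should be close to the unit vector (1, ..., 1)/sqrt(d).

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length rows."""
    m, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iterations=100):
    """Leading eigenvector of the covariance matrix via power iteration."""
    cov = covariance(rows)
    d = len(cov)
    v = [float(j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the iterate at unit length
    return v


def near_constant_rows(n_rows, dim, eps, seed=0):
    """Rows of the form (a +/- eps, ..., a +/- eps) for many random a."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        a = rng.uniform(-10.0, 10.0)
        rows.append([a + rng.uniform(-eps, eps) for _ in range(dim)])
    return rows


rows = near_constant_rows(n_rows=200, dim=5, eps=0.1)
pc = first_principal_component(rows)
expected = [1.0 / math.sqrt(5)] * 5
cosine = abs(sum(p * e for p, e in zip(pc, expected)))
print(f"|cos(pc, (1,...,1)/sqrt(5))| = {cosine:.6f}")
```

Comparing via the absolute cosine makes the check independent of the eigenvector's sign, which PCA does not fix.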


---




[GitHub] spark pull request #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535...

2018-10-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22784#discussion_r228724555
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala ---
@@ -54,4 +55,21 @@ class PCASuite extends SparkFunSuite with MLlibTestSparkContext {
 // check overflowing
 assert(PCAUtil.memoryCost(4, 6) > Int.MaxValue)
   }
+
+  test("number of features more than 65535") {
+val rows = 10
+val columns = 10
+val k = 5
+val randomRDD = RandomRDDs.normalVectorRDD(sc, rows, columns, 0, 0)
+val pca = new PCA(k).fit(randomRDD)
+assert(pca.explainedVariance.size === 5)
+assert(pca.pc.numRows === 10 && pca.pc.numCols === 5)
+// Eigen values should not be negative
+assert(!pca.explainedVariance.values.exists(_ < 0))
--- End diff --

You can write `.forall(_ >= 0)` too, but it doesn't matter.
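One caveat worth knowing, illustrated in Python because the comparison semantics are the same on the JVM: `!exists(_ < 0)` and `forall(_ >= 0)` agree for ordinary numbers but not for NaN, since every ordered comparison with NaN is false. Only the `forall` form rejects a NaN eigenvalue. The function names below are hypothetical stand-ins for the two Scala expressions.

```python
def none_negative(values):
    """Mirrors `!values.exists(_ < 0)`."""
    return not any(v < 0 for v in values)


def all_non_negative(values):
    """Mirrors `values.forall(_ >= 0)`."""
    return all(v >= 0 for v in values)


ok = [0.5, 0.3, 0.0]
with_nan = [0.5, float("nan")]
print(none_negative(ok), all_non_negative(ok))              # True True
print(none_negative(with_nan), all_non_negative(with_nan))  # True False
```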


---




[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-10-27 Thread superbobry
Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
@HyukjinKwon, do you mean changing the default serializer to cloudpickle and 
removing _hack_namedtuple?
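For context on what a namedtuple hack of this kind does, here is a minimal stdlib sketch of the general idea (not PySpark's actual code): give a namedtuple class a `__reduce__` that records its name, field names and values, so the unpickling side rebuilds an equivalent class instead of importing the original by name. `restore` and `portable_namedtuple` are hypothetical names. The trade-off is that the rebuilt class is a new type, so identity or `isinstance` checks against the original class will not hold.

```python
import pickle
from collections import namedtuple


def restore(name, fields, values):
    """Rebuild an equivalent namedtuple class and instantiate it."""
    return namedtuple(name, fields)(*values)


def portable_namedtuple(name, fields):
    """Create a namedtuple class whose instances pickle by value, not by class reference."""
    cls = namedtuple(name, fields)

    def __reduce__(self):
        # (callable, args): pickle stores `restore` by reference plus plain data.
        return (restore, (name, self._fields, tuple(self)))

    cls.__reduce__ = __reduce__
    return cls


def make_point(x, y):
    # A function-local class, which plain pickling cannot find by name.
    LocalPoint = portable_namedtuple("LocalPoint", ["x", "y"])
    return LocalPoint(x, y)


p = make_point(1, 2)
q = pickle.loads(pickle.dumps(p))
print(q, q.x, type(q) is type(p))
```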


---




[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22861
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98133/


---




[GitHub] spark issue #22861: [SPARK-25663][SPARK-25661][SQL][TEST] Refactor BuiltInDa...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22861
  
**[Test build #98133 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98133/testReport)**
 for PR 22861 at commit 
[`81fe383`](https://github.com/apache/spark/commit/81fe383d4f1189c3a4a7bae32f8ca38d123e6d7d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait DataSourceWriteBenchmark extends SqlBasedBenchmark `


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22863
  
**[Test build #98138 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98138/testReport)**
 for PR 22863 at commit 
[`ddcab50`](https://github.com/apache/spark/commit/ddcab50d458dbfad843f74d55aedc51da5c3b6d0).


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4559/


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21950: [SPARK-24914][SQL] Add configuration to avoid OOM during...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21950
  
**[Test build #98137 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98137/testReport)**
 for PR 21950 at commit 
[`ddfe945`](https://github.com/apache/spark/commit/ddfe945ef161e59fc2bbc1a12bf40563d2bdd400).


---




[GitHub] spark issue #22864: [Minor][WEBUI] Remove refresh interval parameter from th...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22864
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22864: [Minor][WEBUI] Remove refresh interval parameter from th...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22864
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22864: [Minor][WEBUI] Remove refresh interval parameter from th...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22864
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98136/


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22863
  
**[Test build #98136 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98136/testReport)**
 for PR 22863 at commit 
[`3109c21`](https://github.com/apache/spark/commit/3109c213c2f875ea7099929621a3be18b5f02862).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaPrefixSpanExample `


---




[GitHub] spark pull request #22864: [Minor][WEBUI] Remove refresh interval parameter ...

2018-10-27 Thread shahidki31
GitHub user shahidki31 opened a pull request:

https://github.com/apache/spark/pull/22864

[Minor][WEBUI] Remove refresh interval parameter from the headerSparkPage 
method.

## What changes were proposed in this pull request?
'refreshInterval' is not used anywhere in the headerSparkPage method, so 
we don't need to pass the parameter when calling the 'headerSparkPage' method.

## How was this patch tested?
Existing tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shahidki31/spark unusedCode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22864


commit cd6f3ba922c96fc2f00871d36362bdecb84344a4
Author: Shahid 
Date:   2018-10-27T18:49:46Z

Remove refresh interval from headerSparkPage




---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98135/


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98135 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98135/testReport)**
 for PR 22784 at commit 
[`a8c4391`](https://github.com/apache/spark/commit/a8c43919a5d8624a5a5ddf7ea862a93f2db098c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4558/


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/22863
  
@felixcheung 


---




[GitHub] spark issue #22863: [SPARK-25859][ML]add scala/java/python example and doc f...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22863
  
**[Test build #98136 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98136/testReport)**
 for PR 22863 at commit 
[`3109c21`](https://github.com/apache/spark/commit/3109c213c2f875ea7099929621a3be18b5f02862).


---




[GitHub] spark pull request #22863: [SPARK-25859][ML]add scala/java/python example an...

2018-10-27 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/22863

[SPARK-25859][ML]add scala/java/python example and doc for PrefixSpan

## What changes were proposed in this pull request?

Add Scala/Java/Python examples and docs for PrefixSpan in branch 2.4.

## How was this patch tested?

Manually tested


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark mydocbranch

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22863


commit 3109c213c2f875ea7099929621a3be18b5f02862
Author: Huaxin Gao 
Date:   2018-10-27T18:14:36Z

[SPARK-25859][ML]add scala/java/python example and doc for PrefixSpan




---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98132/


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22847
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22847
  
**[Test build #98132 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98132/testReport)**
 for PR 22847 at commit 
[`0db224f`](https://github.com/apache/spark/commit/0db224f0eebc52a8fc1dc47fa03ff78151b3b6d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98131/


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #98131 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98131/testReport)**
 for PR 21732 at commit 
[`79d10c1`](https://github.com/apache/spark/commit/79d10c1ebc7b29a7d05bc1fb71dd543eab23db24).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22847
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98130/


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22847
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22847
  
**[Test build #98130 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98130/testReport)**
 for PR 22847 at commit 
[`b578dd4`](https://github.com/apache/spark/commit/b578dd45cb4e6831a4bb54ba4c0d9c8f5c84fec5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



