[GitHub] spark issue #4066: [SPARK-4879] Use driver to coordinate Hadoop output commi...

2017-01-04 Thread matrixlibing
Github user matrixlibing commented on the issue:

https://github.com/apache/spark/pull/4066
  
SPARK-4879  also happened when use saveAsNewAPIHadoopFile.
Why does not support the saveAsNewAPIHadoopFile function?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-04 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/14725
  
Maybe it could be cleared up a bit with a good docstring? Although if the 
result is too confusing to be used then it's probably not worth doing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15211: [SPARK-14709][ML] spark.ml API for linear SVM

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15211#discussion_r94723792
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
@@ -0,0 +1,554 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.classification
+
+import scala.collection.mutable
+
+import breeze.linalg.{DenseVector => BDV}
+import breeze.optimize.{CachedDiffFunction, DiffFunction, OWLQN => 
BreezeOWLQN}
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.SparkException
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
+import org.apache.spark.ml.feature.Instance
+import org.apache.spark.ml.linalg._
+import org.apache.spark.ml.linalg.BLAS._
+import org.apache.spark.ml.param._
+import org.apache.spark.ml.param.shared._
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.linalg.VectorImplicits._
+import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Dataset, Row}
+import org.apache.spark.sql.functions.{col, lit}
+
+/** Params for linear SVM Classifier. */
+private[classification] trait LinearSVCParams extends ClassifierParams 
with HasRegParam
+  with HasMaxIter with HasFitIntercept with HasTol with HasStandardization 
with HasWeightCol
+  with HasThreshold with HasAggregationDepth {
+
--- End diff --

nit, the braces here can be removed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16296
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16296
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70904/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16296
  
**[Test build #70904 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70904/testReport)**
 for PR 16296 at commit 
[`91d173d`](https://github.com/apache/spark/commit/91d173de5e6610ea0621053320208fa8c6597b40).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12135
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70902/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12135
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12135
  
**[Test build #70902 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70902/testReport)**
 for PR 12135 at commit 
[`6517f21`](https://github.com/apache/spark/commit/6517f2186417fce1d0fcde1d5e8c561d686dcef0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16432
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70910/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16432
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16432
  
**[Test build #70910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70910/testReport)**
 for PR 16432 at commit 
[`131d420`](https://github.com/apache/spark/commit/131d420c779702f8a7129f29a4d1d361eb92a166).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16474
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16474
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70903/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16474
  
**[Test build #70903 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70903/testReport)**
 for PR 16474 at commit 
[`586b347`](https://github.com/apache/spark/commit/586b347b04b64ddf2b70e4fb16035f80ad5a400e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16296
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70905/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16296
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16296
  
**[Test build #70905 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70905/testReport)**
 for PR 16296 at commit 
[`83ecc24`](https://github.com/apache/spark/commit/83ecc2439fcc7314cfaf67cfc8c18c99abb16f31).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-04 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/16464#discussion_r94721880
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
@@ -172,6 +187,8 @@ private[r] object LDAWrapper extends 
MLReadable[LDAWrapper] {
   model,
   ldaModel.logLikelihood(preprocessedData),
   ldaModel.logPerplexity(preprocessedData),
+  trainingLogLikelihood,
+  logPrior,
--- End diff --

The first version, I got them from the model in the `LDAWrapper` class. 
However, when I read `logPrior`, I found that the loaded `logPrior` is not the 
same as the value before save. So, I followed the logLikelihood and 
logPerplexity to save it in the metadata. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16417: [SPARK-19014][SQL] support complex aggregate buff...

2017-01-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16417#discussion_r94721694
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -92,46 +89,53 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
   s"$rowWriter.reset();"
 }
 
-val writeFields = inputs.zip(inputTypes).zipWithIndex.map {
-  case ((input, dataType), index) =>
+val writeFields = inputs.zip(inputsTypeAndNullable).zipWithIndex.map {
+  case ((input, (dataType, nullable)), index) =>
 val dt = dataType match {
   case udt: UserDefinedType[_] => udt.sqlType
   case other => other
 }
 val tmpCursor = ctx.freshName("tmpCursor")
 
 val setNull = dt match {
-  case t: DecimalType if t.precision > Decimal.MAX_LONG_DIGITS =>
-// Can't call setNullAt() for DecimalType with precision 
larger than 18.
-s"$rowWriter.write($index, (Decimal) null, ${t.precision}, 
${t.scale});"
+  case dt if UnsafeRow.isMutable(dt) && 
!UnsafeRow.isFixedLength(dt) =>
+def varLenDataSize(s: DataType): Int = s match {
--- End diff --

Is it good to put this `varLenDataSize` into `UnsafeRow`? Writing a 
constant like 16 here makes the code untraceable for new comers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15819
  
Weird... Not sure why the build failed. The build works in my local 
environment. cc @srowen @JoshRosen 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16417: [SPARK-19014][SQL] support complex aggregate buff...

2017-01-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16417#discussion_r94721395
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -327,24 +359,28 @@ class CodegenContext {
   nullable: Boolean,
   isVectorized: Boolean = false): String = {
 if (nullable) {
-  // Can't call setNullAt on DecimalType, because we need to keep the 
offset
-  if (!isVectorized && dataType.isInstanceOf[DecimalType]) {
+  // Can't call setNullAt for mutable but non-primitive data type, 
e.g. DecimalType, StructType
+  // with fix-length fields, because we need to keep the offset and 
length.
+  val isMutableVarLenField = UnsafeRow.isMutable(dataType) && 
!UnsafeRow.isFixedLength(dataType)
--- End diff --

Do you think it is good to add `isMutableVarLenField` method into 
`UnsafeRow` directly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-04 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/16347
  
Maybe we should make DataFrameWriter.sortBy work here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16432
  
**[Test build #70910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70910/testReport)**
 for PR 16432 at commit 
[`131d420`](https://github.com/apache/spark/commit/131d420c779702f8a7129f29a4d1d361eb92a166).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15819
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70908/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15819
  
**[Test build #70908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70908/consoleFull)**
 for PR 15819 at commit 
[`15da7a8`](https://github.com/apache/spark/commit/15da7a83c8599a0d6d28f5a0bce8a3132033867c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16417
  
**[Test build #70909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70909/testReport)**
 for PR 16417 at commit 
[`32e527d`](https://github.com/apache/spark/commit/32e527d902318c9e81e8586f592968ee08416acd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16417: [SPARK-19014][SQL] support complex aggregate buff...

2017-01-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16417#discussion_r94719867
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -303,6 +331,15 @@ public void setDecimal(int ordinal, Decimal value, int 
precision) {
 }
   }
 
+  public void setFixedLengthStruct(int ordinal, UnsafeRow row) {
--- End diff --

Do we need call `setNotNullAt(ordinal)` here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15819
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2017-01-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16417
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2017-01-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16417
  
Jenkins looks unstable.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15819
  
**[Test build #70908 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70908/consoleFull)**
 for PR 15819 at commit 
[`15da7a8`](https://github.com/apache/spark/commit/15da7a83c8599a0d6d28f5a0bce8a3132033867c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15819
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94719375
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files 
after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
+   |location '${tmpDir.toURI.toString}'
+ """.stripMargin)
+
+  import sqlContext.implicits._
--- End diff --

Nit: move this import to line 231. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15819
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15819
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70907/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15819
  
**[Test build #70907 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70907/consoleFull)**
 for PR 15819 at commit 
[`15da7a8`](https://github.com/apache/spark/commit/15da7a83c8599a0d6d28f5a0bce8a3132033867c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94719123
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -54,6 +63,63 @@ case class InsertIntoHiveTable(
   @transient private lazy val hiveContext = new Context(sc.hiveconf)
   @transient private lazy val catalog = sc.catalog
 
+  @transient var createdTempDir: Option[Path] = None
+  val stagingDir = new HiveConf().getVar(HiveConf.ConfVars.STAGINGDIR)
+
+  private def executionId: String = {
+val rand: Random = new Random
+val format: SimpleDateFormat = new 
SimpleDateFormat("-MM-dd_HH-mm-ss_SSS")
+val executionId: String = "hive_" + format.format(new Date) + "_" + 
Math.abs(rand.nextLong)
+ executionId
--- End diff --

Nit: an indent issue. Please remove one more space.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15819#discussion_r94719028
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite with Logging 
{
 "as 'COMPACT' WITH DEFERRED REBUILD")
   client.reset()
 }
+
+test(s"$version: CREATE TABLE AS SELECT") {
+  withTable("tbl") {
+sqlContext.sql("CREATE TABLE tbl AS SELECT 1 AS a")
+assert(sqlContext.table("tbl").collect().toSeq == Seq(Row(1)))
+  }
+}
+
+test(s"$version: Delete the temporary staging directory and files 
after each insert") {
+  withTempDir { tmpDir =>
+withTable("tab", "tbl") {
+  sqlContext.sql(
+s"""
+   |CREATE  TABLE tab(c1 string)
--- End diff --

Nit: two spaces -> one space


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16470: [SPARK-19033][Core] Add admin acls for history server

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16470
  
**[Test build #70906 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70906/testReport)**
 for PR 16470 at commit 
[`f4357e8`](https://github.com/apache/spark/commit/f4357e8ae890b0e0e021167ef796b7dd2f6cbb18).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15819
  
**[Test build #70907 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70907/consoleFull)**
 for PR 15819 at commit 
[`15da7a8`](https://github.com/apache/spark/commit/15da7a83c8599a0d6d28f5a0bce8a3132033867c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15819
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16470: [SPARK-19033][Core] Add admin acls for history server

2017-01-04 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/16470
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16337
  
I compared the results and confirmed the results are consistent. LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94718428
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, Params}
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, 
FPGrowthModel => MLlibFPGrowthModel}
+import org.apache.spark.sql.{DataFrame, _}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnType(schema, $(featuresCol), new 
ArrayType(StringType, false))
+SchemaUtils.appendColumn(schema, $(predictionCol), new 
ArrayType(StringType, false))
+  }
+
+  /**
+   * the minimal support level of the frequent pattern
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)")
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
+}
+
+/**
+ * :: Experimental ::
+ * A parallel FP-growth algorithm to mine frequent itemsets.
+ *
+ * @see [[http://dx.doi.org/10.1145/1454008.1454027 Li et al., PFP: 
Parallel FP-Growth for Query
+ *  Recommendation]]
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowth @Since("2.2.0") (
+@Since("2.2.0") override val uid: String)
+  extends Estimator[FPGrowthModel] with FPGrowthParams with 
DefaultParamsWritable {
+
+  @Since("2.2.0")
+  def this() = this(Identifiable.randomUID("FPGrowth"))
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setMinSupport(value: Double): this.type = set(minSupport, value)
+  setDefault(minSupport -> 0.3)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
+
+  def fit(dataset: Dataset[_]): FPGrowthModel = {
+val data = dataset.select($(featuresCol)).rdd.map(r => 
r.getSeq[String](0).toArray)
+val parentModel = new 
MLlibFPGrowth().setMinSupport($(minSupport)).run(data)
+copyValues(new FPGrowthModel(uid, parentModel))
+  }
+
+  @Since("2.2.0")
+  override def transformSchema(schema: StructType): StructType = {
+validateAndTransformSchema(schema)
+  }
+
+  override def copy(extra: ParamMap): FPGrowth = defaultCopy(extra)
+}
+
+
+@Since("2.2.0")
+object FPGrowth extends DefaultParamsReadable[FPGrowth] {
+
+  @Since("2.2.0")
+  override def load(path: String): FPGrowth = super.load(path)
+}
+
+/**
+ * :: Experimental ::
+ * Model fitted by FPGrowth.
+ *
+ * @param parentModel a model trained by spark.mllib.fpm.FPGrowth
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowthModel private[ml] (
+@Since("2.2.0") override val uid: String,
+private val parentModel: MLlibFPGrowthModel[_])
+  extends Model[FPGrowthModel] with FPGrowthParams 

[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70899/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16451
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16460: [SPARK-19058][SQL] fix partition related behavior...

2017-01-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16460


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16460: [SPARK-19058][SQL] fix partition related behaviors with ...

2017-01-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16460
  
thanks for the view, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16451
  
**[Test build #70899 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70899/testReport)**
 for PR 16451 at commit 
[`79a17f4`](https://github.com/apache/spark/commit/79a17f4ef67d8f2fcb6bb7e415cd9c15226a1773).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN s...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/16337#discussion_r94718082
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-group-by.sql.out
 ---
@@ -0,0 +1,357 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 19
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:00:00.000', date '2014-04-04'),
+  ("t1b", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("t1a", 16S, 12, 21L, float(15.0), 20D, 20E2, timestamp '2014-06-04 
01:02:00.001', date '2014-06-04'),
+  ("t1a", 16S, 12, 10L, float(15.0), 20D, 20E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("t1c", 8S, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:02:00.001', date '2014-05-05'),
+  ("t1d", null, 16, 22L, float(17.0), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', null),
+  ("t1d", null, 16, 19L, float(17.0), 25D, 26E2, timestamp '2014-07-04 
01:02:00.001', null),
+  ("t1e", 10S, null, 25L, float(17.0), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-04'),
+  ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-09-04 
01:02:00.001', date '2014-09-04'),
+  ("t1d", 10S, null, 12L, float(17.0), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("t1a", 6S, 8, 10L, float(15.0), 20D, 20E2, timestamp '2014-04-04 
01:02:00.001', date '2014-04-04'),
+  ("t1e", 10S, null, 19L, float(17.0), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04')
+  as t1(t1a, t1b, t1c, t1d, t1e, t1f, t1g, t1h, t1i)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  ("t2a", 6S, 12, 14L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:01:00.000', date '2014-04-04'),
+  ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("t1b", 8S, 16, 119L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:01:00.000', date '2015-05-04'),
+  ("t1c", 12S, 16, 219L, float(17), 25D, 26E2, timestamp '2016-05-04 
01:01:00.000', date '2016-05-04'),
+  ("t1b", null, 16, 319L, float(17), 25D, 26E2, timestamp '2017-05-04 
01:01:00.000', null),
+  ("t2e", 8S, null, 419L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("t1f", 19S, null, 519L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', date '2014-05-04'),
+  ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:01:00.000', date '2014-06-04'),
+  ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:01:00.000', date '2014-07-04'),
+  ("t1c", 12S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:01:00.000', date '2014-08-05'),
+  ("t1e", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:01:00.000', date '2014-09-04'),
+  ("t1f", 19S, null, 19L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:01:00.000', date '2014-10-04'),
+  ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:01:00.000', null)
+  as t2(t2a, t2b, t2c, t2d, t2e, t2f, t2g, t2h, t2i)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+create temporary view t3 as select * from values
+  ("t3a", 6S, 12, 110L, float(15), 20D, 20E2, timestamp '2014-04-04 
01:02:00.000', date '2014-04-04'),
+  ("t3a", 6S, 12, 10L, float(15), 20D, 20E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("t1b", 10S, 12, 219L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("t1b", 10S, 12, 19L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("t1b", 8S, 16, 319L, float(17), 25D, 26E2, timestamp '2014-06-04 
01:02:00.000', date '2014-06-04'),
+  ("t1b", 8S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-07-04 
01:02:00.000', date '2014-07-04'),
+  ("t3c", 17S, 16, 519L, float(17), 25D, 26E2, timestamp '2014-08-04 
01:02:00.000', date '2014-08-04'),
+  ("t3c", 17S, 16, 19L, float(17), 25D, 26E2, timestamp '2014-09-04 
01:02:00.000', date '2014-09-05'),
+  ("t1b", null, 16, 419L, float(17), 25D, 26E2, timestamp '2014-10-04 
01:02:00.000', null),
+  ("t1b", null, 16, 19L, float(17), 25D, 26E2, timestamp '2014-11-04 
01:02:00.000', null),
+  ("t3b", 8S, null, 719L, float(17), 25D, 26E2, timestamp '2014-05-04 
01:02:00.000', date '2014-05-04'),
+  ("t3b", 8S, null, 19L, float(17), 25D, 26E2, timestamp '2015-05-04 
01:02:00.000', date '2015-05-04')
+  as t3(t3a, t3b, t3c, t3d, t3e, t3f, t3g, t3h, t3i)
+-- !query 2 schema
+struct<>
+-- !query 2 output
+
+
+
+-- !query 

[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94717710
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, Params}
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, 
FPGrowthModel => MLlibFPGrowthModel}
+import org.apache.spark.sql.{DataFrame, _}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnType(schema, $(featuresCol), new 
ArrayType(StringType, false))
+SchemaUtils.appendColumn(schema, $(predictionCol), new 
ArrayType(StringType, false))
+  }
+
+  /**
+   * the minimal support level of the frequent pattern
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)")
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
+}
+
+/**
+ * :: Experimental ::
+ * A parallel FP-growth algorithm to mine frequent itemsets.
+ *
+ * @see [[http://dx.doi.org/10.1145/1454008.1454027 Li et al., PFP: 
Parallel FP-Growth for Query
+ *  Recommendation]]
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowth @Since("2.2.0") (
+@Since("2.2.0") override val uid: String)
+  extends Estimator[FPGrowthModel] with FPGrowthParams with 
DefaultParamsWritable {
+
+  @Since("2.2.0")
+  def this() = this(Identifiable.randomUID("FPGrowth"))
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setMinSupport(value: Double): this.type = set(minSupport, value)
+  setDefault(minSupport -> 0.3)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
+
+  def fit(dataset: Dataset[_]): FPGrowthModel = {
+val data = dataset.select($(featuresCol)).rdd.map(r => 
r.getSeq[String](0).toArray)
+val parentModel = new 
MLlibFPGrowth().setMinSupport($(minSupport)).run(data)
+copyValues(new FPGrowthModel(uid, parentModel))
+  }
+
+  @Since("2.2.0")
+  override def transformSchema(schema: StructType): StructType = {
+validateAndTransformSchema(schema)
+  }
+
+  override def copy(extra: ParamMap): FPGrowth = defaultCopy(extra)
+}
+
+
+@Since("2.2.0")
+object FPGrowth extends DefaultParamsReadable[FPGrowth] {
+
+  @Since("2.2.0")
+  override def load(path: String): FPGrowth = super.load(path)
+}
+
+/**
+ * :: Experimental ::
+ * Model fitted by FPGrowth.
+ *
+ * @param parentModel a model trained by spark.mllib.fpm.FPGrowth
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowthModel private[ml] (
+@Since("2.2.0") override val uid: String,
+private val parentModel: MLlibFPGrowthModel[_])
+  extends Model[FPGrowthModel] with FPGrowthParams 

[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-04 Thread junegunn
Github user junegunn commented on the issue:

https://github.com/apache/spark/pull/16347
  
@chpritchard-expedia The patch here fixes the problem. I don't think it's 
possible to workaround the issue by using Spark API in some different ways, 
because we can't completely avoid memory spills at the writers. Hive doesn't 
have the problem, so maybe you can consider running the same statement on Hive 
if this is not something Spark wants to address.

Anyway, for anyone who's interested, I could confirm that for a sorted ORC 
table built with this patch, point/range lookups on the sorted column can be 
several times faster. Also the final size of the table turned out to be 
significantly smaller in this case (60% of unsorted table) due to the temporal 
locality in our data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13252: [SPARK-15473][SQL] CSV data source writes header for emp...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/13252
  
Let me suggest a generalized way latter because it does not look a clean 
fix.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13252: [SPARK-15473][SQL] CSV data source writes header ...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at:

https://github.com/apache/spark/pull/13252


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94717542
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, Params}
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, 
FPGrowthModel => MLlibFPGrowthModel}
+import org.apache.spark.sql.{DataFrame, _}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnType(schema, $(featuresCol), new 
ArrayType(StringType, false))
+SchemaUtils.appendColumn(schema, $(predictionCol), new 
ArrayType(StringType, false))
+  }
+
+  /**
+   * the minimal support level of the frequent pattern
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)")
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
+}
+
+/**
+ * :: Experimental ::
+ * A parallel FP-growth algorithm to mine frequent itemsets.
+ *
+ * @see [[http://dx.doi.org/10.1145/1454008.1454027 Li et al., PFP: 
Parallel FP-Growth for Query
+ *  Recommendation]]
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowth @Since("2.2.0") (
+@Since("2.2.0") override val uid: String)
+  extends Estimator[FPGrowthModel] with FPGrowthParams with 
DefaultParamsWritable {
+
+  @Since("2.2.0")
+  def this() = this(Identifiable.randomUID("FPGrowth"))
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setMinSupport(value: Double): this.type = set(minSupport, value)
+  setDefault(minSupport -> 0.3)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
--- End diff --

ditto, `setOutputCol`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94717473
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, Params}
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, 
FPGrowthModel => MLlibFPGrowthModel}
+import org.apache.spark.sql.{DataFrame, _}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnType(schema, $(featuresCol), new 
ArrayType(StringType, false))
+SchemaUtils.appendColumn(schema, $(predictionCol), new 
ArrayType(StringType, false))
+  }
+
+  /**
+   * the minimal support level of the frequent pattern
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)")
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
+}
+
+/**
+ * :: Experimental ::
+ * A parallel FP-growth algorithm to mine frequent itemsets.
+ *
+ * @see [[http://dx.doi.org/10.1145/1454008.1454027 Li et al., PFP: 
Parallel FP-Growth for Query
+ *  Recommendation]]
+ */
+@Since("2.2.0")
+@Experimental
+class FPGrowth @Since("2.2.0") (
+@Since("2.2.0") override val uid: String)
+  extends Estimator[FPGrowthModel] with FPGrowthParams with 
DefaultParamsWritable {
+
+  @Since("2.2.0")
+  def this() = this(Identifiable.randomUID("FPGrowth"))
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setMinSupport(value: Double): this.type = set(minSupport, value)
+  setDefault(minSupport -> 0.3)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
--- End diff --

I perfer use `setInputCol` and `inputCol` instead of this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16470: [SPARK-19033][Core] Add admin acls for history server

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16470
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16470: [SPARK-19033][Core] Add admin acls for history server

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16470
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70900/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16470: [SPARK-19033][Core] Add admin acls for history server

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16470
  
**[Test build #70900 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70900/testReport)**
 for PR 16470 at commit 
[`f4357e8`](https://github.com/apache/spark/commit/f4357e8ae890b0e0e021167ef796b7dd2f6cbb18).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14284: [SPARK-16633] [SPARK-16642] [SPARK-16721] [SQL] Fixes th...

2017-01-04 Thread chengat1314
Github user chengat1314 commented on the issue:

https://github.com/apache/spark/pull/14284
  
@hvanhovell  Nice, thank you very much!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14725: [SPARK-17161] [PYSPARK][ML] Add PySpark-ML JavaWrapper c...

2017-01-04 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/14725
  
Thanks @holdenk for taking a look!  Yeah, I think you're right about the 
issues trying to infer a type.  It would be nice if there was some easy way to 
specify a primitive type since that would cover most of the cases in mllib, but 
it's not that big of deal.  If anything it's probably just a little obscure to 
a dev who's not familiar with py4j in Spark that for a sting the java class 
should be `sc._gateway.jvm.java.lang.String`, for example.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94717245
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, Params}
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, 
FPGrowthModel => MLlibFPGrowthModel}
+import org.apache.spark.sql.{DataFrame, _}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnType(schema, $(featuresCol), new 
ArrayType(StringType, false))
+SchemaUtils.appendColumn(schema, $(predictionCol), new 
ArrayType(StringType, false))
+  }
+
+  /**
+   * the minimal support level of the frequent pattern
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)")
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMinSupport: Double = $(minSupport)
+
--- End diff --

MLLib's `FPGrowth` have a param `numPartitions`, will it be included here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16435: [SPARK-19027][SQL] estimate size of object buffer...

2017-01-04 Thread cloud-fan
Github user cloud-fan closed the pull request at:

https://github.com/apache/spark/pull/16435


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16435: [SPARK-19027][SQL] estimate size of object buffer for ob...

2017-01-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16435
  
closing this as it's very hard to estimate the size and does not provide 
much benefit for end users.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16460: [SPARK-19058][SQL] fix partition related behaviors with ...

2017-01-04 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/16460
  
looks good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94717055
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.hadoop.fs.Path
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.{Estimator, Model}
+import org.apache.spark.ml.param.{DoubleParam, ParamMap, Params}
+import org.apache.spark.ml.param.shared.{HasFeaturesCol, HasPredictionCol}
+import org.apache.spark.ml.util._
+import org.apache.spark.mllib.fpm.{FPGrowth => MLlibFPGrowth, 
FPGrowthModel => MLlibFPGrowthModel}
+import org.apache.spark.sql.{DataFrame, _}
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types.{ArrayType, StringType, StructType}
+
+/**
+ * Common params for FPGrowth and FPGrowthModel
+ */
+private[fpm] trait FPGrowthParams extends Params with HasFeaturesCol with 
HasPredictionCol {
+
+  /**
+   * Validates and transforms the input schema.
+   * @param schema input schema
+   * @return output schema
+   */
+  protected def validateAndTransformSchema(schema: StructType): StructType 
= {
+SchemaUtils.checkColumnType(schema, $(featuresCol), new 
ArrayType(StringType, false))
+SchemaUtils.appendColumn(schema, $(predictionCol), new 
ArrayType(StringType, false))
+  }
+
+  /**
+   * the minimal support level of the frequent pattern
+   * Default: 0.3
+   * @group param
+   */
+  @Since("2.2.0")
+  val minSupport: DoubleParam = new DoubleParam(this, "minSupport",
+"the minimal support level of the frequent pattern (Default: 0.3)")
--- End diff --

also need a `ParamValidator` here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70901/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #70901 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70901/testReport)**
 for PR 15671 at commit 
[`c8693d8`](https://github.com/apache/spark/commit/c8693d870373978b4a43b2215a1b487215107d45).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16464: [SPARK-19066][SparkR]:SparkR LDA doesn't set opti...

2017-01-04 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16464#discussion_r94716683
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/r/LDAWrapper.scala ---
@@ -172,6 +187,8 @@ private[r] object LDAWrapper extends 
MLReadable[LDAWrapper] {
   model,
   ldaModel.logLikelihood(preprocessedData),
   ldaModel.logPerplexity(preprocessedData),
+  trainingLogLikelihood,
+  logPrior,
--- End diff --

since model is referenced and persisted, is there a need to handle 
trainingLogLikelihood and logPrior separately like this, and writing to 
metadata, instead of just getting from the model when fetching for the summary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16460: [SPARK-19058][SQL] fix partition related behavior...

2017-01-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16460#discussion_r94716609
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
 ---
@@ -74,12 +69,30 @@ case class InsertIntoHadoopFsRelationCommand(
 val fs = outputPath.getFileSystem(hadoopConf)
 val qualifiedOutputPath = outputPath.makeQualified(fs.getUri, 
fs.getWorkingDirectory)
 
+val partitionsTrackedByCatalog = 
sparkSession.sessionState.conf.manageFilesourcePartitions &&
+  catalogTable.isDefined &&
+  catalogTable.get.partitionColumnNames.nonEmpty &&
+  catalogTable.get.tracksPartitionsInCatalog
+
+var initialMatchingPartitions: Seq[TablePartitionSpec] = Nil
+var customPartitionLocations: Map[TablePartitionSpec, String] = 
Map.empty
--- End diff --

yea it's true, but then the code may looks ugly, e.g.
```
val (longVariableName: LongTypeNameXXX, longVariableName: 
LongTypeNameXXX) = {
  ...
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16460: [SPARK-19058][SQL] fix partition related behaviors with ...

2017-01-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16460
  
cc @ericl anymore comments on this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94716584
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{DoubleParam, Param, ParamMap, Params}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.fpm.{AssociationRules => 
MLlibAssociationRules}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+
+/**
+ * :: Experimental ::
+ *
+ * Generates association rules from frequent itemsets ("items", "freq"). 
This method only generates
+ * association rules which have a single item as the consequent.
+ */
+@Since("2.1.0")
--- End diff --

should be 2.2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16468: [SPARK-19074][SS][DOCS] Updated Structured Streaming Pro...

2017-01-04 Thread david-weiluo-ren
Github user david-weiluo-ren commented on the issue:

https://github.com/apache/spark/pull/16468
  
@tdas 
It says “However, note that all of the operations applicable on static 
DataFrames/Datasets are not supported in streaming DataFrames/Datasets yet” 
in 
https://spark.apache.org/docs/2.1.0/structured-streaming-programming-guide.html#unsupported-operations

I think it should be “not all of the operations …. are supported in … 
yet” instead of “all of the operations … are not supported in … yet"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16296
  
**[Test build #70905 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70905/testReport)**
 for PR 16296 at commit 
[`83ecc24`](https://github.com/apache/spark/commit/83ecc2439fcc7314cfaf67cfc8c18c99abb16f31).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16422
  
Column-level security can block users to access the specific columns, but 
this command `DESC EXTENDED/FORMATTED COLUMN` might not be part of the 
design/solution. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94716372
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{DoubleParam, Param, ParamMap, Params}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.fpm.{AssociationRules => 
MLlibAssociationRules}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+
+/**
+ * :: Experimental ::
+ *
+ * Generates association rules from frequent itemsets ("items", "freq"). 
This method only generates
+ * association rules which have a single item as the consequent.
+ */
+@Since("2.1.0")
+@Experimental
+class AssociationRules(override val uid: String) extends Params {
+
+  @Since("2.1.0")
+  def this() = this(Identifiable.randomUID("AssociationRules"))
+
+  /**
+   * Param for items column name. Items must be array of Integers.
+   * Default: "items"
+   * @group param
+   */
+  final val itemsCol: Param[String] = new Param[String](this, "itemsCol", 
"items column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getItemsCol: String = $(itemsCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setItemsCol(value: String): this.type = set(itemsCol, value)
+
+  /**
+   * Param for frequency column name. Data type should be Long.
+   * Default: "freq"
+   * @group param
+   */
+  final val freqCol: Param[String] = new Param[String](this, "freqCol", 
"frequency column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getFreqCol: String = $(freqCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setFreqCol(value: String): this.type = set(freqCol, value)
+
+  /**
+   * Param for minimum confidence, range [0.0, 1.0].
+   * @group param
+   */
+  final val minConfidence: DoubleParam = new DoubleParam(this, 
"minConfidence", "min confidence")
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getMinConfidence: Double = $(minConfidence)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setMinConfidence(value: Double): this.type = set(minConfidence, 
value)
+
+  setDefault(itemsCol -> "items", freqCol -> "freq", minConfidence -> 0.8)
+
+  /**
+   * Computes the association rules with confidence above 
[[minConfidence]].
+   * @param freqItemsets DataFrame containing frequent itemset obtained 
from algorithms like
+   * [[FPGrowth]]. Users can set itemsCol (frequent 
itemSet, Array[String])
+   * and freqCol (appearance count, Long) names in the 
DataFrame.
+   * @return a DataFrame("antecedent", "consequent", "confidence") 
containing the association
+* rules.
+   *
+   */
+  @Since("2.1.0")
+  def run(freqItemsets: Dataset[_]): DataFrame = {
--- End diff --

If inheriting `Transform`, here should be `override def transform(dataset: 
Dataset[_]): DataFrame`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94716281
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{DoubleParam, Param, ParamMap, Params}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.fpm.{AssociationRules => 
MLlibAssociationRules}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+
+/**
+ * :: Experimental ::
+ *
+ * Generates association rules from frequent itemsets ("items", "freq"). 
This method only generates
+ * association rules which have a single item as the consequent.
+ */
+@Since("2.1.0")
+@Experimental
+class AssociationRules(override val uid: String) extends Params {
--- End diff --

Since `AssociationRules` transform DataFrame `freqItemsets` to DataFrame 
`rules`, can it be a subclass of `Transformer`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16296
  
**[Test build #70904 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70904/testReport)**
 for PR 16296 at commit 
[`91d173d`](https://github.com/apache/spark/commit/91d173de5e6610ea0621053320208fa8c6597b40).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94716047
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{DoubleParam, Param, ParamMap, Params}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.fpm.{AssociationRules => 
MLlibAssociationRules}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+
+/**
+ * :: Experimental ::
+ *
+ * Generates association rules from frequent itemsets ("items", "freq"). 
This method only generates
+ * association rules which have a single item as the consequent.
+ */
+@Since("2.1.0")
+@Experimental
+class AssociationRules(override val uid: String) extends Params {
+
+  @Since("2.1.0")
+  def this() = this(Identifiable.randomUID("AssociationRules"))
+
+  /**
+   * Param for items column name. Items must be array of Integers.
+   * Default: "items"
+   * @group param
+   */
+  final val itemsCol: Param[String] = new Param[String](this, "itemsCol", 
"items column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getItemsCol: String = $(itemsCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setItemsCol(value: String): this.type = set(itemsCol, value)
+
+  /**
+   * Param for frequency column name. Data type should be Long.
+   * Default: "freq"
+   * @group param
+   */
+  final val freqCol: Param[String] = new Param[String](this, "freqCol", 
"frequency column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getFreqCol: String = $(freqCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setFreqCol(value: String): this.type = set(freqCol, value)
+
+  /**
+   * Param for minimum confidence, range [0.0, 1.0].
+   * @group param
+   */
+  final val minConfidence: DoubleParam = new DoubleParam(this, 
"minConfidence", "min confidence")
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getMinConfidence: Double = $(minConfidence)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setMinConfidence(value: Double): this.type = set(minConfidence, 
value)
+
+  setDefault(itemsCol -> "items", freqCol -> "freq", minConfidence -> 0.8)
+
+  /**
+   * Computes the association rules with confidence above 
[[minConfidence]].
+   * @param freqItemsets DataFrame containing frequent itemset obtained 
from algorithms like
+   * [[FPGrowth]]. Users can set itemsCol (frequent 
itemSet, Array[String])
+   * and freqCol (appearance count, Long) names in the 
DataFrame.
+   * @return a DataFrame("antecedent", "consequent", "confidence") 
containing the association
+* rules.
+   *
+   */
+  @Since("2.1.0")
+  def run(freqItemsets: Dataset[_]): DataFrame = {
+val freqItemSetRdd = freqItemsets.select($(itemsCol), $(freqCol)).rdd
+  .map(row => new FreqItemset(row.getSeq[String](0).toArray, 
row.getLong(1)))
+
+val sqlContext = SparkSession.builder().getOrCreate()
--- End diff --

Since val `sqlContext` is of type `SparkSession`, what about rename it 
`spark`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For 

[GitHub] spark issue #16472: [SPARK-18877][SQL][BACKPORT-2.0] `CSVInferSchema.inferFi...

2017-01-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16472
  
Thanks! Merged to Spark 2.0. 

Could you please close it? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94715820
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{DoubleParam, Param, ParamMap, Params}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.fpm.{AssociationRules => 
MLlibAssociationRules}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+
+/**
+ * :: Experimental ::
+ *
+ * Generates association rules from frequent itemsets ("items", "freq"). 
This method only generates
+ * association rules which have a single item as the consequent.
+ */
+@Since("2.1.0")
+@Experimental
+class AssociationRules(override val uid: String) extends Params {
+
+  @Since("2.1.0")
+  def this() = this(Identifiable.randomUID("AssociationRules"))
+
+  /**
+   * Param for items column name. Items must be array of Integers.
+   * Default: "items"
+   * @group param
+   */
+  final val itemsCol: Param[String] = new Param[String](this, "itemsCol", 
"items column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getItemsCol: String = $(itemsCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setItemsCol(value: String): this.type = set(itemsCol, value)
+
+  /**
+   * Param for frequency column name. Data type should be Long.
+   * Default: "freq"
+   * @group param
+   */
+  final val freqCol: Param[String] = new Param[String](this, "freqCol", 
"frequency column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getFreqCol: String = $(freqCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setFreqCol(value: String): this.type = set(freqCol, value)
+
+  /**
+   * Param for minimum confidence, range [0.0, 1.0].
+   * @group param
+   */
+  final val minConfidence: DoubleParam = new DoubleParam(this, 
"minConfidence", "min confidence")
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getMinConfidence: Double = $(minConfidence)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setMinConfidence(value: Double): this.type = set(minConfidence, 
value)
+
+  setDefault(itemsCol -> "items", freqCol -> "freq", minConfidence -> 0.8)
+
+  /**
+   * Computes the association rules with confidence above 
[[minConfidence]].
+   * @param freqItemsets DataFrame containing frequent itemset obtained 
from algorithms like
+   * [[FPGrowth]]. Users can set itemsCol (frequent 
itemSet, Array[String])
--- End diff --

`Array[String]` confilct with `Array[Int]` in 
https://github.com/apache/spark/pull/15415/files#diff-0a641720038f962d333ef38402a02207R41
and is there some way to support general types?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16451: [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all ident...

2017-01-04 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16451
  
I just checked each except for the one below is passed in Windows via 
concatenated 
[full-log](https://gist.github.com/HyukjinKwon/2d199ac9156c380015ad5a71f77866be).

It seems `DirectKafkaStreamSuite` is flaky due to the occasional failure of 
removing created temp directory (the same problem with 
https://github.com/apache/spark/pull/16451#discussion_r94562756 but that one is 
constantly failed)

It is sometimes passed 
[![PR-16451](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A7615F8B-58B0-4D9B-A914-32E7BF7DCB65=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A7615F8B-58B0-4D9B-A914-32E7BF7DCB65)
 but sometimes failed as below: 

```
DirectKafkaStreamSuite:
 Exception encountered when attempting to run a suite with class name: 
org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite *** ABORTED *** (59 
seconds, 626 milliseconds)
   java.io.IOException: Failed to delete: 
C:\projects\spark\target\tmp\spark-426107da-68cf-4d94-b0d6-1f428f1c53f6
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15415: [SPARK-14501][ML] spark.ml API for FPGrowth

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15415#discussion_r94715503
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/fpm/AssociationRules.scala ---
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.fpm
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{DoubleParam, Param, ParamMap, Params}
+import org.apache.spark.ml.util.Identifiable
+import org.apache.spark.mllib.fpm.{AssociationRules => 
MLlibAssociationRules}
+import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
+import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
+
+/**
+ * :: Experimental ::
+ *
+ * Generates association rules from frequent itemsets ("items", "freq"). 
This method only generates
+ * association rules which have a single item as the consequent.
+ */
+@Since("2.1.0")
+@Experimental
+class AssociationRules(override val uid: String) extends Params {
+
+  @Since("2.1.0")
+  def this() = this(Identifiable.randomUID("AssociationRules"))
+
+  /**
+   * Param for items column name. Items must be array of Integers.
+   * Default: "items"
+   * @group param
+   */
+  final val itemsCol: Param[String] = new Param[String](this, "itemsCol", 
"items column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getItemsCol: String = $(itemsCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setItemsCol(value: String): this.type = set(itemsCol, value)
+
+  /**
+   * Param for frequency column name. Data type should be Long.
+   * Default: "freq"
+   * @group param
+   */
+  final val freqCol: Param[String] = new Param[String](this, "freqCol", 
"frequency column name")
+
+
+  /** @group getParam */
+  @Since("2.1.0")
+  final def getFreqCol: String = $(freqCol)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setFreqCol(value: String): this.type = set(freqCol, value)
+
+  /**
+   * Param for minimum confidence, range [0.0, 1.0].
+   * @group param
+   */
+  final val minConfidence: DoubleParam = new DoubleParam(this, 
"minConfidence", "min confidence")
--- End diff --

there should be a `ParamValidators.inRange(...)`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16308
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70896/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16308
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16308
  
**[Test build #70896 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70896/testReport)**
 for PR 16308 at commit 
[`5b6dd4f`](https://github.com/apache/spark/commit/5b6dd4f227e823937433c876fd6efa8d6fb75a0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class IndexToString @Since(\"2.2.0\") (@Since(\"1.5.0\") override val 
uid: String)`
  * `abstract class Collect[T <: Growable[Any] with Iterable[Any]] extends 
TypedImperativeAggregate[T] `
  * `abstract class AggregateFunction extends Expression `
  * `case class Literal (value: Any, dataType: DataType) extends 
LeafExpression `
  * `case class LimitPushDown(conf: CatalystConf) extends Rule[LogicalPlan] 
`
  * `trait TypedAggregateExpression extends AggregateFunction `
  * `case class SimpleTypedAggregateExpression(`
  * `case class ComplexTypedAggregateExpression(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16401: [SPARK-18998] [SQL] Add a cbo conf to switch betw...

2017-01-04 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16401#discussion_r94714976
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -95,6 +96,29 @@ abstract class LogicalPlan extends 
QueryPlan[LogicalPlan] with Logging {
   }
 
   /**
+   * Returns the default statistics or statistics estimated by cbo based 
on configuration.
+   */
+  final def planStats(conf: CatalystConf): Statistics = {
+if (conf.cboEnabled) {
+  if (estimatedStats.isEmpty) {
+estimatedStats = Some(cboStatistics(conf))
+  }
+  estimatedStats.get
+} else {
+  statistics
+}
+  }
+
+  /**
+   * Returns statistics estimated by cbo. If the plan doesn't override 
this, it returns the
+   * default statistics.
+   */
+  protected def cboStatistics(conf: CatalystConf): Statistics = statistics
+
+  /** A cache for the estimated statistics, such that it will only be 
computed once. */
+  private var estimatedStats: Option[Statistics] = None
--- End diff --

`estimatedStats` is a cache for the `def cboStatistics()` where stats is 
calculated by using column stats. And because I want to pass conf into 
cboStatistics(some parameters may be needed during estimation in the future), I 
can't use a lazy val, and I do this cache explicitly with `estimatedStats`.
Yes, the naming causes ambiguity, can you give it a better name?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/12135
  
@jkbradley  Updated. Thanks for reviewing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work for Parq...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16474
  
**[Test build #70903 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70903/testReport)**
 for PR 16474 at commit 
[`586b347`](https://github.com/apache/spark/commit/586b347b04b64ddf2b70e4fb16035f80ad5a400e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16474: [SPARK-19082][SQL] Make ignoreCorruptFiles work f...

2017-01-04 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/16474

[SPARK-19082][SQL] Make ignoreCorruptFiles work for Parquet

## What changes were proposed in this pull request?

We have a config `spark.sql.files.ignoreCorruptFiles` which can be used to 
ignore corrupt files when reading files in SQL. Currently the 
`ignoreCorruptFiles` config has two issues and can't work for Parquet:

1. We only ignore corrupt files in `FileScanRDD` . Actually, we begin to 
read those files as early as inferring data schema from the files. For corrupt 
files, we can't read the schema and fail the program. A related issue reported 
at 
http://apache-spark-developers-list.1001551.n3.nabble.com/Skip-Corrupted-Parquet-blocks-footer-tc20418.html
2. In `FileScanRDD`, we assume that we only begin to read the files when 
starting to consume the iterator. However, it is possibly the files are read 
before that. In this case, `ignoreCorruptFiles` config doesn't work too.

This patch targets Parquet datasource. If this direction is ok, we can 
address the same issue for other datasources like Orc.

Two main changes in this patch:

1. Replace `ParquetFileReader.readAllFootersInParallel` by implementing the 
logic to read footers in multi-threaded manner

We can't ignore corrupt files if we use 
`ParquetFileReader.readAllFootersInParallel`. So this patch implements the 
logic to do the similar thing in `readParquetFootersInParallel`.

2. In `FileScanRDD`, we need to ignore corrupt file too when we call 
`readFunction` to return iterator.

One thing to notice is:

We read schema from Parquet file's footer. The method to read footer 
`ParquetFileReader.readFooter` throws `RuntimeException`, instead of 
`IOException`, if it can't successfully read the footer. Please check out 
https://github.com/apache/parquet-mr/blob/df9d8e415436292ae33e1ca0b8da256640de9710/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L470.
 So this patch catches `RuntimeException`.  One concern is that it might also 
shadow other runtime exceptions other than reading corrupt files.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 
fix-ignorecorrupted-parquet-files

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16474.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16474


commit 586b347b04b64ddf2b70e4fb16035f80ad5a400e
Author: Liang-Chi Hsieh 
Date:   2017-01-05T04:02:13Z

Make ignoreCorruptFiles work for Parquet.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12135: [SPARK-14352][SQL] approxQuantile should support multi c...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12135
  
**[Test build #70902 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70902/testReport)**
 for PR 12135 at commit 
[`6517f21`](https://github.com/apache/spark/commit/6517f2186417fce1d0fcde1d5e8c561d686dcef0).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2017-01-04 Thread azmras
Github user azmras commented on the issue:

https://github.com/apache/spark/pull/16429
  
@cxww107
Try to update both patched files in the following locations
/usr/local/Cellar/apache-spark/2.1.0/libexec/python/pyspark
/usr/local/Cellar/apache-spark/2.1.0/libexec/python/lib/pyspark.zip
See if it works..

Thanks



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #70901 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70901/testReport)**
 for PR 15671 at commit 
[`c8693d8`](https://github.com/apache/spark/commit/c8693d870373978b4a43b2215a1b487215107d45).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15671
  
@jkbradley Update according to your comments, including adding 
`quantileProbabilities` and `docConcentration`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,L...

2017-01-04 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/15671#discussion_r94712844
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala ---
@@ -905,7 +911,10 @@ class LDA @Since("1.6.0") (
   case m: OldDistributedLDAModel =>
 new DistributedLDAModel(uid, m.vocabSize, m, dataset.sparkSession, 
None)
 }
-copyValues(newModel).setParent(this)
+
+val model = copyValues(newModel).setParent(this)
--- End diff --

ok, I'll add it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...

2017-01-04 Thread chpritchard-expedia
Github user chpritchard-expedia commented on the issue:

https://github.com/apache/spark/pull/16347
  
@junegunn I ran into the same issue, using partitionBy; missed it 
completely during my testing. Would you share the workaround you used? I wasn't 
able to understand it from your Apache JIRA posting. At present I'm thinking 
about using mapPartitions and writing each partition out from there.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...

2017-01-04 Thread lirui-intel
Github user lirui-intel commented on the issue:

https://github.com/apache/spark/pull/12775
  
I think the failure is due to one more [skipped 
test](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70890/testReport/pyspark.sql.tests/HiveContextSQLTests/test_unbounded_frames/).
 The skip message is 
> Unittest < 3.3 doesn't support mocking

Any idea how that can be related to this PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2017-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   >