[GitHub] spark issue #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Docker tes...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19259
  
LGTM pending test.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #81867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81867/testReport)** for PR 17819 at commit [`7c38b77`](https://github.com/apache/spark/commit/7c38b77c30747e316328afbe25cc8bff1a51cd40).


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17819
  
retest this please.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17819
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17819
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81864/
Test FAILed.


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #81864 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81864/testReport)** for PR 17819 at commit [`7c38b77`](https://github.com/apache/spark/commit/7c38b77c30747e316328afbe25cc8bff1a51cd40).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Docker tes...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19259
  
**[Test build #81866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81866/testReport)** for PR 19259 at commit [`315e389`](https://github.com/apache/spark/commit/315e3898e5350121dbd7bb09ae67eb09730f).


---




[GitHub] spark issue #12646: [SPARK-14878][SQL] Trim characters string function suppo...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12646
  
**[Test build #81865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81865/testReport)** for PR 12646 at commit [`79846bf`](https://github.com/apache/spark/commit/79846bfd86c6265ebe7d14906853bfe2ec0467f3).


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/19256
  
Thanks @gatorsmile 
Does the following logic look good to you?
```
if (any dialect's isCascadingTruncateTable returns true)
  return Some(true)
else if (any dialect's isCascadingTruncateTable returns false)
  return Some(false)
else
  // none of the dialects implements this method; return the superclass default value
  return None
```
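
For concreteness, here is a minimal Scala sketch of that combination logic. The class shape and the `dialects` field are illustrative assumptions, not the actual Spark implementation:

```
import org.apache.spark.sql.jdbc.JdbcDialect

// Illustrative sketch only: combine the sub-dialects' answers as described above.
class CombinedDialect(dialects: List[JdbcDialect]) extends JdbcDialect {

  override def canHandle(url: String): Boolean = dialects.forall(_.canHandle(url))

  override def isCascadingTruncateTable(): Option[Boolean] = {
    // Dropping the Nones keeps only the dialects that actually implement the method.
    val answers = dialects.flatMap(_.isCascadingTruncateTable())
    if (answers.contains(true)) Some(true)         // any cascading dialect wins
    else if (answers.contains(false)) Some(false)  // otherwise an explicit false
    else None                                      // nobody answered: superclass default
  }
}
```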



---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139337054
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/AggregateFieldExtractionPushdown.scala ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, NamedExpression}
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan, Project}
+
+/**
+ * Pushes down aliases to [[expressions.GetStructField]] expressions in an aggregate's grouping and
+ * aggregate expressions into a projection over its children. The original
+ * [[expressions.GetStructField]] expressions are replaced with references to the pushed down
+ * aliases.
+ */
+object AggregateFieldExtractionPushdown extends FieldExtractionPushdown {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+    plan transformDown {
+      case agg @ Aggregate(groupingExpressions, aggregateExpressions, child) =>
+        val expressions = groupingExpressions ++ aggregateExpressions
+        val attributes = AttributeSet(expressions.collect { case att: Attribute => att })
+        val childAttributes = AttributeSet(child.expressions)
+        val fieldExtractors0 =
+          expressions
+            .flatMap(getFieldExtractors)
+            .distinct
+        val fieldExtractors1 =
+          fieldExtractors0
+            .filter(_.collectFirst { case att: Attribute => att }
+              .filter(attributes.contains).isEmpty)
--- End diff --

Why do we need to trim the extractors that contain attributes referred to from `groupingExpressions ++ aggregateExpressions`?


---




[GitHub] spark issue #19234: [SPARK-22010][PySpark] Change fromInternal method of Tim...

2017-09-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19234
  
LGTM.


---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139336399
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/JoinFieldExtractionPushdown.scala ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, NamedExpression}
+import org.apache.spark.sql.catalyst.planning.PhysicalOperation
+import org.apache.spark.sql.catalyst.plans.logical.{Join, LogicalPlan, Project}
+
+/**
+ * Pushes down aliases to [[expressions.GetStructField]] expressions in a projection over a join
+ * and its join condition. The original [[expressions.GetStructField]] expressions are replaced
+ * with references to the pushed down aliases.
+ */
+object JoinFieldExtractionPushdown extends FieldExtractionPushdown {
--- End diff --

This applies generally. But for the non-nested column pruning cases, it seems like a burden and makes the query plan more complicated.


---




[GitHub] spark pull request #19260: [SPARK-22043][PYTHON] Improves error message for ...

2017-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19260


---




[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...

2017-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19260
  
Merged to master, branch-2.2 and branch-2.1.


---




[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...

2017-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19260
  
Thanks @maver1ck, @felixcheung and @viirya.


---




[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion

2017-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19249
  
Thanks for double checking @ueshin.

Yes, I noticed that too while reviewing it. I decided to merge it as is because I am quite confident about this one: the struct type is the root type and this case looks quite common, and it appears to be the author's first contribution. Even though this change has a downside, in practice the improvement looked worth it.

I am also fine with doing this for the others too (I am +0 for the other types).


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15544
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81863/
Test FAILed.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15544
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15544
  
**[Test build #81863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81863/testReport)** for PR 15544 at commit [`0e9a8dd`](https://github.com/apache/spark/commit/0e9a8dd4004c3331271105e4e9a3df963c3bc84b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion

2017-09-17 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19249
  
A late LGTM. Btw, can we use the same idea for `MapType`?


---




[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17819
  
**[Test build #81864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81864/testReport)** for PR 17819 at commit [`7c38b77`](https://github.com/apache/spark/commit/7c38b77c30747e316328afbe25cc8bff1a51cd40).


---




[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19230
  
retest this please



---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15544
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81862/
Test FAILed.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15544
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15544
  
**[Test build #81862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81862/testReport)** for PR 15544 at commit [`14bf25a`](https://github.com/apache/spark/commit/14bf25a8c3d18b0bdd2c2365b3471f2f3d2a453d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19230
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81861/
Test FAILed.


---




[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19230
  
**[Test build #81861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81861/testReport)** for PR 19230 at commit [`5ea4e89`](https://github.com/apache/spark/commit/5ea4e8985002dd26922376d672af4db052063909).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19230
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Doc...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19259#discussion_r139333133
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala ---
@@ -913,6 +913,25 @@ class JDBCSuite extends SparkFunSuite
       "dbtable" -> "t1",
       "numPartitions" -> "10")
     assert(new JDBCOptions(parameters).asConnectionProperties.isEmpty)
-    assert(new JDBCOptions(new CaseInsensitiveMap(parameters)).asConnectionProperties.isEmpty)
+    assert(new JDBCOptions(CaseInsensitiveMap(parameters)).asConnectionProperties.isEmpty)
+  }
+
+  test("SPARK-19318: jdbc data source options should be treated case-insensitive.") {
--- End diff --

Where is the test case `test("SPARK-19318: Connection properties keys should be case-sensitive.")`?


---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139333125
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.planning.{PhysicalOperation, ProjectionOverSchema, SelectedField}
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Prunes unnecessary Parquet columns given a [[PhysicalOperation]] over a
+ * [[ParquetRelation]]. By "Parquet column", we mean a column as defined in the
+ * Parquet format. In Spark SQL, a root-level Parquet column corresponds to a
+ * SQL column, and a nested Parquet column corresponds to a [[StructField]].
+ */
+private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+    plan transformDown {
+      case op @ PhysicalOperation(projects, filters,
+          l @ LogicalRelation(hadoopFsRelation @ HadoopFsRelation(_, partitionSchema,
+            dataSchema, _, parquetFormat: ParquetFileFormat, _), _, _)) =>
+        val projectionFields = projects.flatMap(getFields)
+        val filterFields = filters.flatMap(getFields)
+        val requestedFields = (projectionFields ++ filterFields).distinct
+
+        // If [[requestedFields]] includes a proper field, continue. Otherwise,
+        // return [[op]]
+        if (requestedFields.exists { case (_, optAtt) => optAtt.isEmpty }) {
+          val prunedSchema = requestedFields
+            .map { case (field, _) => StructType(Array(field)) }
+            .reduceLeft(_ merge _)
--- End diff --

We should add related tests for such cases.


---




[GitHub] spark pull request #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Doc...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19259#discussion_r139333048
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -63,15 +63,40 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
   }
 
   override def dataPreparation(conn: Connection): Unit = {
-    conn.prepareStatement("CREATE TABLE numerics (b DECIMAL(1), f DECIMAL(3, 2), i DECIMAL(10))").executeUpdate();
+    conn.prepareStatement("CREATE TABLE datetime (id NUMBER(10), d DATE, t TIMESTAMP)")
+      .executeUpdate()
     conn.prepareStatement(
-      "INSERT INTO numerics VALUES (4, 1.23, 99)").executeUpdate();
-    conn.commit();
+      """INSERT INTO datetime VALUES
+        |(1, {d '1991-11-09'}, {ts '1996-01-01 01:23:45'})
+      """.stripMargin.replaceAll("\n", " ")).executeUpdate()
+    conn.commit()
+
+    sql(
+      s"""
+         |CREATE TEMPORARY VIEW datetime
+         |USING org.apache.spark.sql.jdbc
+         |OPTIONS (url '$jdbcUrl', dbTable 'datetime', oracle.jdbc.mapDateToTimestamp 'false')
+      """.stripMargin.replaceAll("\n", " "))
+
+    conn.prepareStatement("CREATE TABLE datetime1 (id NUMBER(10), d DATE, t TIMESTAMP)")
+      .executeUpdate()
+    conn.commit()
+
+    sql(
+      s"""
+         |CREATE TEMPORARY VIEW datetime1
+         |USING org.apache.spark.sql.jdbc
+         |OPTIONS (url '$jdbcUrl', dbTable 'datetime1', oracle.jdbc.mapDateToTimestamp 'false')
+      """.stripMargin.replaceAll("\n", " "))
+
+    conn.prepareStatement("CREATE TABLE numerics (b DECIMAL(1), f DECIMAL(3, 2), i DECIMAL(10))").executeUpdate()
+    conn.prepareStatement(
+      "INSERT INTO numerics VALUES (4, 1.23, 99)").executeUpdate()
+    conn.commit()
   }
 
-
-  test("SPARK-16625 : Importing Oracle numeric types") { 
-    val df = sqlContext.read.jdbc(jdbcUrl, "numerics", new Properties);
+  test("SPARK-16625 : Importing Oracle numeric types") {
+    val df = sqlContext.read.jdbc(jdbcUrl, "numerics", new Properties)
--- End diff --

revert this back


---




[GitHub] spark pull request #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Doc...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19259#discussion_r139333045
  
--- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala ---
@@ -82,7 +107,7 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo
     // A value with fractions from DECIMAL(3, 2) is correct:
     assert(row.getDecimal(1).compareTo(BigDecimal.valueOf(1.23)) == 0)
     // A value > Int.MaxValue from DECIMAL(10) is correct:
-    assert(row.getDecimal(2).compareTo(BigDecimal.valueOf(99l)) == 0)
+    assert(row.getDecimal(2).compareTo(BigDecimal.valueOf(99L)) == 0)
--- End diff --

revert this back


---




[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19196
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19196
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81860/
Test FAILed.


---




[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19196
  
**[Test build #81860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81860/testReport)** for PR 19196 at commit [`be39125`](https://github.com/apache/spark/commit/be3912552d200fced24f69c436b59a2c24639380).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139331293
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
@@ -63,9 +74,22 @@ private[parquet] class ParquetReadSupport extends ReadSupport[UnsafeRow] with Lo
       StructType.fromString(schemaString)
     }
 
-    val parquetRequestedSchema =
+    val clippedParquetSchema =
       ParquetReadSupport.clipParquetSchema(context.getFileSchema, catalystRequestedSchema)
 
+    val parquetRequestedSchema = if (parquetMrCompatibility) {
+      // Parquet-mr will throw an exception if we try to read a superset of the file's schema.
+      // Therefore, we intersect our clipped schema with the underlying file's schema
--- End diff --

`clipParquetSchema` can add column paths that exist only in the catalyst schema into the parquet schema. I think those columns are useful, but it looks like `intersectParquetGroups` will remove them again?


---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139331198
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala ---
@@ -63,9 +74,22 @@ private[parquet] class ParquetReadSupport extends ReadSupport[UnsafeRow] with Lo
       StructType.fromString(schemaString)
     }
 
-    val parquetRequestedSchema =
+    val clippedParquetSchema =
       ParquetReadSupport.clipParquetSchema(context.getFileSchema, catalystRequestedSchema)
 
+    val parquetRequestedSchema = if (parquetMrCompatibility) {
+      // Parquet-mr will throw an exception if we try to read a superset of the file's schema.
+      // Therefore, we intersect our clipped schema with the underlying file's schema
+      ParquetReadSupport.intersectParquetGroups(clippedParquetSchema, context.getFileSchema)
+        .map(intersectionGroup =>
+          new MessageType(intersectionGroup.getName, intersectionGroup.getFields))
+        .getOrElse(ParquetSchemaConverter.EMPTY_MESSAGE)
+    } else {
+      // Spark's built-in Parquet reader will throw an exception in some cases if the requested
+      // schema is not the same as the clipped schema
--- End diff --

I think the built-in Parquet reader here means the vectorized reader. But I don't think we use `ParquetReadSupport` in the vectorized reader?


---




[GitHub] spark pull request #19164: [SPARK-21953] Show both memory and disk bytes spi...

2017-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19164


---




[GitHub] spark issue #19221: [SPARK-4131] Merge HiveTmpFile.scala to SaveAsHiveFile.s...

2017-09-17 Thread aoxiangcao
Github user aoxiangcao commented on the issue:

https://github.com/apache/spark/pull/19221
  
I'm sorry if this is not the right place to ask. I have a question about using `insert overwrite directory /user/appUser/test`. First, I start the Thrift server as user 'hdfs' and log into beeline as user 'appUser'. Then, when I run the `insert overwrite directory` SQL, the job cannot move data from the staging folder to the destination folder: the result file is owned by 'hdfs', while the folder '/user/appUser/test' is owned by 'appUser', so permission is denied. Is something wrong?


---




[GitHub] spark issue #19164: [SPARK-21953] Show both memory and disk bytes spilled if...

2017-09-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19164
  
LGTM, merging to master/2.2/2.1!


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15544
  
**[Test build #81863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81863/testReport)** for PR 15544 at commit [`0e9a8dd`](https://github.com/apache/spark/commit/0e9a8dd4004c3331271105e4e9a3df963c3bc84b).


---




[GitHub] spark issue #15544: [SPARK-17997] [SQL] Add an aggregation function for coun...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15544
  
**[Test build #81862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81862/testReport)** for PR 15544 at commit [`14bf25a`](https://github.com/apache/spark/commit/14bf25a8c3d18b0bdd2c2365b3471f2f3d2a453d).


---




[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19230
  
**[Test build #81861 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81861/testReport)** for PR 19230 at commit [`5ea4e89`](https://github.com/apache/spark/commit/5ea4e8985002dd26922376d672af4db052063909).


---




[GitHub] spark issue #19230: [SPARK-22003][SQL] support array column in vectorized re...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19230
  
retest this please


---




[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...

2017-09-17 Thread liufengdb
Github user liufengdb commented on a diff in the pull request:

https://github.com/apache/spark/pull/19230#discussion_r139329522
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.sql.execution.vectorized;
 
+import org.apache.spark.api.java.function.Function;
--- End diff --

oops, reverted it.


---




[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...

2017-09-17 Thread liufengdb
Github user liufengdb commented on a diff in the pull request:

https://github.com/apache/spark/pull/19230#discussion_r139329523
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.vectorized
+
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.util.ArrayData
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class ColumnVectorSuite extends SparkFunSuite with BeforeAndAfterEach {
+
+  var testVector: WritableColumnVector = _
+
+  private def allocate(capacity: Int, dt: DataType): WritableColumnVector = {
+    new OnHeapColumnVector(capacity, dt)
+  }
+
+  override def afterEach(): Unit = {
+    testVector.close()
+  }
+
+  test("boolean") {
+    testVector = allocate(10, BooleanType)
+    (0 until 10).foreach { i =>
+      testVector.appendBoolean(i % 2 == 0)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, BooleanType) === (i % 2 == 0))
+    }
+  }
+
+  test("byte") {
+    testVector = allocate(10, ByteType)
+    (0 until 10).foreach { i =>
+      testVector.appendByte(i.toByte)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, ByteType) === (i.toByte))
+    }
+  }
+
+  test("short") {
+    testVector = allocate(10, ShortType)
+    (0 until 10).foreach { i =>
+      testVector.appendShort(i.toShort)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, ShortType) === (i.toShort))
+    }
+  }
+
+  test("int") {
+    testVector = allocate(10, IntegerType)
+    (0 until 10).foreach { i =>
+      testVector.appendInt(i)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, IntegerType) === i)
+    }
+  }
+
+  test("long") {
+    testVector = allocate(10, LongType)
+    (0 until 10).foreach { i =>
+      testVector.appendLong(i)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, LongType) === i)
+    }
+  }
+
+  test("float") {
+    testVector = allocate(10, FloatType)
+    (0 until 10).foreach { i =>
+      testVector.appendFloat(i.toFloat)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, FloatType) === i.toFloat)
+    }
+  }
+
+  test("double") {
+    testVector = allocate(10, DoubleType)
+    (0 until 10).foreach { i =>
+      testVector.appendDouble(i.toDouble)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, DoubleType) === i.toDouble)
+    }
+  }
+
+  test("string") {
+    testVector = allocate(10, StringType)
+    (0 until 10).map { i =>
+      val utf8 = s"str$i".getBytes("utf8")
+      testVector.appendByteArray(utf8, 0, utf8.length)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, StringType) === UTF8String.fromString(s"str$i"))
+    }
+  }
+
+  test("binary") {
+    testVector = allocate(10, BinaryType)
+    (0 until 10).map { i =>
+      val utf8 = s"str$i".getBytes("utf8")
+      testVector.appendByteArray(utf8, 0, utf8.length)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+

[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...

2017-09-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19260
  
LGTM


---




[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19196
  
**[Test build #81860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81860/testReport)** for PR 19196 at commit [`be39125`](https://github.com/apache/spark/commit/be3912552d200fced24f69c436b59a2c24639380).


---




[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-17 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15544#discussion_r139327658
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala ---
@@ -0,0 +1,248 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import java.util
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckFailure, TypeCheckSuccess}
+import org.apache.spark.sql.catalyst.expressions.{AttributeReference, ExpectsInputTypes, Expression, ExpressionDescription}
+import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, HyperLogLogPlusPlusHelper}
+import org.apache.spark.sql.types._
+
+/**
+ * This function counts the approximate number of distinct values (ndv) in
+ * intervals constructed from endpoints specified in `endpointsExpression`. The endpoints will be
+ * sorted into ascending order. To count ndv's in these intervals, apply the HyperLogLogPlusPlus
+ * algorithm in each of them.
+ * @param child to estimate the ndv's of.
+ * @param endpointsExpression to construct the intervals.
+ * @param relativeSD The maximum estimation error allowed in the HyperLogLogPlusPlus algorithm.
+ */
+@ExpressionDescription(
+  usage = """
+    _FUNC_(col, array(endpoint_1, endpoint_2, ... endpoint_N)) - Returns the approximate
+      number of distinct values (ndv) for intervals [endpoint_1, endpoint_2],
+      (endpoint_2, endpoint_3], ... (endpoint_N-1, endpoint_N].
+
+    _FUNC_(col, array(endpoint_1, endpoint_2, ... endpoint_N), relativeSD=0.05) - Returns
+      the approximate number of distinct values (ndv) for intervals with relativeSD, the maximum
+      estimation error allowed in the HyperLogLogPlusPlus algorithm.
+  """,
+  extended = """
+    Examples:
+      > SELECT approx_count_distinct_for_intervals(10.0, array(5, 15, 25), 0.01);
+       [1, 0]
+  """)
+case class ApproxCountDistinctForIntervals(
+    child: Expression,
+    endpointsExpression: Expression,
+    relativeSD: Double = 0.05,
+    mutableAggBufferOffset: Int = 0,
+    inputAggBufferOffset: Int = 0)
+  extends ImperativeAggregate with ExpectsInputTypes {
+
+  def this(child: Expression, endpointsExpression: Expression) = {
+    this(
+      child = child,
+      endpointsExpression = endpointsExpression,
+      relativeSD = 0.05,
+      mutableAggBufferOffset = 0,
+      inputAggBufferOffset = 0)
+  }
+
+  def this(child: Expression, endpointsExpression: Expression, relativeSD: Expression) = {
+    this(
+      child = child,
+      endpointsExpression = endpointsExpression,
+      relativeSD = HyperLogLogPlusPlus.validateDoubleLiteral(relativeSD),
+      mutableAggBufferOffset = 0,
+      inputAggBufferOffset = 0)
+  }
+
+  override def inputTypes: Seq[AbstractDataType] = {
+    Seq(TypeCollection(NumericType, TimestampType, DateType), ArrayType)
+  }
+
+  // Mark as lazy so that endpointsExpression is not evaluated during tree transformation.
+  lazy val endpoints: Array[Double] = {
+    val doubleArray = (endpointsExpression.dataType, endpointsExpression.eval()) match {
+      case (ArrayType(baseType: NumericType, _), arrayData: ArrayData) =>
+        val numericArray = arrayData.toObjectArray(baseType)
+        numericArray.map { x =>
+          baseType.numeric.toDouble(x.asInstanceOf[baseType.InternalType])
+        }
+    }
+    util.Arrays.sort(doubleArray)
+    doubleArray
+  }
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val defaultCheck = super.checkInputDataTypes()
+    if (defaultCheck.isFailure) {
+      defaultCheck
+    } else if
[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19252
  
This is not a bug. We just follow [the behavior of Hive's dynamic partition insert](https://docs.databricks.com/spark/latest/spark-sql/language-manual/insert.html#dynamic-partition-inserts):

> The dynamic partition columns must be specified last in both part_spec and the input result set (of the row value lists or the select query). They are resolved by position, instead of by names. Thus, the orders must be exactly matched.
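
For illustration, a hypothetical snippet spelling out what position-based resolution means for the `t2` example discussed in this thread:

```
// Illustrative sketch only. t2 is partitioned by col1, so col1 is the dynamic
// partition column and is resolved by POSITION, not by name: it must come last
// in the select list.
sql("CREATE TABLE t2 (col1 INT, col2 INT) USING PARQUET PARTITIONED BY (col1)")

// "SELECT 1, 2" therefore means col2 = 1 and col1 = 2, not col1 = 1, col2 = 2.
sql("INSERT INTO TABLE t2 SELECT 1, 2")

// To insert the row (col1 = 1, col2 = 2), the partition column's value goes last:
sql("INSERT INTO TABLE t2 SELECT 2, 1")
```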


---




[GitHub] spark pull request #19258: add MockNetCat

2017-09-17 Thread bluejoe2008
Github user bluejoe2008 closed the pull request at:

https://github.com/apache/spark/pull/19258


---




[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...

2017-09-17 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19230#discussion_r139324826
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.vectorized
+
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.util.ArrayData
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class ColumnVectorSuite extends SparkFunSuite with BeforeAndAfterEach {
+
+  var testVector: WritableColumnVector = _
+
+  private def allocate(capacity: Int, dt: DataType): WritableColumnVector = {
+    new OnHeapColumnVector(capacity, dt)
+  }
+
+  override def afterEach(): Unit = {
+    testVector.close()
+  }
+
+  test("boolean") {
+    testVector = allocate(10, BooleanType)
+    (0 until 10).foreach { i =>
+      testVector.appendBoolean(i % 2 == 0)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, BooleanType) === (i % 2 == 0))
+    }
+  }
+
+  test("byte") {
+    testVector = allocate(10, ByteType)
+    (0 until 10).foreach { i =>
+      testVector.appendByte(i.toByte)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, ByteType) === (i.toByte))
+    }
+  }
+
+  test("short") {
+    testVector = allocate(10, ShortType)
+    (0 until 10).foreach { i =>
+      testVector.appendShort(i.toShort)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, ShortType) === (i.toShort))
+    }
+  }
+
+  test("int") {
+    testVector = allocate(10, IntegerType)
+    (0 until 10).foreach { i =>
+      testVector.appendInt(i)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, IntegerType) === i)
+    }
+  }
+
+  test("long") {
+    testVector = allocate(10, LongType)
+    (0 until 10).foreach { i =>
+      testVector.appendLong(i)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, LongType) === i)
+    }
+  }
+
+  test("float") {
+    testVector = allocate(10, FloatType)
+    (0 until 10).foreach { i =>
+      testVector.appendFloat(i.toFloat)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, FloatType) === i.toFloat)
+    }
+  }
+
+  test("double") {
+    testVector = allocate(10, DoubleType)
+    (0 until 10).foreach { i =>
+      testVector.appendDouble(i.toDouble)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, DoubleType) === i.toDouble)
+    }
+  }
+
+  test("string") {
+    testVector = allocate(10, StringType)
+    (0 until 10).map { i =>
+      val utf8 = s"str$i".getBytes("utf8")
+      testVector.appendByteArray(utf8, 0, utf8.length)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 until 10).foreach { i =>
+      assert(array.get(i, StringType) === UTF8String.fromString(s"str$i"))
+    }
+  }
+
+  test("binary") {
+    testVector = allocate(10, BinaryType)
+    (0 until 10).map { i =>
+      val utf8 = s"str$i".getBytes("utf8")
+      testVector.appendByteArray(utf8, 0, utf8.length)
+    }
+
+    val array = new ColumnVector.Array(testVector)
+
+    (0 

[GitHub] spark issue #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Docker tes...

2017-09-17 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/19259
  
@gatorsmile Yes, Docker integration tests passed.


---




[GitHub] spark issue #19263: Optionally add block updates to log

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19263
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #19263: Optionally add block updates to log

2017-09-17 Thread michaelmior
GitHub user michaelmior opened a pull request:

https://github.com/apache/spark/pull/19263

Optionally add block updates to log

I see that block updates are not logged to the event log.
This makes sense as a default for performance reasons.
However, I find it helpful when trying to get a better understanding of 
caching for a job to be able to log these updates.
This PR adds a configuration setting `spark.eventLog.blockUpdates` 
(defaulting to false) which allows block updates to be recorded in the log.
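
A minimal sketch of how such a flag could be declared with Spark's internal config builder, assuming the `spark.eventLog.blockUpdates` name from the description (illustrative only; the PR's actual wiring may differ):

```
import org.apache.spark.internal.config.ConfigBuilder

// Illustrative sketch: a boolean config defaulting to false, so block updates
// are only written to the event log when explicitly enabled.
val EVENT_LOG_BLOCK_UPDATES = ConfigBuilder("spark.eventLog.blockUpdates")
  .doc("Whether to log block update events in the Spark event log.")
  .booleanConf
  .createWithDefault(false)
```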

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/michaelmior/spark log-block-updates

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19263


commit 2a967c368cf72316b760c5f17503658aa12b901d
Author: Michael Mior 
Date:   2017-07-12T13:32:24Z

Optionally add block updates to log




---




[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-09-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19218
  
Sorry, guys. I've been away from keyboard since last Friday night. I'll be 
back on next Tuesday (PST).


---




[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...

2017-09-17 Thread aokolnychyi
Github user aokolnychyi commented on the issue:

https://github.com/apache/spark/pull/19252
  
@gatorsmile thanks for the feedback. I also covered `TruncateTableCommand` with additional tests. However, I see somewhat strange behavior while creating a test for `AlterTableAddPartitionCommand`.

```
sql(s"CREATE TABLE t1 (col1 int, col2 int) USING PARQUET")
sql(s"INSERT INTO TABLE t1 SELECT 1, 2")
sql(s"INSERT INTO TABLE t1 SELECT 2, 4")
sql("SELECT * FROM t1").show()
+----+----+
|col1|col2|
+----+----+
|   1|   2|
|   2|   4|
+----+----+

sql(s"CREATE TABLE t2 (col1 int, col2 int) USING PARQUET PARTITIONED BY (col1)")
sql(s"INSERT INTO TABLE t2 SELECT 1, 2")
sql(s"INSERT INTO TABLE t2 SELECT 2, 4")
sql("SELECT * FROM t2").show()
+----+----+
|col2|col1|
+----+----+
|   2|   4|
|   1|   2|
+----+----+
```

Why are the results different? Is it a bug?


---




[GitHub] spark issue #19243: [SPARK-21780][R] Simpler Dataset.sample API in R

2017-09-17 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19243
  
Thinking about this, I wonder if it is more common in R to skip the params that have default values and pass the remaining params by name, like `sample(df, fraction=1.0)`.


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19256
  
Let me correct what I said above. The logic should be:
> If any dialect's `isCascadingTruncateTable` returns `true`, we should return `true`.


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/19256
  
Thanks @gatorsmile 
I will change both the implementation and the PR title. 


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19256
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81859/
Test FAILed.


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19256
  
**[Test build #81859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81859/testReport)** for PR 19256 at commit [`7b9d6fc`](https://github.com/apache/spark/commit/7b9d6fcec61cd72ffb257abf17eb0e9d6c462f9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19256
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19252
  
Actually, the right fix should add `refreshTable(identifier)` to the SessionCatalog's [alterTableStats API](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L379).
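
In user-facing terms, the requested behavior amounts to refreshing the table whenever its statistics change, roughly as in this illustrative sketch (public API only; not the actual patch):

```
import org.apache.spark.sql.SparkSession

// Illustrative sketch: after statistics are altered (e.g. by ANALYZE TABLE),
// the cached table relation must be invalidated, or cached plans keep serving
// the stale statistics. The suggested fix moves this invalidation into
// SessionCatalog.alterTableStats itself.
val spark = SparkSession.builder().getOrCreate()
spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")
spark.catalog.refreshTable("t1")
```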




---




[GitHub] spark pull request #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats ...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19252#discussion_r139317292
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala ---
@@ -44,6 +44,7 @@ object CommandUtils extends Logging {
       } else {
         catalog.alterTableStats(table.identifier, None)
       }
+      catalog.refreshTable(table.identifier)
--- End diff --

Add a comment above this line:
> Invalidate the table relation cache


---




[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...

2017-09-17 Thread maver1ck
Github user maver1ck commented on the issue:

https://github.com/apache/spark/pull/19260
  
LGTM


---




[GitHub] spark issue #19252: [SPARK-21969][SQL] CommandUtils.updateTableStats should ...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19252
  
`TruncateTableCommand` also has similar issues. Could you also fix it in 
this PR?
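
For reference, a hedged sketch of the analogous fix in `TruncateTableCommand.run` (the variable names `spark` and `tableName` are assumptions about the surrounding code, not the PR's actual patch):

```
// After truncating, clear cached data and invalidate the relation
// cache so stale entries and statistics are not reused.
spark.catalog.uncacheTable(tableName.quotedString)
spark.sessionState.catalog.refreshTable(tableName)
```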


---




[GitHub] spark issue #19260: [SPARK-22043][PYTHON] Improves error message for show_pr...

2017-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19260
  
cc @maver1ck and @viirya, who I believe recently ran the Python profiler. Could you please take a look when you have some time?


---




[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...

2017-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19226
  
Merged to master, branch-2.2 and branch-2.1.


---




[GitHub] spark pull request #19226: [SPARK-21985][PySpark] PairDeserializer is broken...

2017-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19226


---




[GitHub] spark pull request #19249: [SPARK-22032][PySpark] Speed up StructType conver...

2017-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19249


---




[GitHub] spark issue #19249: [SPARK-22032][PySpark] Speed up StructType conversion

2017-09-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19249
  
Merged to master.


---




[GitHub] spark pull request #19230: [SPARK-22003][SQL] support array column in vector...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19230#discussion_r139316092
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
@@ -16,6 +16,7 @@
  */
 package org.apache.spark.sql.execution.vectorized;
 
+import org.apache.spark.api.java.function.Function;
--- End diff --

Please revert this change.


---




[GitHub] spark issue #19259: [BACKPORT-2.1][SPARK-19318][SPARK-22041][SQL] Docker tes...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19259
  
Have you run the Docker tests?


---




[GitHub] spark issue #19261: [SPARK-22040] Add current_date function with timezone id

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19261
  
Does any other database have such an interface?


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19256
  
BTW, could you update the PR title?


---




[GitHub] spark pull request #19256: [SPARK-21338][SQL]implement isCascadingTruncateTa...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19256#discussion_r139315660
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala ---
@@ -41,4 +41,8 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect
   override def getJDBCType(dt: DataType): Option[JdbcType] = {
     dialects.flatMap(_.getJDBCType(dt)).headOption
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = {
+    dialects.flatMap(_.isCascadingTruncateTable).headOption
--- End diff --

This one is different from `getCatalystType` and `getJDBCType`. Instead, we should follow [canHandle](https://github.com/huaxingao/spark/blob/7b9d6fcec61cd72ffb257abf17eb0e9d6c462f9e/sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala#L33-L34).
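
Following `canHandle`, the override could combine the answers from all dialects instead of taking the first. A sketch of that idea (not the final patch):

```
override def isCascadingTruncateTable(): Option[Boolean] = {
  // Cascade if any dialect cascades, answer "no" only when every
  // dialect answers "no", and stay undecided otherwise.
  val answers = dialects.map(_.isCascadingTruncateTable())
  if (answers.contains(Some(true))) Some(true)
  else if (answers.forall(_.contains(false))) Some(false)
  else None
}
```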


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19256
  
**[Test build #81859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81859/testReport)** for PR 19256 at commit [`7b9d6fc`](https://github.com/apache/spark/commit/7b9d6fcec61cd72ffb257abf17eb0e9d6c462f9e).


---




[GitHub] spark issue #19256: [SPARK-21338][SQL]implement isCascadingTruncateTable() m...

2017-09-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19256
  
retest this please


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81858/
Test PASSed.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19204
  
**[Test build #81858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81858/testReport)** for PR 19204 at commit [`a980e6b`](https://github.com/apache/spark/commit/a980e6bfbbeb1da9ce674d69b9cb537b7204459a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19204
  
**[Test build #81858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81858/testReport)** for PR 19204 at commit [`a980e6b`](https://github.com/apache/spark/commit/a980e6bfbbeb1da9ce674d69b9cb537b7204459a).


---




[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19226
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19226
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81856/
Test PASSed.


---




[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19226
  
**[Test build #81856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81856/testReport)** for PR 19226 at commit [`5282ee5`](https://github.com/apache/spark/commit/5282ee5a8624d313ec5995db4d772ec1170ea049).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19262: [MINOR][ML] Remove unnecessary default value setting for...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19262
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19262: [MINOR][ML] Remove unnecessary default value setting for...

2017-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19262
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81857/
Test PASSed.


---




[GitHub] spark issue #19262: [MINOR][ML] Remove unnecessary default value setting for...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19262
  
**[Test build #81857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81857/testReport)** for PR 19262 at commit [`84c6c9e`](https://github.com/apache/spark/commit/84c6c9e394939fa97a6d3888547ac269b234a2c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19256: [SPARK-21338][SQL]implement isCascadingTruncateTa...

2017-09-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19256#discussion_r139313120
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala ---
@@ -41,4 +41,8 @@ private class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect
   override def getJDBCType(dt: DataType): Option[JdbcType] = {
     dialects.flatMap(_.getJDBCType(dt)).headOption
   }
+
+  override def isCascadingTruncateTable(): Option[Boolean] = {
+    dialects.flatMap(_.isCascadingTruncateTable).headOption
--- End diff --

Both `getCatalystType` and `getJDBCType(dt: DataType)` use the first one. Also, the class header has the following:

/**
 * AggregatedDialect can unify multiple dialects into one virtual Dialect.
 * Dialects are tried in order, and the first dialect that does not return a
 * neutral element will will.
 *
 * @param dialects List of dialects.
 */

It has a typo; I guess it means "the first dialect that does not return a neutral element will win".


---




[GitHub] spark pull request #18193: [SPARK-15616] [SQL] CatalogRelation should fallba...

2017-09-17 Thread cenyuhai
Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/spark/pull/18193#discussion_r139312866
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -139,6 +138,54 @@ class DetermineTableStats(session: SparkSession) extends Rule[LogicalPlan] {
 }
 
 /**
+ *
+ * TODO: merge this with PruneFileSourcePartitions after we completely make hive as a data source.
+ */
+case class PruneHiveTablePartitions(
+    session: SparkSession) extends Rule[LogicalPlan] with PredicateHelper {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
+    case filter @ Filter(condition, relation: HiveTableRelation) if relation.isPartitioned =>
+      val predicates = splitConjunctivePredicates(condition)
+      val normalizedFilters = predicates.map { e =>
+        e transform {
+          case a: AttributeReference =>
+            a.withName(relation.output.find(_.semanticEquals(a)).get.name)
+        }
+      }
+      val partitionSet = AttributeSet(relation.partitionCols)
+      val pruningPredicates = normalizedFilters.filter { predicate =>
+        !predicate.references.isEmpty &&
+          predicate.references.subsetOf(partitionSet)
+      }
+      if (pruningPredicates.nonEmpty &&
+          session.sessionState.conf.fallBackToHdfsForStatsEnabled &&
+          session.sessionState.conf.metastorePartitionPruning) {
+        val prunedPartitions = session.sharedState.externalCatalog.listPartitionsByFilter(
+          relation.tableMeta.database,
+          relation.tableMeta.identifier.table,
+          pruningPredicates,
+          session.sessionState.conf.sessionLocalTimeZone)
+        val sizeInBytes = try {
+          prunedPartitions.map { part =>
+            CommandUtils.calculateLocationSize(
--- End diff --

I think we should first check whether `partition.parameters` contains `StatsSetupConst.RAW_DATA_SIZE` and `StatsSetupConst.TOTAL_SIZE` or not. If `partition.parameters` contains the size of the partition, use it instead of calling `getContentSummary` on HDFS.
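
A hedged sketch of that check (the constant names come from Hive's `StatsSetupConst` and the helper from the existing `CommandUtils`; the surrounding variables are assumptions, not the PR's code):

```
import org.apache.hadoop.hive.common.StatsSetupConst

val sizeInBytes = prunedPartitions.map { part =>
  // Prefer sizes Hive already recorded in the partition parameters and
  // fall back to an HDFS content summary only when they are absent.
  part.parameters.get(StatsSetupConst.TOTAL_SIZE)
    .orElse(part.parameters.get(StatsSetupConst.RAW_DATA_SIZE))
    .map(_.toLong)
    .getOrElse(CommandUtils.calculateLocationSize(
      session.sessionState, relation.tableMeta.identifier, part.storage.locationUri))
}.sum
```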


---




[GitHub] spark issue #19262: [MINOR][ML] Remove unnecessary default value setting for...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19262
  
**[Test build #81857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81857/testReport)** for PR 19262 at commit [`84c6c9e`](https://github.com/apache/spark/commit/84c6c9e394939fa97a6d3888547ac269b234a2c0).


---




[GitHub] spark pull request #19204: [SPARK-21981][PYTHON][ML] Added Python interface ...

2017-09-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19204#discussion_r139312695
  
--- Diff: python/pyspark/ml/evaluation.py ---
@@ -328,6 +329,86 @@ def setParams(self, predictionCol="prediction", labelCol="label",
 kwargs = self._input_kwargs
 return self._set(**kwargs)
 
+
+@inherit_doc
+class ClusteringEvaluator(JavaEvaluator, HasPredictionCol, HasFeaturesCol,
+  JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+
+Evaluator for Clustering results, which expects two input
+columns: prediction and features.
+
+>>> from sklearn import datasets
+>>> from pyspark.sql.types import *
+>>> from pyspark.ml.linalg import Vectors, VectorUDT
+>>> from pyspark.ml.evaluation import ClusteringEvaluator
+...
+>>> iris = datasets.load_iris()
+>>> iris_rows = [(Vectors.dense(x), int(iris.target[i]))
+... for i, x in enumerate(iris.data)]
+>>> schema = StructType([
+...StructField("features", VectorUDT(), True),
+...StructField("cluster_id", IntegerType(), True)])
+>>> rdd = spark.sparkContext.parallelize(iris_rows)
+>>> dataset = spark.createDataFrame(rdd, schema)
+...
+>>> evaluator = ClusteringEvaluator(predictionCol="cluster_id")
+>>> evaluator.evaluate(dataset)
+0.656...
+>>> ce_path = temp_path + "/ce"
+>>> evaluator.save(ce_path)
+>>> evaluator2 = ClusteringEvaluator.load(ce_path)
+>>> str(evaluator2.getPredictionCol())
+'cluster_id'
+
+.. versionadded:: 2.3.0
+"""
+metricName = Param(Params._dummy(), "metricName",
+   "metric name in evaluation (silhouette)",
+   typeConverter=TypeConverters.toString)
+
+@keyword_only
+def __init__(self, predictionCol="prediction", featuresCol="features",
+ metricName="silhouette"):
+"""
+__init__(self, predictionCol="prediction", featuresCol="features", \
+ metricName="silhouette")
+"""
+super(ClusteringEvaluator, self).__init__()
+self._java_obj = self._new_java_obj(
+"org.apache.spark.ml.evaluation.ClusteringEvaluator", self.uid)
+self._setDefault(predictionCol="prediction", featuresCol="features",
--- End diff --

I sent #19262 to fix the same issue for the other evaluators; please feel free to comment. Thanks.


---




[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...

2017-09-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19226
  
LGTM


---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139312657
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
@@ -77,20 +77,21 @@ trait QueryPlanConstraints { self: LogicalPlan =>
     constraint match {
       // When the root is IsNotNull, we can push IsNotNull through the child null intolerant
       // expressions
-      case IsNotNull(expr) => scanNullIntolerantAttribute(expr).map(IsNotNull(_))
+      case IsNotNull(expr) => scanNullIntolerantField(expr).map(IsNotNull(_))
       // Constraints always return true for all the inputs. That means, null will never be returned.
       // Thus, we can infer `IsNotNull(constraint)`, and also push IsNotNull through the child
       // null intolerant expressions.
-      case _ => scanNullIntolerantAttribute(constraint).map(IsNotNull(_))
+      case _ => scanNullIntolerantField(constraint).map(IsNotNull(_))
--- End diff --

Previously, IsNotNull constraints were Attribute-specific. Why do we need to expand them to `ExtractValue`?


---




[GitHub] spark issue #19226: [SPARK-21985][PySpark] PairDeserializer is broken for do...

2017-09-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19226
  
**[Test build #81856 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81856/testReport)** for PR 19226 at commit [`5282ee5`](https://github.com/apache/spark/commit/5282ee5a8624d313ec5995db4d772ec1170ea049).


---




[GitHub] spark pull request #19262: [MINOR][ML] Remove unnecessary default value sett...

2017-09-17 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/19262

[MINOR][ML] Remove unnecessary default value setting for evaluators.

## What changes were proposed in this pull request?
Remove unnecessary default value setting for all evaluators, as we have set 
them in corresponding _HasXXX_ base classes.

## How was this patch tested?
Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark evaluation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19262


commit 84c6c9e394939fa97a6d3888547ac269b234a2c0
Author: Yanbo Liang 
Date:   2017-09-17T14:48:35Z

Remove unnecessary default value setting for evaluators.




---




[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-17 Thread vkhristenko
Github user vkhristenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/16578#discussion_r139312613
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala ---
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.parquet
+
+import org.apache.spark.sql.catalyst.expressions.{And, Attribute, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.planning.{PhysicalOperation, ProjectionOverSchema, SelectedField}
+import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
+import org.apache.spark.sql.types.{StructField, StructType}
+
+/**
+ * Prunes unnecessary Parquet columns given a [[PhysicalOperation]] over a
+ * [[ParquetRelation]]. By "Parquet column", we mean a column as defined in the
+ * Parquet format. In Spark SQL, a root-level Parquet column corresponds to a
+ * SQL column, and a nested Parquet column corresponds to a [[StructField]].
+ */
+private[sql] object ParquetSchemaPruning extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan =
+    plan transformDown {
+      case op @ PhysicalOperation(projects, filters,
+          l @ LogicalRelation(hadoopFsRelation @ HadoopFsRelation(_, partitionSchema,
+            dataSchema, _, parquetFormat: ParquetFileFormat, _), _, _)) =>
+        val projectionFields = projects.flatMap(getFields)
+        val filterFields = filters.flatMap(getFields)
+        val requestedFields = (projectionFields ++ filterFields).distinct
+
+        // If [[requestedFields]] includes a proper field, continue. Otherwise,
+        // return [[op]]
+        if (requestedFields.exists { case (_, optAtt) => optAtt.isEmpty }) {
+          val prunedSchema = requestedFields
+            .map { case (field, _) => StructType(Array(field)) }
+            .reduceLeft(_ merge _)
--- End diff --

@viirya @mallman 

If I may add here, given a schema:
```
root
| - a: StructType
|| - f1: Int
|| - f2: Int
```
and a selection `df.select("a.f1", "a.f2")` vs. `df.select("a.f2", "a.f1")` will produce a different `requiredSchema` fed into `buildReader` once an action runs.

In the first case it will be `StructType(StructField("a", StructType(StructField("f1") :: StructField("f2") :: Nil)) :: Nil)` and in the second `StructType(StructField("a", StructType(StructField("f2") :: StructField("f1") :: Nil)) :: Nil)`,

which means that the original schema differs from the one required in the second case. But as long as fields `f1` and `f2` are split (each can be read without reading the other), this is remappable at the data source level.
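
For concreteness, a self-contained sketch of the two requested schemas (illustrative only):

```
import org.apache.spark.sql.types._

val f1 = StructField("f1", IntegerType)
val f2 = StructField("f2", IntegerType)

// df.select("a.f1", "a.f2") requests:
val s1 = StructType(StructField("a", StructType(f1 :: f2 :: Nil)) :: Nil)
// df.select("a.f2", "a.f1") requests:
val s2 = StructType(StructField("a", StructType(f2 :: f1 :: Nil)) :: Nil)

// Same fields, different order: a reader that resolves nested fields
// by name rather than by position can remap between the two.
```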

VK


---




[GitHub] spark pull request #19204: [SPARK-21981][PYTHON][ML] Added Python interface ...

2017-09-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19204#discussion_r139312388
  
--- Diff: python/pyspark/ml/evaluation.py ---
@@ -328,6 +329,86 @@ def setParams(self, predictionCol="prediction", labelCol="label",
 kwargs = self._input_kwargs
 return self._set(**kwargs)
 
+
+@inherit_doc
+class ClusteringEvaluator(JavaEvaluator, HasPredictionCol, HasFeaturesCol,
+  JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+
+Evaluator for Clustering results, which expects two input
+columns: prediction and features.
+
+>>> from sklearn import datasets
+>>> from pyspark.sql.types import *
+>>> from pyspark.ml.linalg import Vectors, VectorUDT
+>>> from pyspark.ml.evaluation import ClusteringEvaluator
+...
+>>> iris = datasets.load_iris()
+>>> iris_rows = [(Vectors.dense(x), int(iris.target[i]))
+... for i, x in enumerate(iris.data)]
+>>> schema = StructType([
+...StructField("features", VectorUDT(), True),
+...StructField("cluster_id", IntegerType(), True)])
--- End diff --

```cluster_id``` -> ```prediction```, to emphasize that this is the prediction value, not the ground truth.


---




[GitHub] spark pull request #19204: [SPARK-21981][PYTHON][ML] Added Python interface ...

2017-09-17 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/19204#discussion_r139312199
  
--- Diff: python/pyspark/ml/evaluation.py ---
@@ -328,6 +329,86 @@ def setParams(self, predictionCol="prediction", labelCol="label",
 kwargs = self._input_kwargs
 return self._set(**kwargs)
 
+
+@inherit_doc
+class ClusteringEvaluator(JavaEvaluator, HasPredictionCol, HasFeaturesCol,
+  JavaMLReadable, JavaMLWritable):
+"""
+.. note:: Experimental
+
+Evaluator for Clustering results, which expects two input
+columns: prediction and features.
+
+>>> from sklearn import datasets
+>>> from pyspark.sql.types import *
+>>> from pyspark.ml.linalg import Vectors, VectorUDT
+>>> from pyspark.ml.evaluation import ClusteringEvaluator
+...
+>>> iris = datasets.load_iris()
+>>> iris_rows = [(Vectors.dense(x), int(iris.target[i]))
+... for i, x in enumerate(iris.data)]
+>>> schema = StructType([
+...StructField("features", VectorUDT(), True),
+...StructField("cluster_id", IntegerType(), True)])
+>>> rdd = spark.sparkContext.parallelize(iris_rows)
+>>> dataset = spark.createDataFrame(rdd, schema)
+...
+>>> evaluator = ClusteringEvaluator(predictionCol="cluster_id")
+>>> evaluator.evaluate(dataset)
+0.656...
+>>> ce_path = temp_path + "/ce"
+>>> evaluator.save(ce_path)
+>>> evaluator2 = ClusteringEvaluator.load(ce_path)
+>>> str(evaluator2.getPredictionCol())
+'cluster_id'
+
+.. versionadded:: 2.3.0
+"""
+metricName = Param(Params._dummy(), "metricName",
+   "metric name in evaluation (silhouette)",
+   typeConverter=TypeConverters.toString)
+
+@keyword_only
+def __init__(self, predictionCol="prediction", featuresCol="features",
+ metricName="silhouette"):
+"""
+__init__(self, predictionCol="prediction", featuresCol="features", \
+ metricName="silhouette")
+"""
+super(ClusteringEvaluator, self).__init__()
+self._java_obj = self._new_java_obj(
+"org.apache.spark.ml.evaluation.ClusteringEvaluator", self.uid)
+self._setDefault(predictionCol="prediction", featuresCol="features",
--- End diff --

Remove setting the default values for ```predictionCol``` and ```featuresCol```, as they have already been set in ```HasPredictionCol``` and ```HasFeaturesCol```.


---



