[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21594
  
**[Test build #92265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92265/testReport)** for PR 21594 at commit [`bf42fdf`](https://github.com/apache/spark/commit/bf42fdf4d5b0f83e4dc62594a384cb594a5c4d97).


---




[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21594
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21594: [SPARK-24596][SQL] Non-cascading Cache Invalidation

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21594
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/442/
Test PASSed.


---




[GitHub] spark pull request #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceRe...

2018-06-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21625#discussion_r197628635
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala ---
@@ -573,32 +578,6 @@ object DataSourceReadBenchmark {
   }
 }
 
-/*
-Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
-Partitioned Table:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------

--- End diff --

oh, thanks. I'll update soon.


---




[GitHub] spark pull request #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceRe...

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21625#discussion_r197627610
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala ---
@@ -573,32 +578,6 @@ object DataSourceReadBenchmark {
   }
 }
 
-/*
-Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
-Partitioned Table:                        Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
--------------------------------------------------------------------------------------------

--- End diff --

Seems this was missed in the update.


---




[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21482
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92262/
Test PASSed.


---




[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21482
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21482
  
**[Test build #92262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92262/testReport)** for PR 21482 at commit [`6bd6735`](https://github.com/apache/spark/commit/6bd6735320797c4d42723fdfc232c68df996c40c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBench...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21625
  
**[Test build #92264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92264/testReport)** for PR 21625 at commit [`2352820`](https://github.com/apache/spark/commit/23528200f833f236a83d6b891388b6ec698bcac7).


---




[GitHub] spark issue #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBench...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21625
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/441/
Test PASSed.


---




[GitHub] spark issue #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBench...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21625
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBench...

2018-06-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21625
  
This PR is a follow-up to https://github.com/apache/spark/pull/21288#event-1697426905


---




[GitHub] spark pull request #21625: [SPARK-24206][SQL][FOLLOW-UP] Update DataSourceRe...

2018-06-23 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21625

[SPARK-24206][SQL][FOLLOW-UP] Update DataSourceReadBenchmark benchmark results

## What changes were proposed in this pull request?
This PR corrects the default configuration (`spark.master=local[1]`) for benchmarks, and updates the performance results measured on an AWS `r3.xlarge` instance.
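For reference, a minimal sketch of the intended setup in the benchmark object (assuming the usual `SparkConf`/`SparkSession` imports; whether the patch uses `setIfMissing` or an unconditional `set` is an implementation detail of the PR):

```
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("DataSourceReadBenchmark")
  // Pin the benchmark to a single local thread by default so results are
  // comparable across runs; an explicitly supplied master still wins.
  .setIfMissing("spark.master", "local[1]")

val spark = SparkSession.builder.config(conf).getOrCreate()
```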

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark FixDataSourceReadBenchmark

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21625


commit 23528200f833f236a83d6b891388b6ec698bcac7
Author: Takeshi Yamamuro 
Date:   2018-06-16T01:48:15Z

Fix




---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21389
  
**[Test build #92263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92263/testReport)** for PR 21389 at commit [`a3c400d`](https://github.com/apache/spark/commit/a3c400d83398443bbf4abdb3764f52c6c4a8893d).


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/ParquetFi...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21389
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/440/
Test PASSed.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21288
  
Sure


---




[GitHub] spark issue #21379: [SPARK-24327][SQL] Verify and normalize a partition colu...

2018-06-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21379
  
ping


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626497
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
+import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
+import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource in write path.
+   */
+  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = false)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource in read path.
+   */
+  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = true)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource. This verification should be done
+   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
+   * in `FileFormat`.
+   *
+   * Unsupported data types of csv, json, orc, and parquet are as follows;
+   *  csv -> R/W: Interval, Null, Array, Map, Struct
+   *  json -> W: Interval
+   *  orc -> W: Interval, Null
+   *  parquet -> R/W: Interval, Null
+   */
+  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
+    def throwUnsupportedException(dataType: DataType): Unit = {
+      throw new UnsupportedOperationException(
+        s"$format data source does not support ${dataType.simpleString} data type.")
+    }
+
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
+        throwUnsupportedException(dataType)
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
+        isReadPath && format.isInstanceOf[OrcFileFormat] =>
--- End diff --

Yea, I'll recheck and list them all at the end.


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626501
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -202,4 +204,222 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
   }
 }
   }
+
+  // Unsupported data types of csv, json, orc, and parquet are as follows;
+  //  csv -> R/W: Interval, Null, Array, Map, Struct
+  //  json -> W: Interval
+  //  orc -> W: Interval, Null
+  //  parquet -> R/W: Interval, Null
+  test("SPARK-24204 error handling for unsupported data types") {
--- End diff --

ok


---




[GitHub] spark issue #20345: [SPARK-23172][SQL] Expand the ReorderJoin rule to handle...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20345
  
cc @maryannxue Could you please take a look at this?


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626306
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
+import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
+import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource in write path.
+   */
+  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = false)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource in read path.
+   */
+  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = true)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource. This verification should be done
+   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
+   * in `FileFormat`.
+   *
+   * Unsupported data types of csv, json, orc, and parquet are as follows;
+   *  csv -> R/W: Interval, Null, Array, Map, Struct
+   *  json -> W: Interval
+   *  orc -> W: Interval, Null
+   *  parquet -> R/W: Interval, Null
+   */
+  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
+    def throwUnsupportedException(dataType: DataType): Unit = {
+      throw new UnsupportedOperationException(
+        s"$format data source does not support ${dataType.simpleString} data type.")
+    }
+
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
+        throwUnsupportedException(dataType)
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
+        isReadPath && format.isInstanceOf[OrcFileFormat] =>
--- End diff --

Let us try to enumerate all the cases we want to block, instead of relying on the last case to throw the exceptions?
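For illustration, here is roughly what such an enumeration could look like, reusing the names from the diff above (`throwUnsupportedException`, the `FileFormat` subclasses) and the unsupported-type matrix from the scaladoc. This is a sketch, not the code that was merged, and it omits the recursive traversal of nested fields:

```
// Sketch: enumerate the blocked (format, type) combinations in one place.
private def isBlocked(format: FileFormat, dataType: DataType, isReadPath: Boolean): Boolean =
  (format, dataType) match {
    // csv -> R/W: Struct, Array, Map (Interval and Null fall through below)
    case (_: CSVFileFormat, _: StructType | _: ArrayType | _: MapType) => true
    case (_, _: CalendarIntervalType) =>
      // json/orc tolerate intervals on the read path only; csv/parquet never do.
      !(isReadPath &&
        (format.isInstanceOf[JsonFileFormat] || format.isInstanceOf[OrcFileFormat]))
    case (_, NullType) =>
      // json keeps NullType for backward compatibility; orc supports it on read only.
      !(format.isInstanceOf[JsonFileFormat] ||
        (isReadPath && format.isInstanceOf[OrcFileFormat]))
    case _ => false // atomic types, and nested types for non-csv formats
  }
```

`verifyType` would then reduce to the recursion plus a single `if (isBlocked(format, dataType, isReadPath)) throwUnsupportedException(dataType)`.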


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626232
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -202,4 +204,222 @@ class FileBasedDataSourceSuite extends QueryTest with SharedSQLContext with Befo
   }
 }
   }
+
+  // Unsupported data types of csv, json, orc, and parquet are as follows;
+  //  csv -> R/W: Interval, Null, Array, Map, Struct
+  //  json -> W: Interval
+  //  orc -> W: Interval, Null
+  //  parquet -> R/W: Interval, Null
+  test("SPARK-24204 error handling for unsupported data types") {
--- End diff --

Can we split this test case into multiple smaller ones? Meaningful test case names would make it easy to understand what happened if any of them fails.


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626168
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
+import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
+import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource in write path.
+   */
+  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = false)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource in read path.
+   */
+  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = true)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource. This verification should be done
+   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
+   * in `FileFormat`.
+   *
+   * Unsupported data types of csv, json, orc, and parquet are as follows;
+   *  csv -> R/W: Interval, Null, Array, Map, Struct
+   *  json -> W: Interval
+   *  orc -> W: Interval, Null
+   *  parquet -> R/W: Interval, Null
+   */
+  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
+    def throwUnsupportedException(dataType: DataType): Unit = {
+      throw new UnsupportedOperationException(
+        s"$format data source does not support ${dataType.simpleString} data type.")
+    }
+
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
+        throwUnsupportedException(dataType)
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
+        isReadPath && format.isInstanceOf[OrcFileFormat] =>
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For JSON backward-compatibility
+      case NullType if format.isInstanceOf[JsonFileFormat] ||
+        (isReadPath && format.isInstanceOf[OrcFileFormat]) =>
+
+      case _ => throwUnsupportedException(dataType)
--- End diff --

ok


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626164
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
+import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
+import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource in write path.
+   */
+  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = false)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource in read path.
+   */
+  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = true)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource. This verification should be done
+   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
+   * in `FileFormat`.
+   *
+   * Unsupported data types of csv, json, orc, and parquet are as follows;
+   *  csv -> R/W: Interval, Null, Array, Map, Struct
+   *  json -> W: Interval
+   *  orc -> W: Interval, Null
+   *  parquet -> R/W: Interval, Null
+   */
+  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
+    def throwUnsupportedException(dataType: DataType): Unit = {
+      throw new UnsupportedOperationException(
+        s"$format data source does not support ${dataType.simpleString} data type.")
+    }
+
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
+        throwUnsupportedException(dataType)
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
+        isReadPath && format.isInstanceOf[OrcFileFormat] =>
--- End diff --

ok, will do


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626154
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
+import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
+import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource in write path.
+   */
+  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = false)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource in read path.
+   */
+  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = true)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource. This verification should be done
+   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
+   * in `FileFormat`.
+   *
+   * Unsupported data types of csv, json, orc, and parquet are as follows;
+   *  csv -> R/W: Interval, Null, Array, Map, Struct
+   *  json -> W: Interval
+   *  orc -> W: Interval, Null
+   *  parquet -> R/W: Interval, Null
+   */
+  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
+    def throwUnsupportedException(dataType: DataType): Unit = {
+      throw new UnsupportedOperationException(
+        s"$format data source does not support ${dataType.simpleString} data type.")
+    }
+
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
+        throwUnsupportedException(dataType)
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
+        isReadPath && format.isInstanceOf[OrcFileFormat] =>
+
+      case udt: UserDefinedType[_] => verifyType(udt.sqlType)
+
+      // For JSON backward-compatibility
+      case NullType if format.isInstanceOf[JsonFileFormat] ||
+        (isReadPath && format.isInstanceOf[OrcFileFormat]) =>
+
+      case _ => throwUnsupportedException(dataType)
--- End diff --

Write a comment above this?
> // Actually we won't pass in unsupported data types, this is a safety check.



---




[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21598
  
IMHO we need clear decision rules for these kinds of behaviour changes in the contribution guide. Judging from past migration guide descriptions, it seems we've already accepted some behaviour changes, e.g., `elt` and `concat`, even though these are public APIs.


---




[GitHub] spark pull request #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF shou...

2018-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21427


---




[GitHub] spark pull request #21389: [SPARK-24204][SQL] Verify a schema in Json/Orc/Pa...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21389#discussion_r197626122
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceUtils.scala ---
@@ -0,0 +1,88 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat
+import org.apache.spark.sql.execution.datasources.json.JsonFileFormat
+import org.apache.spark.sql.execution.datasources.orc.OrcFileFormat
+import org.apache.spark.sql.types._
+
+
+object DataSourceUtils {
+
+  /**
+   * Verify if the schema is supported in datasource in write path.
+   */
+  def verifyWriteSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = false)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource in read path.
+   */
+  def verifyReadSchema(format: FileFormat, schema: StructType): Unit = {
+    verifySchema(format, schema, isReadPath = true)
+  }
+
+  /**
+   * Verify if the schema is supported in datasource. This verification should be done
+   * in a driver side, e.g., `prepareWrite`, `buildReader`, and `buildReaderWithPartitionValues`
+   * in `FileFormat`.
+   *
+   * Unsupported data types of csv, json, orc, and parquet are as follows;
+   *  csv -> R/W: Interval, Null, Array, Map, Struct
+   *  json -> W: Interval
+   *  orc -> W: Interval, Null
+   *  parquet -> R/W: Interval, Null
+   */
+  private def verifySchema(format: FileFormat, schema: StructType, isReadPath: Boolean): Unit = {
+    def throwUnsupportedException(dataType: DataType): Unit = {
+      throw new UnsupportedOperationException(
+        s"$format data source does not support ${dataType.simpleString} data type.")
+    }
+
+    def verifyType(dataType: DataType): Unit = dataType match {
+      case BooleanType | ByteType | ShortType | IntegerType | LongType | FloatType | DoubleType |
+           StringType | BinaryType | DateType | TimestampType | _: DecimalType =>
+
+      case _: StructType | _: ArrayType | _: MapType if format.isInstanceOf[CSVFileFormat] =>
+        throwUnsupportedException(dataType)
+
+      case st: StructType => st.foreach { f => verifyType(f.dataType) }
+
+      case ArrayType(elementType, _) => verifyType(elementType)
+
+      case MapType(keyType, valueType, _) =>
+        verifyType(keyType)
+        verifyType(valueType)
+
+      case _: CalendarIntervalType if isReadPath && format.isInstanceOf[JsonFileFormat] ||
+        isReadPath && format.isInstanceOf[OrcFileFormat] =>
--- End diff --

simplify the condition?
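For example, something like this (a sketch; equivalent to the original by distributing `isReadPath` over the disjunction):

```
case _: CalendarIntervalType if isReadPath &&
    (format.isInstanceOf[JsonFileFormat] || format.isInstanceOf[OrcFileFormat]) =>
```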


---




[GitHub] spark pull request #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21482#discussion_r197626112
  
--- Diff: python/pyspark/sql/column.py ---
@@ -514,6 +514,16 @@ def isin(self, *cols):
 desc_nulls_first = ignore_unicode_prefix(_unary_op("desc_nulls_first", _desc_nulls_first_doc))
 desc_nulls_last = ignore_unicode_prefix(_unary_op("desc_nulls_last", _desc_nulls_last_doc))
 
+_isInf_doc = """
+True if the current expression is inf.
+
+>>> from pyspark.sql import Row
+>>> df = spark.createDataFrame([\
+Row(name=u'Tom', height=80.0),\
+Row(name=u'Alice', height=float('inf'))])
--- End diff --

nit:

```
>>> df = spark.createDataFrame([
... Row(name=u'Tom', height=80.0),
... Row(name=u'Alice', height=float('inf'))
... ])
```


---




[GitHub] spark issue #21427: [SPARK-24324][PYTHON] Pandas Grouped Map UDF should assi...

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21427
  
Merged to master.


---




[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21598
  
@markhamstra, Spark has sometimes made behaviour changes for bug fixes, and in a few other cases, so far; see, at least, the similar configurations added in the migration guide. It sounded like we are setting a hard limit here regardless of whether it's a bug or not. If the standards don't reflect the practice, that really should be discussed so that one or the other is corrected. To me it is a matter of "from now on", since the practice so far sounds a bit different, if I understood correctly.


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21288
  
Thanks for the check! BTW, `DataSourceReadBenchmark` has the same issue (the `spark.master` setup), so is it OK to fix it as a follow-up?
https://github.com/apache/spark/compare/master...maropu:FixDataSourceReadBenchmark
Also, I updated the benchmark results on `r3.xlarge`.


---




[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21288


---




[GitHub] spark issue #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmark bench...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21288
  
LGTM

Thanks! Merged to master.


---




[GitHub] spark pull request #21247: [SPARK-24190][SQL] Allow saving of JSON files in ...

2018-06-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21247


---




[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20949
  
cc @MaxGekk @HyukjinKwon 


---




[GitHub] spark issue #21247: [SPARK-24190][SQL] Allow saving of JSON files in UTF-16 ...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21247
  
LGTM

Thanks! Merged to master.


---




[GitHub] spark issue #21590: [SPARK-24423][SQL] Add a new option for JDBC sources

2018-06-23 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21590
  
@gatorsmile Thanks a lot. I will process your comments and get back.


---




[GitHub] spark issue #21482: [SPARK-24393][SQL] SQL builtin: isinf

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21482
  
**[Test build #92262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92262/testReport)** for PR 21482 at commit [`6bd6735`](https://github.com/apache/spark/commit/6bd6735320797c4d42723fdfc232c68df996c40c).


---




[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/21598
  
@HyukjinKwon this is not new policy. It is what Apache Spark has guaranteed 
in its version numbering and public API since 1.0.0. It is not a matter of 
"from now on", but rather of whether committers have started allowing our 
standards to slip. It may well be time for a discussion of that and of better 
tools to help guarantee that additions and changes to the public API are 
adequately discussed and reviewed, appropriate InterfaceStability annotations 
are applied, etc.


---




[GitHub] spark pull request #21621: [SPARK-24633][SQL] Fix codegen when split is requ...

2018-06-23 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21621#discussion_r197623576
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -556,6 +556,17 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSQLContext {
 checkAnswer(df8.selectExpr("arrays_zip(v1, v2)"), expectedValue8)
   }
 
+  test("SPARK-24633: arrays_zip splits input processing correctly") {
+Seq("true", "false").foreach { wholestageCodegenEnabled =>
+  withSQLConf(SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key -> 
wholestageCodegenEnabled) {
+val df = spark.range(1)
+val exprs = (0 to 5).map(x => array($"id" + lit(x)))
--- End diff --

This wasn't splitting input processing for me. Maybe `splitExpressionsWithCurrentInputs` has a bigger threshold than `splitExpressions`.

Even at 90 expressions, it still had not split the input processing. At 100, it finally did, so somewhere between 90 and 100 it starts splitting.

I might be looking at the wrong thing. Check at your end.


val exprs = (0 to 100).map(x => array($"id" + lit(x)))
checkAnswer(df.select(arrays_zip(exprs: _*)),
  Row(Seq(Row(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
    21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
    42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62,
    63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
    84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100))))




---




[GitHub] spark issue #12119: [SPARK-14288][SQL] Memory Sink for streaming

2018-06-23 Thread ChandraMadhumanchi
Github user ChandraMadhumanchi commented on the issue:

https://github.com/apache/spark/pull/12119
  
Is the memory Structured Streaming sink idempotent, and does it provide end-to-end exactly-once semantics?


---




[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21622
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92261/
Test PASSed.


---




[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21622
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21622
  
**[Test build #92261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92261/testReport)** for PR 21622 at commit [`147c98a`](https://github.com/apache/spark/commit/147c98a94140bae505116f5af4d616dcf8d85eab).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620329
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // ------------------------------------------------------------
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
--- End diff --

Another option is to follow what we are doing in another PR, https://github.com/apache/spark/pull/21247? We are facing the same issue there: the options are shared by both read and write paths, but the limitations are different.


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620436
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // ------------------------------------------------------------
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are conflicting in nature.
+  require(
+    tableName.isDefined || query.isDefined,
+    s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+    !(tableName.isDefined && query.isDefined),
+    s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
--- End diff --

Using a tuple match here?
```
(tableName, query) match {
  case ...
}
```
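Concretely, perhaps something like this (a sketch; the `spark_gen_alias` name is taken from the documentation diff in this PR, and the error message is illustrative):

```
val tableOrQuery = (tableName, query) match {
  case (Some(name), None) => name.trim
  case (None, Some(subquery)) => s"(${subquery.trim}) spark_gen_alias"
  case _ => throw new IllegalArgumentException(
    s"Exactly one of '$JDBC_TABLE_NAME' or '$JDBC_QUERY_STRING' must be specified.")
}
```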


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620455
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -109,6 +134,20 @@ class JDBCOptions(
     s"When reading JDBC data sources, users need to specify all or none for the following " +
       s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND', " +
       s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+    s"""
+       |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not be specified together.
+       |Please define the query using `$JDBC_TABLE_NAME` option instead and make sure to qualify
+       |the partition columns using the supplied subquery alias to resolve any ambiguity.
+       |Example :
+       |spark.read.format("jdbc")
+       |.option("dbtable", "(select c1, c2 from t1) as subq")
+       |.option("partitionColumn", "subq.c1"
+       |.load()
--- End diff --

Great! Please add them to the doc as I mentioned above?


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620140
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
  
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that when using it in the read
+  path anything that is valid in a FROM clause of a SQL query can be used.
+  For example, instead of a full table you could also use a subquery in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
  
  
+query
+
+  A query that will be used to read data into Spark. The specified query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the datasource.
+   SELECT columns FROM (user_specified_query) spark_gen_alias
+  Its not allowed to specify `dbtable` and `query` options at the same time.
--- End diff --

Also document the limitation of `query` when `partitionColumn` is specified, and the workaround?
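For the doc, the workaround from the error message above could be shown as a read example along these lines (the url and table names are placeholders):

```
// Move the query into `dbtable` as a parenthesized subquery and qualify the
// partition column with the subquery alias to avoid any ambiguity.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")
  .option("dbtable", "(select c1, c2 from t1) as subq")
  .option("partitionColumn", "subq.c1")
  .option("lowerBound", "0")
  .option("upperBound", "1000")
  .option("numPartitions", "4")
  .load()
```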


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620099
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
  
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that when using it in the read
+  path anything that is valid in a FROM clause of a SQL query can be used.
+  For example, instead of a full table you could also use a subquery in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
  
  
+query
+
+  A query that will be used to read data into Spark. The specified query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the datasource.
--- End diff --

`datasource` -> `JDBC source`


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620021
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
  
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that when using it in the read
+  path anything that is valid in a FROM clause of a SQL query can be used.
+  For example, instead of a full table you could also use a subquery in parentheses. Its not
--- End diff --

Nit: `It is`


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620115
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
  
 dbtable
 
-  The JDBC table that should be read. Note that anything that is valid in a FROM clause of
-  a SQL query can be used. For example, instead of a full table you could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that when using it in the read
+  path anything that is valid in a FROM clause of a SQL query can be used.
+  For example, instead of a full table you could also use a subquery in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+
  
  
+query
+
+  A query that will be used to read data into Spark. The specified query will be parenthesized and used
+  as a subquery in the FROM clause. Spark will also assign a alias to the subquery clause.
+  As an example, spark will issue a query of the following form to the datasource.
+   SELECT columns FROM (user_specified_query) spark_gen_alias
+  Its not allowed to specify `dbtable` and `query` options at the same time.
--- End diff --

`Its` -> `it is`


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620496
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcRelationProvider.scala ---
@@ -58,6 +58,10 @@ class JdbcRelationProvider extends CreatableRelationProvider
       parameters: Map[String, String],
       df: DataFrame): BaseRelation = {
     val options = new JDBCOptions(parameters)
+    require(
+      options.tableName.isDefined,
+      s"Option '${JDBCOptions.JDBC_TABLE_NAME}' is required. " +
+        s"Option '${JDBCOptions.JDBC_QUERY_STRING}' is not applicable while writing.")
--- End diff --

Let us create a `JDBCOptionsInWrite`?
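Something along these lines, perhaps: reuse `JDBCOptions` and tighten the requirement on the write path (a sketch under the assumption that `JDBCOptions` can be subclassed with the same parameters, not the merged code):

```
class JDBCOptionsInWrite(parameters: Map[String, String]) extends JDBCOptions(parameters) {
  import JDBCOptions._

  // Writing needs a concrete table name; a `query` cannot be written into.
  require(tableName.isDefined,
    s"Option '$JDBC_TABLE_NAME' is required. " +
      s"Option '$JDBC_QUERY_STRING' is not applicable while writing.")

  val table: String = tableName.get
}
```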


---




[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620483
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -109,6 +134,20 @@ class JDBCOptions(
     s"When reading JDBC data sources, users need to specify all or none for the following " +
       s"options: '$JDBC_PARTITION_COLUMN', '$JDBC_LOWER_BOUND', '$JDBC_UPPER_BOUND', " +
       s"and '$JDBC_NUM_PARTITIONS'")
+
+  require(!(query.isDefined && partitionColumn.isDefined),
+    s"""
+       |Options '$JDBC_QUERY_STRING' and '$JDBC_PARTITION_COLUMN' can not be specified together.
+       |Please define the query using `$JDBC_TABLE_NAME` option instead and make sure to qualify
+       |the partition columns using the supplied subquery alias to resolve any ambiguity.
+       |Example :
+       |spark.read.format("jdbc")
+       |.option("dbtable", "(select c1, c2 from t1) as subq")
+       |.option("partitionColumn", "subq.c1"
+       |.load()
+    """.stripMargin
+  )
--- End diff --

@maropu The new option `query` is just syntactic sugar to simplify the work for many basic JDBC users. We can improve it in the future, for example by parsing the user-specified query and making all the other options work. 
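
To make the sugar concrete, a hypothetical illustration of how a `query` read lines up with the existing `dbtable` spelling (`url` is a placeholder here, and the generated alias name is only indicative):

```scala
// A read expressed with the new `query` option...
val viaQuery = spark.read.format("jdbc")
  .option("url", url)
  .option("query", "SELECT c1, c2 FROM t1")
  .load()

// ...behaves roughly like the equivalent `dbtable` subquery spelling,
// except that Spark generates the alias instead of the user writing one.
val viaDbtable = spark.read.format("jdbc")
  .option("url", url)
  .option("dbtable", "(SELECT c1, c2 FROM t1) spark_gen_alias")
  .load()
```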


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620091
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1302,9 +1302,20 @@ the following case-insensitive options:
  <tr>
    <td><code>dbtable</code></td>
    <td>
-  The JDBC table that should be read. Note that anything that is valid in a <code>FROM</code> clause of
-  a SQL query can be used. For example, instead of a full table you could also use a
-  subquery in parentheses.
+  The JDBC table that should be read from or written into. Note that when using it in the read
+  path anything that is valid in a <code>FROM</code> clause of a SQL query can be used.
+  For example, instead of a full table you could also use a subquery in parentheses. Its not
+  allowed to specify `dbtable` and `query` options at the same time.
+    </td>
+  </tr>
+
+  <tr>
+    <td><code>query</code></td>
+    <td>
+      A query that will be used to read data into Spark. The specified query will be parenthesized and used
+      as a subquery in the <code>FROM</code> clause. Spark will also assign a alias to the subquery clause.
--- End diff --

`a alias` -> `an alias` 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620340
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are conflicting in nature.
+  require(
+    tableName.isDefined || query.isDefined,
+    s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+    !(tableName.isDefined && query.isDefined),
+    s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be specified."
--- End diff --

-> `can not be specified at the same time`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21590: [SPARK-24423][SQL] Add a new option for JDBC sour...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21590#discussion_r197620384
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala ---
@@ -65,13 +65,38 @@ class JDBCOptions(
   // Required parameters
   // 
   require(parameters.isDefinedAt(JDBC_URL), s"Option '$JDBC_URL' is required.")
-  require(parameters.isDefinedAt(JDBC_TABLE_NAME), s"Option '$JDBC_TABLE_NAME' is required.")
+
   // a JDBC URL
   val url = parameters(JDBC_URL)
-  // name of table
-  val table = parameters(JDBC_TABLE_NAME)
+  val tableName = parameters.get(JDBC_TABLE_NAME)
+  val query = parameters.get(JDBC_QUERY_STRING)
+  // Following two conditions make sure that :
+  // 1. One of the option (dbtable or query) must be specified.
+  // 2. Both of them can not be specified at the same time as they are conflicting in nature.
+  require(
+    tableName.isDefined || query.isDefined,
+    s"Option '$JDBC_TABLE_NAME' or '${JDBC_QUERY_STRING}' is required."
+  )
+
+  require(
+    !(tableName.isDefined && query.isDefined),
+    s"Both '$JDBC_TABLE_NAME' and '$JDBC_QUERY_STRING' can not be specified."
+  )
+
+  // table name or a table expression.
+  val tableOrQuery = tableName.map(_.trim).getOrElse {
+    // We have ensured in the code above that either dbtable or query is specified.
+    query.get match {
+      case subQuery if subQuery.nonEmpty => s"(${subQuery}) spark_gen_${curId.getAndIncrement()}"
--- End diff --

`__SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}__`?
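
A sketch of the suggestion against the code in this diff; the counter still guarantees uniqueness, and the louder name makes the generated alias easy to spot in database logs (the empty-query branch below is assumed for completeness, since the diff is truncated):

```scala
val tableOrQuery = tableName.map(_.trim).getOrElse {
  query.get match {
    case subQuery if subQuery.nonEmpty =>
      s"($subQuery) __SPARK_GEN_JDBC_SUBQUERY_NAME_${curId.getAndIncrement()}__"
    case _ =>
      throw new IllegalArgumentException(s"Option '$JDBC_QUERY_STRING' can not be empty.")
  }
}
```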


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21588
  
> is it basically true that Hadoop 3 will work with only minor patches to 
the Hive fork in Spark?

To my knowledge so far, yes, it basically works. At least, the regression tests we have written so far pass for what we currently cover.

> Is the blocker that we can't get a change into the Hive fork?

To my knowledge so far, yes, there is a one-line fix (HIVE-16081) that the fork needs.

> Who owns it?

As far as I know, @JoshRosen owns it.

Please correct me if I am wrong.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21598
  
I am fine with having the configuration, as that's basically what I suggested too. Removing the behaviour in 3.0.0 is fine. My only remaining question is the default value.

> We should change the behavior in 3.0. Before 3.0 release, we introduce a 
conf and make it configurable. The default is to keep the current behavior 
unchanged.

@gatorsmile, do you think this is specific to this PR and JIRA, or should we do the same for other behavior changes from now on? If it's the latter, it should really be discussed on the dev mailing list in a separate thread.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21598#discussion_r197618603
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -75,28 +75,47 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
       > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
        4
   """)
-case class Size(child: Expression) extends UnaryExpression with ExpectsInputTypes {
+case class Size(
+    child: Expression,
+    legacySizeOfNull: Boolean)
+  extends UnaryExpression with ExpectsInputTypes {
+
+  def this(child: Expression) =
+    this(
+      child,
+      legacySizeOfNull = SQLConf.get.getConf(SQLConf.LEGACY_SIZE_OF_NULL))
+
   override def dataType: DataType = IntegerType
   override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(ArrayType, MapType))
-  override def nullable: Boolean = false
+  override def nullable: Boolean = if (legacySizeOfNull) false else super.nullable
 
   override def eval(input: InternalRow): Any = {
     val value = child.eval(input)
     if (value == null) {
-      -1
+      if (legacySizeOfNull) -1 else null
     } else child.dataType match {
       case _: ArrayType => value.asInstanceOf[ArrayData].numElements()
       case _: MapType => value.asInstanceOf[MapData].numElements()
+      case other => throw new UnsupportedOperationException(
+        s"The size function doesn't support the operand type ${other.getClass.getCanonicalName}")
     }
   }
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-    val childGen = child.genCode(ctx)
-    ev.copy(code = code"""
+    if (legacySizeOfNull) {
+      val childGen = child.genCode(ctx)
+      ev.copy(code = code"""
       boolean ${ev.isNull} = false;
       ${childGen.code}
       ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
         (${childGen.value}).numElements();""", isNull = FalseLiteral)
+    } else {
+      child.dataType match {
+        case _: ArrayType | _: MapType => defineCodeGen(ctx, ev, c => s"($c).numElements()")
+        case other => throw new UnsupportedOperationException(
--- End diff --

Could you reorganize the code? The current flow looks confusing: it appears as if the other types are unsupported only when `legacySizeOfNull` is false, yet the input types are already limited to ArrayType and MapType either way. 
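
One possible shape, as a sketch (assuming the `ExpectsInputTypes` check already rejects anything that is not an array or a map before execution):

```scala
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
  if (legacySizeOfNull) {
    val childGen = child.genCode(ctx)
    ev.copy(code = code"""
      boolean ${ev.isNull} = false;
      ${childGen.code}
      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${childGen.isNull} ? -1 :
        (${childGen.value}).numElements();""", isNull = FalseLiteral)
  } else {
    // No extra match on child.dataType here: inputTypes already limits it to
    // ArrayType and MapType, so both cases can share the same codegen.
    defineCodeGen(ctx, ev, c => s"($c).numElements()")
  }
}
```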


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21598: [SPARK-24605][SQL] size(null) returns null instea...

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21598#discussion_r197618318
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -75,28 +75,47 @@ trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
   > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
--- End diff --

Update the description of this function? 
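
For example, the usage text could spell out the conf-dependent null behavior; a sketch with illustrative wording (the exact phrasing is up to you):

```scala
// Attached to the Size case class from the diff above.
@ExpressionDescription(
  usage = """
    _FUNC_(expr) - Returns the size of an array or a map.
      Returns -1 if the input is null and spark.sql.legacy.sizeOfNull is set to true.
      Otherwise, returns null for null input.
  """,
  examples = """
    Examples:
      > SELECT _FUNC_(array('b', 'd', 'c', 'a'));
       4
  """)
```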


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21619
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92259/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21619
  
**[Test build #92259 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92259/testReport)**
 for PR 21619 at commit 
[`9e14397`](https://github.com/apache/spark/commit/9e1439720a55f5c80169ddb367a1095db188698b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92257/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20701
  
**[Test build #92257 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92257/testReport)**
 for PR 20701 at commit 
[`e2f68ac`](https://github.com/apache/spark/commit/e2f68ac612227aaafa809ad5f5074d1984aa907e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21598: [SPARK-24605][SQL] size(null) returns null instead of -1

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21598
  
I created a JIRA https://issues.apache.org/jira/browse/SPARK-24640. We 
should change the behavior in 3.0. Before 3.0 release, we introduce a conf and 
make it configurable. The default is to keep the current behavior unchanged. 
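
For reference, a sketch of the shape such a conf could take in `SQLConf` (the key mirrors the `SQLConf.LEGACY_SIZE_OF_NULL` reference in the diff above; the doc wording here is illustrative):

```scala
val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
  .doc("If it is set to true, size(null) returns -1, preserving the current behavior. " +
    "If it is set to false, size(null) returns null.")
  .booleanConf
  .createWithDefault(true)  // keep the old behavior by default until 3.0
```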


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21623
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92258/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21623
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21623
  
**[Test build #92258 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92258/testReport)**
 for PR 21623 at commit 
[`5b52ace`](https://github.com/apache/spark/commit/5b52ace44c8a41631535c883b7a5c8545959e5e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21622
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92260/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21622
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21622
  
**[Test build #92260 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92260/testReport)**
 for PR 21622 at commit 
[`147c98a`](https://github.com/apache/spark/commit/147c98a94140bae505116f5af4d616dcf8d85eab).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21610: Updates to LICENSE and NOTICE

2018-06-23 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21610
  
OK, if you want to pare the PR down to the fixes we discussed here and on the thread, I can get it in. I will need to reevaluate the whole LICENSE and NOTICE anyway, as it's clearly at least a little out of date, so I'll be doing that in another PR regardless. It's also OK if you'd rather I just run with that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21621: [SPARK-24633][SQL] Fix codegen when split is required fo...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21621
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21621: [SPARK-24633][SQL] Fix codegen when split is required fo...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21621
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92255/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21621: [SPARK-24633][SQL] Fix codegen when split is required fo...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21621
  
**[Test build #92255 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92255/testReport)**
 for PR 21621 at commit 
[`f682a5e`](https://github.com/apache/spark/commit/f682a5e78f11e43985a56352a6a8143be53a10e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92254/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21596
  
**[Test build #92254 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92254/testReport)**
 for PR 21596 at commit 
[`1c5ea8a`](https://github.com/apache/spark/commit/1c5ea8af1610c772869e82719ad1bd48961be10d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21622
  
**[Test build #92261 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92261/testReport)**
 for PR 21622 at commit 
[`147c98a`](https://github.com/apache/spark/commit/147c98a94140bae505116f5af4d616dcf8d85eab).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-06-23 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/16677
  
If there is some codepath not updating shuffle write metrics (introduced
for sql), that would be a bug.

On Sat, Jun 23, 2018 at 7:27 AM Liang-Chi Hsieh 
wrote:

> *@viirya* commented on this pull request.
> --
>
> In core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java:
>
> >      while (records.hasNext()) {
> >        final Product2<K, V> record = records.next();
> >        final K key = record._1();
> >        partitionWriters[partitioner.getPartition(key)].write(key, record._2());
> > +      numOfRecords += 1;
>
> Hmm, I think it is fine. However, maybe I missed it, but I can't find where
> SortShuffleWriter updates writeMetrics._recordsWritten?
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/21622
  
retest this, please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21622
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21622
  
**[Test build #92256 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92256/testReport)**
 for PR 21622 at commit 
[`147c98a`](https://github.com/apache/spark/commit/147c98a94140bae505116f5af4d616dcf8d85eab).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21622
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92256/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21623: [SPARK-24638][SQL] StringStartsWith support push down

2018-06-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21623
  
cc @rdblue 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21588: [SPARK-24590][BUILD] Make Jenkins tests passed with hado...

2018-06-23 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21588
  
Also weighing in here to ask: is it basically true that Hadoop 3 will work with only minor patches to the Hive fork in Spark? Then that seems worth the hacking. Is the blocker that we can't get a change into the Hive fork? Who owns it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21622
  
**[Test build #92260 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92260/testReport)**
 for PR 21622 at commit 
[`147c98a`](https://github.com/apache/spark/commit/147c98a94140bae505116f5af4d616dcf8d85eab).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21622
  
add to whitelist


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21542: [SPARK-24529][Build][test-maven] Add spotbugs int...

2018-06-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/21542#discussion_r197613977
  
--- Diff: pom.xml ---
@@ -2606,6 +2606,35 @@
   
 
   
+      <plugin>
+        <groupId>com.github.spotbugs</groupId>
+        <artifactId>spotbugs-maven-plugin</artifactId>
+        <version>3.1.3</version>
+        <configuration>
+          <classFilesDirectory>${basedir}/target/scala-2.11/classes</classFilesDirectory>
+          <testClassFilesDirectory>${basedir}/target/scala-2.11/test-classes</testClassFilesDirectory>
+          <effort>Max</effort>
+          <threshold>Low</threshold>
+          <xmlOutput>true</xmlOutput>
+          <visitors>FindPuzzlers</visitors>
+          <fork>false</fork>
+        </configuration>
+        <dependencies>
+          <dependency>
+            <groupId>com.github.spotbugs</groupId>
+            <artifactId>spotbugs</artifactId>
+            <version>3.1.3</version>
--- End diff --

Does the spotbugs 3.1.3 plugin not already have this dependency? I would have assumed that's the default and we don't want to manage it separately. If we do, maybe make a property for this version to make sure it doesn't vary?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21610: Updates to LICENSE and NOTICE

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21610
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92252/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21610: Updates to LICENSE and NOTICE

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21610
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21610: Updates to LICENSE and NOTICE

2018-06-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21610
  
**[Test build #92252 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92252/testReport)**
 for PR 21610 at commit 
[`ddb32e5`](https://github.com/apache/spark/commit/ddb32e50abb48fe65e89226bbf2aa221dde160e1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r197613554
  
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java ---
@@ -145,10 +145,12 @@ public void write(Iterator<Product2<K, V>> records) throws IOException {
     // included in the shuffle write time.
     writeMetrics.incWriteTime(System.nanoTime() - openStartTime);
 
+    long numOfRecords = 0;
     while (records.hasNext()) {
       final Product2<K, V> record = records.next();
       final K key = record._1();
       partitionWriters[partitioner.getPartition(key)].write(key, record._2());
+      numOfRecords += 1;
--- End diff --

Hmm, I think it is fine. However, maybe I missed it, but I can't find where `SortShuffleWriter` updates `writeMetrics._recordsWritten`?
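
For context, both shuffle writers push records through `DiskBlockObjectWriter`; a simplified paraphrase of its per-record path (a sketch, not a verbatim copy) shows where the metric gets bumped for the `SortShuffleWriter`/`ExternalSorter` route as well:

```scala
// Simplified sketch of DiskBlockObjectWriter.write: every (key, value) pair
// written through it ends with recordWritten(), which in turn calls
// writeMetrics.incRecordsWritten(1). So SortShuffleWriter picks up
// records-written indirectly rather than in its own write loop.
def write(key: Any, value: Any): Unit = {
  objOut.writeKey(key)
  objOut.writeValue(value)
  recordWritten()
}
```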


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16677: [SPARK-19355][SQL] Use map output statistics to i...

2018-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16677#discussion_r197613004
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -204,6 +204,13 @@ object SQLConf {
     .intConf
     .createWithDefault(4)
 
+  val LIMIT_FLAT_GLOBAL_LIMIT = buildConf("spark.sql.limit.flatGlobalLimit")
+    .internal()
+    .doc("During global limit, try to evenly distribute limited rows across data " +
+      "partitions. If disabled, scanning data partitions sequentially until reaching limit number.")
+    .booleanConf
+    .createWithDefault(true)
--- End diff --

I set this to true. One reason is to see if it can pass the existing tests. If we don't feel confident or are worried about the behavior change, we can set it to false before merging.
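
If it helps reviewers compare, a hypothetical check with the `withSQLConf` test helper makes flipping it cheap either way:

```scala
withSQLConf("spark.sql.limit.flatGlobalLimit" -> "false") {
  // The answer is the same either way; only which partitions supply the rows changes.
  val df = spark.range(0, 100, 1, numPartitions = 10).limit(10)
  assert(df.count() == 10)
}
```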


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21619
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/439/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21619: [SPARK-24635][SQL] Remove Blocks class from JavaCode cla...

2018-06-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21619
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


