[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175283491
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   )
 }
   }
+
+  def testFile(fileName: String): String = {
+
Thread.currentThread().getContextClassLoader.getResource(fileName).toString
+  }
+
+  test("json in UTF-16 with BOM") {
+val fileName = "json-tests/utf16WithBOM.json"
+val schema = new StructType().add("firstName", 
StringType).add("lastName", StringType)
+val jsonDF = spark.read.schema(schema)
+  // The mode filters null rows produced because new line delimiter
+  // for UTF-8 is used by default.
--- End diff --

Also, this is where we need a decision, right? It already does not work 
correctly. Another option, as a minimal fix that still follows RFC 7159, is to 
document explicitly that we don't support other encodings for now.

I approved https://github.com/apache/spark/pull/20614 only on the 
assumption that it causes an actual issue for some sites and that the 
release was close (which I guess is true now).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88348/
Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20854
  
**[Test build #88352 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88352/testReport)**
 for PR 20854 at commit 
[`d0b40a9`](https://github.com/apache/spark/commit/d0b40a9ff6368051d737224dd9931a7ef1b428cb).


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175283216
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   )
 }
   }
+
+  def testFile(fileName: String): String = {
+
Thread.currentThread().getContextClassLoader.getResource(fileName).toString
+  }
+
+  test("json in UTF-16 with BOM") {
+val fileName = "json-tests/utf16WithBOM.json"
+val schema = new StructType().add("firstName", 
StringType).add("lastName", StringType)
+val jsonDF = spark.read.schema(schema)
+  // The mode filters null rows produced because new line delimiter
+  // for UTF-8 is used by default.
--- End diff --

@MaxGekk, see what happens in the test code here now. Lines are split on 
the UTF-8 newline delimiter, and then the records are parsed with a different encoding.
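
The mismatch described above can be reproduced with plain JDK calls. The 
following is a minimal sketch, not Spark or Hadoop code: the class name 
`Utf16LineSplitDemo` and the helper `splitOnLfByte` are hypothetical, and the 
helper only imitates a reader that treats the single byte 0x0A as the record 
delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Utf16LineSplitDemo {

    // Hypothetical stand-in for a line reader that assumes a one-byte newline:
    // it cuts the input at every 0x0A byte and drops that byte.
    static List<byte[]> splitOnLfByte(byte[] data) {
        List<byte[]> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0A) {
                records.add(Arrays.copyOfRange(data, start, i));
                start = i + 1;
            }
        }
        if (start < data.length) {
            records.add(Arrays.copyOfRange(data, start, data.length));
        }
        return records;
    }

    public static void main(String[] args) {
        // Two JSON records separated by a newline, encoded as UTF-16LE.
        // In UTF-16LE the newline is the two bytes 0x0A 0x00.
        String text = "{\"a\":1}\n{\"a\":2}";
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16LE);
        for (byte[] record : splitOnLfByte(utf16)) {
            String decoded = new String(record, StandardCharsets.UTF_16LE);
            // A record that inherits the stray 0x00 is shifted by one byte,
            // so it no longer decodes to the original JSON text.
            System.out.println(record.length + " bytes, starts with '{': "
                + decoded.startsWith("{"));
        }
    }
}
```

Only the 0x0A half of the two-byte newline is consumed, so the leftover 0x00 
ends up at the start of the next record and shifts every following code unit 
by one byte. That is one way null rows could appear when the records are 
later parsed as UTF-16.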


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support DATE predict push down...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175283818
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -50,6 +50,15 @@ private[parquet] object ParquetFilters {
   (n: String, v: Any) => FilterApi.eq(
 binaryColumn(n),
 Option(v).map(b => 
Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+case DateType =>
+  (n: String, v: Any) => {
+FilterApi.eq(
+  intColumn(n),
+  Option(v).map{ date =>
--- End diff --

nit: `map{` -> `map {`


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88348 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88348/testReport)**
 for PR 20774 at commit 
[`a16deaa`](https://github.com/apache/spark/commit/a16deaa2ba54657a69b0cb0f09ec86c80339baa9).


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1586/
Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20830
  
**[Test build #88349 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88349/testReport)**
 for PR 20830 at commit 
[`b7a4a91`](https://github.com/apache/spark/commit/b7a4a914fbdaddb4c56ee24257f477ff984e170e).


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1587/
Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
retest this please


---




[GitHub] spark pull request #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [W...

2018-03-18 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/20854

[SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

## What changes were proposed in this pull request?
This PR adds an interpreted version of `UnsafeRowJoiner` to Spark SQL.

Its performance is almost on par with the code-generated `UnsafeRowJoiner`. 
There seems to be an overhead of 10ns per call. It might be an idea not to use 
code generation at all for an `UnsafeRowJoiner`.

## How was this patch tested?
Modified existing row joiner tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-23712

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20854.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20854


commit b637ded5ddd38f58e2c0d1b5172ebed5cb9014e2
Author: Herman van Hovell 
Date:   2018-03-17T13:42:13Z

Add interpreted unsafe row joiner

commit d0b40a9ff6368051d737224dd9931a7ef1b428cb
Author: Herman van Hovell 
Date:   2018-03-18T12:16:30Z

Add benchmark




---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20854
  
**[Test build #88351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88351/testReport)**
 for PR 20854 at commit 
[`d0b40a9`](https://github.com/apache/spark/commit/d0b40a9ff6368051d737224dd9931a7ef1b428cb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20701: [SPARK-23528][ML] Add numIter to ClusteringSummar...

2018-03-18 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20701#discussion_r175292026
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala ---
@@ -46,6 +47,10 @@ class KMeansModel @Since("2.4.0") (@Since("1.0.0") val 
clusterCenters: Array[Vec
   private val clusterCentersWithNorm =
 if (clusterCenters == null) null else clusterCenters.map(new 
VectorWithNorm(_))
 
+  @Since("2.4.0")
--- End diff --

I think this is the right one. 0.8.0 is the annotation for the 
`KMeansModel` class, while the previous main constructor was added (by me) in a 
previous PR for 2.4.0 in order to add the `distanceMeasure` variable.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88351/
Test FAILed.


---




[GitHub] spark pull request #20701: [SPARK-23528][ML] Add numIter to ClusteringSummar...

2018-03-18 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20701#discussion_r175292059
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/clustering/BisectingKMeans.scala ---
@@ -312,4 +312,5 @@ class BisectingKMeansSummary private[clustering] (
 predictions: DataFrame,
 predictionCol: String,
 featuresCol: String,
-k: Int) extends ClusteringSummary(predictions, predictionCol, 
featuresCol, k)
+k: Int,
+numIter: Int) extends ClusteringSummary(predictions, predictionCol, 
featuresCol, k, numIter)
--- End diff --

Thanks for pointing this out, I completely missed it. I am adding them.


---




[GitHub] spark pull request #20796: [SPARK-23649][SQL] Skipping chars disallowed in U...

2018-03-18 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20796#discussion_r175281945
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java ---
@@ -57,12 +57,39 @@
   public Object getBaseObject() { return base; }
   public long getBaseOffset() { return offset; }
 
-  private static int[] bytesOfCodePointInUTF8 = {2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2,
-2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
-3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
-4, 4, 4, 4, 4, 4, 4, 4,
-5, 5, 5, 5,
-6, 6};
+  /**
+   * A char in UTF-8 encoding can take 1-4 bytes depending on the first byte which
+   * indicates the size of the char. See Unicode standard in page 126:
+   * http://www.unicode.org/versions/Unicode10.0.0/UnicodeStandard-10.0.pdf
+   *
+   * Binary    Hex          Comments
+   * 0xxxxxxx  0x00..0x7F   Only byte of a 1-byte character encoding
+   * 10xxxxxx  0x80..0xBF   Continuation bytes (1-3 continuation bytes)
+   * 110xxxxx  0xC0..0xDF   First byte of a 2-byte character encoding
--- End diff --

hmm, is this `0xC2..0xDF`?
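
The question above can be checked against the JDK's strict UTF-8 decoder. A 
two-byte sequence starting with 0xC0 or 0xC1 could only encode a code point 
below U+0080, which is an overlong form that well-formed UTF-8 excludes, so 
the smallest valid lead byte of a 2-byte sequence is 0xC2. The sketch below is 
illustrative only; the class `Utf8LeadByteDemo` and the method `isWellFormed` 
are hypothetical names and are not part of `UTF8String`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class Utf8LeadByteDemo {

    // Returns true if the given bytes decode as well-formed UTF-8.
    // REPORT makes the decoder throw instead of substituting U+FFFD.
    static boolean isWellFormed(byte... bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Overlong candidates: lead bytes 0xC0 and 0xC1.
        System.out.println("C0 80 -> " + isWellFormed((byte) 0xC0, (byte) 0x80));
        System.out.println("C1 BF -> " + isWellFormed((byte) 0xC1, (byte) 0xBF));
        // Smallest and largest valid 2-byte sequences (U+0080 and U+07FF).
        System.out.println("C2 80 -> " + isWellFormed((byte) 0xC2, (byte) 0x80));
        System.out.println("DF BF -> " + isWellFormed((byte) 0xDF, (byte) 0xBF));
    }
}
```

The bit pattern `110xxxxx` still spans 0xC0..0xDF, so both ranges can be 
defended: one describes the pattern, the other the lead bytes that occur in 
well-formed input.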


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175282994
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
 ---
@@ -85,6 +85,12 @@ private[sql] class JSONOptions(
 
   val multiLine = 
parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
 
+  /**
+   * Standard charset name. For example UTF-8, UTF-16 and UTF-32.
+   * If charset is not specified (None), it will be detected automatically.
--- End diff --

JSON schema inference uses the text datasource to separate the lines 
before we go through the Jackson parser, so the charset should be respected 
for newlines there too. Shouldn't we fix the text datasource with Hadoop's 
line reader first?


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175283468
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
 ---
@@ -85,6 +85,12 @@ private[sql] class JSONOptions(
 
   val multiLine = 
parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
 
+  /**
+   * Standard charset name. For example UTF-8, UTF-16 and UTF-32.
+   * If charset is not specified (None), it will be detected automatically.
--- End diff --

A fix in Hadoop's line reader and this PR solve two different problems. Any 
fix in Hadoop's line reader will not fix the problem of wrong encoding 
detection. I don't understand why this PR must depend on a fix in the line 
reader. I would say a custom record separator will solve the newline problem 
too (https://issues.apache.org/jira/browse/SPARK-23724).

> Shouldn't we better fix text datasource with the hadoop's line reader 
first?

Could you tell me how this PR blocks solving the problem in Hadoop's 
LineReader?  


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20774
  
retest this please


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20854
  
**[Test build #88351 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88351/testReport)**
 for PR 20854 at commit 
[`d0b40a9`](https://github.com/apache/spark/commit/d0b40a9ff6368051d737224dd9931a7ef1b428cb).


---




[GitHub] spark pull request #20701: [SPARK-23528][ML] Add numIter to ClusteringSummar...

2018-03-18 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20701#discussion_r175292115
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala ---
@@ -36,8 +36,9 @@ import org.apache.spark.sql.{Row, SparkSession}
  * A clustering model for K-means. Each point belongs to the cluster with 
the closest center.
  */
 @Since("0.8.0")
-class KMeansModel @Since("2.4.0") (@Since("1.0.0") val clusterCenters: 
Array[Vector],
-  @Since("2.4.0") val distanceMeasure: String)
+class KMeansModel private[spark] (@Since("1.0.0") val clusterCenters: 
Array[Vector],
--- End diff --

I just didn't want the user to be able to create a KMeansModel while setting 
the number of iterations. I moved the other constructor, which is still 
available. I don't have strong reasons against making this public, so I will 
remove the private clause if you think it is best to make it public.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20701
  
**[Test build #88355 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88355/testReport)**
 for PR 20701 at commit 
[`f6ee4a2`](https://github.com/apache/spark/commit/f6ee4a2b4bb2444d65ab0e26a141304b327bd998).


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1591/
Test PASSed.


---




[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...

2018-03-18 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20719
  
@holdenk @sethah @srowen @viirya could you please help review this PR if 
you have time? Thanks.


---




[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...

2018-03-18 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20717
  
Any more comments, @cloud-fan?


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175282421
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 ---
@@ -2063,4 +2063,178 @@ class JsonSuite extends QueryTest with 
SharedSQLContext with TestJsonData {
   )
 }
   }
+
+  def testFile(fileName: String): String = {
+
Thread.currentThread().getContextClassLoader.getResource(fileName).toString
+  }
+
+  test("json in UTF-16 with BOM") {
+val fileName = "json-tests/utf16WithBOM.json"
+val schema = new StructType().add("firstName", 
StringType).add("lastName", StringType)
+val jsonDF = spark.read.schema(schema)
+  // The mode filters null rows produced because new line delimiter
+  // for UTF-8 is used by default.
--- End diff --

We declare that we are able to read JSON. According to RFC 7159 (section 
8.1, Character Encoding):

```
   JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.  The default
   encoding is UTF-8, and JSON texts that are encoded in UTF-8 are
   interoperable in the sense that they will be read successfully by the
   maximum number of implementations; there are many implementations
   that cannot successfully read texts in other encodings (such as
   UTF-16 and UTF-32).
```

Users may assume that Spark can read JSON in charsets other than UTF-8 
because it SHALL do so according to the RFC, and we DON'T state explicitly 
that JSON in such encodings cannot be read successfully.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
retest this please


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1588/
Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20830
  
**[Test build #88350 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88350/testReport)**
 for PR 20830 at commit 
[`b7a4a91`](https://github.com/apache/spark/commit/b7a4a914fbdaddb4c56ee24257f477ff984e170e).


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88348 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88348/testReport)**
 for PR 20774 at commit 
[`a16deaa`](https://github.com/apache/spark/commit/a16deaa2ba54657a69b0cb0f09ec86c80339baa9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class PromoteStrings(conf: SQLConf) extends TypeCoercionRule `
  * `  case class InConversion(conf: SQLConf) extends TypeCoercionRule `


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1590/
Test PASSed.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18906
  
**[Test build #88354 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88354/testReport)**
 for PR 18906 at commit 
[`64f0500`](https://github.com/apache/spark/commit/64f05000a2a323f260e0ef7a385096b7a10b2ef1).


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175282099
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
 ---
@@ -85,6 +85,12 @@ private[sql] class JSONOptions(
 
   val multiLine = 
parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
 
+  /**
+   * Standard charset name. For example UTF-8, UTF-16 and UTF-32.
+   * If charset is not specified (None), it will be detected automatically.
--- End diff --

OK. How does this help to solve the problem that I am trying to solve with 
this PR? Jackson's charset auto-detection mechanism can fail even on UTF-8 
input and can infer the wrong charset (see 
https://github.com/apache/spark/pull/20302) for many reasons, and a user 
has no way to fix the issue or bypass the auto-detection.
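
To make the "bypass" idea concrete, here is a minimal sketch under stated 
assumptions. It does not reproduce Jackson's detection or the option added by 
this PR. The `detect` method follows the zero-byte heuristic from RFC 4627 
(section 3) and ignores BOMs; the class `JsonCharsetSketch` and the `decode` 
method with its optional `charset` argument are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

final class JsonCharsetSketch {

    // RFC 4627, section 3: guess the encoding from the pattern of zero bytes
    // in the first four octets. BOM handling is deliberately left out.
    static Charset detect(byte[] b) {
        if (b.length >= 4) {
            if (b[0] == 0 && b[1] == 0 && b[2] == 0 && b[3] != 0) {
                return Charset.forName("UTF-32BE");
            }
            if (b[0] != 0 && b[1] == 0 && b[2] == 0 && b[3] == 0) {
                return Charset.forName("UTF-32LE");
            }
            if (b[0] == 0 && b[1] != 0 && b[2] == 0 && b[3] != 0) {
                return StandardCharsets.UTF_16BE;
            }
            if (b[0] != 0 && b[1] == 0 && b[2] != 0 && b[3] == 0) {
                return StandardCharsets.UTF_16LE;
            }
        }
        // Anything else, including inputs shorter than four bytes.
        return StandardCharsets.UTF_8;
    }

    // An explicitly supplied charset wins; detection is only the fallback.
    static String decode(byte[] bytes, Optional<Charset> charset) {
        return new String(bytes, charset.orElseGet(() -> detect(bytes)));
    }

    public static void main(String[] args) {
        // RFC 7159 allows a bare scalar as a JSON text; in UTF-16LE the text
        // "1" is only two bytes, so this heuristic falls back to UTF-8.
        byte[] tiny = "1".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(decode(tiny, Optional.empty()).equals("1"));
        System.out.println(decode(tiny, Optional.of(StandardCharsets.UTF_16LE)).equals("1"));
    }
}
```

The design point is only the fallback order: when the caller names the 
charset, the heuristic is never consulted, so a wrong guess cannot affect the 
result.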


---




[GitHub] spark issue #20841: [SPARK-23706][PYTHON] spark.conf.get(value, default=None...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20841
  
Merged to master and branch-2.3.

Thank you @ueshin, @BryanCutler and @viirya for reviewing this.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20830
  
**[Test build #88349 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88349/testReport)**
 for PR 20830 at commit 
[`b7a4a91`](https://github.com/apache/spark/commit/b7a4a914fbdaddb4c56ee24257f477ff984e170e).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88349/
Test FAILed.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20854
  
retest this please


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88353 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88353/testReport)**
 for PR 20851 at commit 
[`15bd28d`](https://github.com/apache/spark/commit/15bd28d93613acf0adb0f2762977bcd233cf3b9f).


---




[GitHub] spark issue #20841: [SPARK-23706][PYTHON] spark.conf.get(value, default=None...

2018-03-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20841
  
LGTM.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175293026
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +314,36 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 }
   }
 
+  test("filter pushdown - date") {
+implicit class IntToDate(int: Int) {
+  def d: Date = new Date(Date.valueOf("2018-03-01").getTime + 24 * 60 * 60 * 1000 * (int - 1))
+}
+
+withParquetDataFrame((1 to 4).map(i => Tuple1(i.d))) { implicit df =>
--- End diff --

Could you kindly give me some examples of the kind of boundary tests you mean? 
I checked the Parquet integer push-down tests and the ORC date type push-down 
tests, and this test seems to cover everything they cover.
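
As a side note on boundaries: the `d` helper in the quoted diff multiplies `24 * 60 * 60 * 1000` by `(int - 1)` in 32-bit `Int` arithmetic before the result is added to the `Long` returned by `getTime`, so it only stays exact for small offsets such as the `1 to 4` range used here. The sketch below is a Python emulation of that arithmetic, not code from the PR; `to_int32` and `first_overflowing_offset` are hypothetical names introduced only for illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in one day


def to_int32(value):
    """Wrap an integer the way a 32-bit signed JVM Int would."""
    return (value + 2**31) % 2**32 - 2**31


def first_overflowing_offset():
    """Smallest day offset whose millisecond product no longer fits in an Int."""
    days = 0
    while to_int32(MS_PER_DAY * days) == MS_PER_DAY * days:
        days += 1
    return days


overflow_at = first_overflowing_offset()
wrapped = to_int32(MS_PER_DAY * overflow_at)
```

Under this emulation the product first wraps to a negative value at an offset of 25 days, so a test that needs dates further from the base date would probably want a `Long` multiplier (for example `1000L`) in the helper.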


---




[GitHub] spark pull request #20849: [SPARK-23723] New charset option for json datasou...

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20849#discussion_r175283674
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
 ---
@@ -85,6 +85,12 @@ private[sql] class JSONOptions(
 
   val multiLine = 
parameters.get("multiLine").map(_.toBoolean).getOrElse(false)
 
+  /**
+   * Standard charset name. For example UTF-8, UTF-16 and UTF-32.
+   * If charset is not specified (None), it will be detected automatically.
--- End diff --

> Could you tell me how this PR blocks solving the problem in Hadoop's 
LineReader?

Because the `charset` option exposed here is incomplete with respect to the 
encodings.
Also, I want to see how we can solve that problem in SPARK-23724 first. 
I am actually not too worried about the overall changes proposed here for now.

Why don't we just fix that problem first if you plan to fix both eventually 
anyway?
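
To make the newline concern concrete: a reader that looks for the single byte `0x0A` matches UTF-8 line endings, but in UTF-16 and UTF-32 the same newline character spans two or four bytes. The Python sketch below only illustrates that byte-level mismatch. It is not Hadoop's `LineReader` or Spark code, and the sample records and the `try_decode` helper are made up for the example.

```python
# How one newline character is encoded in three charsets.
encodings = ["utf-8", "utf-16-le", "utf-32-le"]
newline_bytes = {enc: "\n".encode(enc) for enc in encodings}

# Two made-up JSON records, one per line, stored as UTF-16 (little endian, no BOM).
text = '{"firstName": "Chris"}\n{"firstName": "Doug"}\n'
data = text.encode("utf-16-le")


def try_decode(chunk):
    """Decode a UTF-16LE chunk, returning None if the bytes are misaligned."""
    try:
        return chunk.decode("utf-16-le")
    except UnicodeDecodeError:
        return None


# Splitting on the single byte 0x0A leaves a stray 0x00 at the start of every
# following chunk, so those chunks no longer decode.
naive_decoded = [try_decode(chunk) for chunk in data.split(b"\n")]

# Splitting on the encoded newline keeps each record aligned. A plain byte
# split is only safe for this ASCII-only sample; a real reader would need to
# respect code unit boundaries.
aware_chunks = [c for c in data.split(newline_bytes["utf-16-le"]) if c]
aware_decoded = [try_decode(chunk) for chunk in aware_chunks]
```

Only the first record survives the single-byte split; with the encoded delimiter both records decode cleanly.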



---




[GitHub] spark pull request #20841: [SPARK-23706][PYTHON] spark.conf.get(value, defau...

2018-03-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20841


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1589/
Test PASSed.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20830
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88350/
Test PASSed.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20830
  
**[Test build #88350 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88350/testReport)**
 for PR 20830 at commit 
[`b7a4a91`](https://github.com/apache/spark/commit/b7a4a914fbdaddb4c56ee24257f477ff984e170e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-18 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175293032
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -148,6 +193,15 @@ private[parquet] object ParquetFilters {
 case BinaryType =>
   (n: String, v: Any) =>
 FilterApi.gtEq(binaryColumn(n), 
Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]]))
+case DateType =>
--- End diff --

Have added, kindly help review.


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20856
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20856
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1595/
Test PASSed.


---




[GitHub] spark issue #20798: [SPARK-23645][PYTHON] Allow python udfs to be called wit...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20798
  
**[Test build #88360 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88360/testReport)**
 for PR 20798 at commit 
[`65de58f`](https://github.com/apache/spark/commit/65de58f04c0e54ce13274a89e8aae1346dfa93be).


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20851
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88356 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88356/testReport)**
 for PR 20851 at commit 
[`1f2b450`](https://github.com/apache/spark/commit/1f2b45013305fddd7bbf75a56ae5d1e3b6979d94).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20851
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88356/
Test PASSed.


---




[GitHub] spark pull request #20855: [SPARK-23731][SQL] FileSourceScanExec throws Null...

2018-03-18 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/20855

[SPARK-23731][SQL] FileSourceScanExec throws NullPointerException in 
subexpression elimination

## What changes were proposed in this pull request?

Avoids (not necessarily fixes) a NullPointerException in subexpression 
elimination for subqueries with FileSourceScanExec.

## How was this patch tested?

Local build. No new tests as I could not reproduce it other than using the 
query and data under NDA. Waiting for Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
SPARK-23731-FileSourceScanExec-throws-NPE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20855


commit 8ef323c572cee181e3bdbddeeb7119eda03d78f4
Author: Dongjoon Hyun 
Date:   2018-01-17T06:32:18Z

[SPARK-23072][SQL][TEST] Add a Unicode schema test for file-based data 
sources

## What changes were proposed in this pull request?

After [SPARK-20682](https://github.com/apache/spark/pull/19651), Apache 
Spark 2.3 is able to read ORC files with Unicode schema. Previously, it raised 
`org.apache.spark.sql.catalyst.parser.ParseException`.

This PR adds a Unicode schema test for CSV/JSON/ORC/Parquet file-based data 
sources. Note that TEXT data source only has [a single column with a fixed name 
'value'](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextFileFormat.scala#L71).

## How was this patch tested?

Pass the newly added test case.

Author: Dongjoon Hyun 

Closes #20266 from dongjoon-hyun/SPARK-23072.

(cherry picked from commit a0aedb0ded4183cc33b27e369df1cbf862779e26)
Signed-off-by: Wenchen Fan 

commit bfbc2d41b8a9278b347b6df2d516fe4679b41076
Author: Henry Robinson 
Date:   2018-01-17T08:01:41Z

[SPARK-23062][SQL] Improve EXCEPT documentation

## What changes were proposed in this pull request?

Make the default behavior of EXCEPT (i.e. EXCEPT DISTINCT) more
explicit in the documentation, and call out the change in behavior
from 1.x.

Author: Henry Robinson 

Closes #20254 from henryr/spark-23062.

(cherry picked from commit 1f3d933e0bd2b1e934a233ed699ad39295376e71)
Signed-off-by: gatorsmile 

commit cbb6bda437b0d2832496b5c45f8264e5527f1cce
Author: Dongjoon Hyun 
Date:   2018-01-17T13:53:36Z

[SPARK-21783][SQL] Turn on ORC filter push-down by default

## What changes were proposed in this pull request?

ORC filter push-down is disabled by default from the beginning, 
[SPARK-2883](https://github.com/apache/spark/commit/aa31e431fc09f0477f1c2351c6275769a31aca90#diff-41ef65b9ef5b518f77e2a03559893f4dR149
).

Now, Apache Spark starts to depend on Apache ORC 1.4.1. For Apache Spark 
2.3, this PR turns on ORC filter push-down by default like Parquet 
([SPARK-9207](https://issues.apache.org/jira/browse/SPARK-21783)) as a part of 
[SPARK-20901](https://issues.apache.org/jira/browse/SPARK-20901), "Feature 
parity for ORC with Parquet".

## How was this patch tested?

Pass the existing tests.

Author: Dongjoon Hyun 

Closes #20265 from dongjoon-hyun/SPARK-21783.

(cherry picked from commit 0f8a28617a0742d5a99debfbae91222c2e3b5cec)
Signed-off-by: Wenchen Fan 

commit aae73a21a42fa366a09c2be1a4b91308ef211beb
Author: Wang Gengliang 
Date:   2018-01-17T16:05:26Z

[SPARK-23079][SQL] Fix query constraints propagation with aliases

## What changes were proposed in this pull request?

Previously, PR #19201 fixed the problem of non-converging constraints.
After that, PR #19149 improved the loop so that constraints are inferred only once.
So the problem of non-converging constraints is gone.

However, the case below will fail.

```

spark.range(5).write.saveAsTable("t")
val t = spark.read.table("t")
val left = t.withColumn("xid", $"id" + lit(1)).as("x")
val right = t.withColumnRenamed("id", "xid").as("y")
val df = left.join(right, "xid").filter("id = 3").toDF()
checkAnswer(df, Row(4, 3))

```

Because `aliasMap` replaces all the aliased children. See the test case in the PR 
for details.

This PR fixes this bug by removing useless code for preventing 
non-converging constraints.
It can also be fixed with #20270, but this is much simpler and cleans up the 
code.

## How was this patch 

[GitHub] spark pull request #20855: [SPARK-23731][SQL] FileSourceScanExec throws Null...

2018-03-18 Thread jaceklaskowski
Github user jaceklaskowski closed the pull request at:

https://github.com/apache/spark/pull/20855


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20856
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88359/
Test FAILed.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20701
  
**[Test build #88355 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88355/testReport)**
 for PR 20701 at commit 
[`f6ee4a2`](https://github.com/apache/spark/commit/f6ee4a2b4bb2444d65ab0e26a141304b327bd998).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class KMeansModel (@Since(\"1.0.0\") val clusterCenters: 
Array[Vector],`


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88357 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88357/testReport)**
 for PR 20774 at commit 
[`5fbbc30`](https://github.com/apache/spark/commit/5fbbc30625b756b3671bce1e6677e7382fde5eec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88356 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88356/testReport)**
 for PR 20851 at commit 
[`1f2b450`](https://github.com/apache/spark/commit/1f2b45013305fddd7bbf75a56ae5d1e3b6979d94).


---




[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20742
  
**[Test build #88358 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88358/testReport)**
 for PR 20742 at commit 
[`53c1710`](https://github.com/apache/spark/commit/53c1710f54888714744b3f0934ceeb732ed88f81).


---




[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20742
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1592/
Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88357 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88357/testReport)**
 for PR 20774 at commit 
[`5fbbc30`](https://github.com/apache/spark/commit/5fbbc30625b756b3671bce1e6677e7382fde5eec).


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20742
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1593/
Test PASSed.


---




[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17254
  
Build finished. Test PASSed.


---




[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17254
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1594/
Test PASSed.


---




[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18906
  
**[Test build #88354 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88354/testReport)**
 for PR 18906 at commit 
[`64f0500`](https://github.com/apache/spark/commit/64f05000a2a323f260e0ef7a385096b7a10b2ef1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20798
  
**[Test build #88360 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88360/testReport)**
 for PR 20798 at commit 
[`65de58f`](https://github.com/apache/spark/commit/65de58f04c0e54ce13274a89e8aae1346dfa93be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88355/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20856
  
**[Test build #88359 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88359/testReport)**
 for PR 20856 at commit 
[`3981421`](https://github.com/apache/spark/commit/39814216026da32eee5aabf3886bbedd3b90ed08).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18906
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88354/
Test FAILed.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20851
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88353 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88353/testReport)**
 for PR 20851 at commit 
[`15bd28d`](https://github.com/apache/spark/commit/15bd28d93613acf0adb0f2762977bcd233cf3b9f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20854
  
**[Test build #88352 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88352/testReport)**
 for PR 20854 at commit 
[`d0b40a9`](https://github.com/apache/spark/commit/d0b40a9ff6368051d737224dd9931a7ef1b428cb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support for pushing down filters for ...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20851
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88353/
Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88357/
Test PASSed.


---




[GitHub] spark issue #20742: [SPARK-23572][docs] Bring "security.md" up to date.

2018-03-18 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20742
  
retest this please


---




[GitHub] spark pull request #20856: [SPARK-23731][SQL] FileSourceScanExec throws Null...

2018-03-18 Thread jaceklaskowski
GitHub user jaceklaskowski opened a pull request:

https://github.com/apache/spark/pull/20856

[SPARK-23731][SQL] FileSourceScanExec throws NullPointerException in 
subexpression elimination

## What changes were proposed in this pull request?

Avoids ("fixes") a NullPointerException in subexpression elimination for 
subqueries with FileSourceScanExec.

## How was this patch tested?

Local build. No new tests as I could not reproduce it other than using the 
query and data under NDA. Waiting for Jenkins.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jaceklaskowski/spark 
SPARK-23731-FileSourceScanExec-throws-NPE

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20856.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20856


commit 39814216026da32eee5aabf3886bbedd3b90ed08
Author: Jacek Laskowski 
Date:   2018-03-18T17:12:32Z

[SPARK-23731][SQL] FileSourceScanExec throws NullPointerException in 
subexpression elimination




---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20856
  
**[Test build #88359 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88359/testReport)**
 for PR 20856 at commit 
[`3981421`](https://github.com/apache/spark/commit/39814216026da32eee5aabf3886bbedd3b90ed08).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20798: [SPARK-23645][MINOR][DOCS][PYTHON] Add docs RE `pandas_u...

2018-03-18 Thread mstewart141
Github user mstewart141 commented on the issue:

https://github.com/apache/spark/pull/20798
  
@HyukjinKwon thanks again. I've updated this PR to add documentation. I dug 
pretty deep into the bigger issue around kwargs/partial functions, and you can 
see what I did in the commit:

https://github.com/apache/spark/pull/20798/commits/969f9073ee06d2a5641f78247b75e30d9ad1679a

Basically, nothing in the udf and arrow serialization code has any notion of 
kwargs being supported, which makes it more challenging than I anticipated to 
wire everything together. Definitely not impossible, but not a small 
undertaking either.
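
For readers unfamiliar with the kwargs/partial issue, here is a minimal plain-Python sketch of the general shape of the problem. It does not use or reproduce PySpark's UDF code; `positional_only_wrapper` is a hypothetical stand-in for any wrapper that forwards positional arguments only, and `scale` is a made-up function.

```python
import functools
import inspect


def scale(values, factor=1.0, offset=0.0):
    """Made-up example function with two keyword parameters."""
    return [v * factor + offset for v in values]


def positional_only_wrapper(func):
    """Hypothetical wrapper that forwards positional arguments only."""
    def wrapped(*args):
        return func(*args)
    return wrapped


wrapped_scale = positional_only_wrapper(scale)

# Calling the wrapper with a keyword argument fails, because the wrapper
# itself has no notion of keyword arguments.
try:
    wrapped_scale([1.0, 2.0], factor=2.0)
    kwargs_supported = True
except TypeError:
    kwargs_supported = False

# Binding the keywords up front with functools.partial sidesteps the wrapper.
bound = functools.partial(scale, factor=2.0, offset=1.0)
result = positional_only_wrapper(bound)([1.0, 2.0])

# The partial object still reports all three parameters to introspection.
params = list(inspect.signature(bound).parameters)
```

Binding keywords ahead of time is only a workaround at the call site; supporting keyword arguments end to end would mean carrying them through every layer that currently forwards `*args` alone.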


---




[GitHub] spark issue #20854: [SPARK-23712][SQL] Interpreted UnsafeRowJoiner [WIP]

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88352/
Test PASSed.


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20856
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expected exc...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88345/


---




[GitHub] spark issue #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expected exc...

2018-03-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20852
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expected exc...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20852
  
**[Test build #88345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88345/testReport)** for PR 20852 at commit [`7544cb4`](https://github.com/apache/spark/commit/7544cb427cbbb1a0186ad5e16cf4f09fee0c0dbf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20849: [SPARK-23723] New charset option for json datasource

2018-03-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20849
  
Does charset work with newlines?


---




[GitHub] spark issue #20851: [SPARK-23727][SQL] Support DATE predict push down in par...

2018-03-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20851
  
**[Test build #88344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88344/testReport)** for PR 20851 at commit [`079af71`](https://github.com/apache/spark/commit/079af71359bd49dc59c863f1a9a4f6fa28d5a8a0).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---



