[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20816
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1632/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20816
  
**[Test build #88405 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88405/testReport)**
 for PR 20816 at commit 
[`7fe9329`](https://github.com/apache/spark/commit/7fe93295df5627f2fc4e712b71aa9ce75383d410).


---




[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20816
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20852: [SPARK-23728][BRANCH-2.3] Fix ML tests with expected exc...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20852
  
(@attilapiros, just in case it should be manually closed)


---




[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20863
  
**[Test build #88404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88404/testReport)**
 for PR 20863 at commit 
[`3056e3c`](https://github.com/apache/spark/commit/3056e3c469209d72c97046f9668e30e0dbc5818d).


---




[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20863
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20863
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1631/
Test PASSed.


---




[GitHub] spark issue #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in P...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20863
  
cc @ueshin and @BryanCutler 


---




[GitHub] spark pull request #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf ut...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20863#discussion_r175660234
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -2409,17 +2432,13 @@ def test_join_without_on(self):
 df1 = self.spark.range(1).toDF("a")
 df2 = self.spark.range(1).toDF("b")
 
-try:
--- End diff --

The other diffs are basically the same.


---




[GitHub] spark pull request #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf ut...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20863#discussion_r175660221
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -201,6 +202,28 @@ def assertPandasEqual(self, expected, result):
"\n\nResult:\n%s\n%s" % (result, result.dtypes))
 self.assertTrue(expected.equals(result), msg=msg)
 
+@contextmanager
--- End diff --

Only this util was extracted from 
https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e


---




[GitHub] spark pull request #20863: [SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf ut...

2018-03-19 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/20863

[SPARK-23691][PYTHON][BRANCH-2.3] Use sql_conf util in PySpark tests where 
possible

## What changes were proposed in this pull request?

This PR backports https://github.com/apache/spark/pull/20830 to reduce the 
diff against master and to restore the default values in PySpark tests.


https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e 
added a useful util. This backport extracts that util and brings it in:

```python
@contextmanager
def sql_conf(self, pairs):
    ...
```

to allow configuration set/unset within a block:

```python
with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes
    ...
```

This PR proposes to use this util where possible in PySpark tests.
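
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of how a set-and-restore context manager of this kind could work. `FakeConf` is a hypothetical stand-in for `spark.conf` (only `get`, `set`, and `unset` are modelled), and `sql_conf` takes the conf object explicitly instead of `self`, so this illustrates the idea rather than reproducing the code in the commit linked above.

```python
from contextlib import contextmanager


class FakeConf:
    """Hypothetical stand-in for spark.conf; not part of PySpark."""

    def __init__(self):
        self._values = {}

    def get(self, key, default=None):
        return self._values.get(key, default)

    def set(self, key, value):
        self._values[key] = value

    def unset(self, key):
        self._values.pop(key, None)


@contextmanager
def sql_conf(conf, pairs):
    """Set every key in `pairs` on entry and restore the old state on exit."""
    assert isinstance(pairs, dict), "pairs should be a dictionary."
    # Remember what each key held before; None means "was not set".
    old_values = {key: conf.get(key, None) for key in pairs}
    for key, value in pairs.items():
        conf.set(key, value)
    try:
        yield
    finally:
        # Runs even if the test body raises, so no setting leaks out.
        for key, old_value in old_values.items():
            if old_value is None:
                conf.unset(key)
            else:
                conf.set(key, old_value)


conf = FakeConf()
conf.set("spark.blah.a", "original")
with sql_conf(conf, {"spark.blah.a": "temp", "spark.blah.b": "temp"}):
    inside = (conf.get("spark.blah.a"), conf.get("spark.blah.b"))
after = (conf.get("spark.blah.a"), conf.get("spark.blah.b"))
```

Inside the block both keys hold the temporary value; afterwards the first key is back to its original value and the second is unset again, which is the restore behaviour that the tests currently lack in a few places.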

Note that there already appear to be a few places in the unittest classes 
that change a configuration without restoring the original value, which can 
affect other tests.

## How was this patch tested?

Likewise, manually tested via:

```
./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark backport-20830

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20863


commit 4e8045cbddc39a5b8f3488b832a1ac092da68de9
Author: hyukjinkwon 
Date:   2018-03-20T04:25:37Z

[SPARK-23691][PYTHON] Use sql_conf util in PySpark tests where possible


https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e 
added a useful util

```python
@contextmanager
def sql_conf(self, pairs):
    ...
```

to allow configuration set/unset within a block:

```python
with self.sql_conf({"spark.blah.blah.blah": "blah"}):
    # test codes
    ...
```

This PR proposes to use this util where possible in PySpark tests.

Note that there already appear to be a few places in the unittest classes 
that change a configuration without restoring the original value, which can 
affect other tests.

Manually tested via:

```
./run-tests --modules=pyspark-sql --python-executables=python2
./run-tests --modules=pyspark-sql --python-executables=python3
```

Author: hyukjinkwon 

Closes #20830 from HyukjinKwon/cleanup-sql-conf.

commit 3056e3c469209d72c97046f9668e30e0dbc5818d
Author: hyukjinkwon 
Date:   2018-03-20T05:27:26Z

Extracts and brings sql_conf util




---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1630/
Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20774
  
Build finished. Test PASSed.


---




[GitHub] spark issue #20774: [SPARK-23549][SQL] Cast to timestamp when comparing time...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20774
  
**[Test build #88403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88403/testReport)**
 for PR 20774 at commit 
[`a3dc357`](https://github.com/apache/spark/commit/a3dc35716bc73376155a6ea3594cbe575dac0c46).


---




[GitHub] spark pull request #20844: [SPARK-23707][SQL] Fresh 'initRange' name to avoi...

2018-03-19 Thread ConeyLiu
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20844#discussion_r175658889
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala
 ---
@@ -396,9 +396,11 @@ case class RangeExec(range: 
org.apache.spark.sql.catalyst.plans.logical.Range)
 // The default size of a batch, which must be positive integer
 val batchSize = 1000
 
-val initRangeFuncName = ctx.addNewFunction("initRange",
+val initRange = ctx.freshName("initRange")
+
+val initRangeFuncName = ctx.addNewFunction(initRange,
   s"""
-| private void initRange(int idx) {
+| private void ${initRange}(int idx) {
--- End diff --

Hi @cloud-fan, before adding the comments, I have a question: why do we 
still need an `exchange` if we join two `spark.range(1, 10, 1, 1)`? Since 
both `range`s have only one partition, is the `exchange` really needed?


---




[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue:

https://github.com/apache/spark/pull/19819
  
It will create a new consumer for each thread. This could be quite 
resource-consuming when several topics are shared across thread pools.
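
To make the cost concrete, here is a rough sketch of a consumer cache keyed with and without a thread id. The function `get_consumer`, the dictionary cache, and the topic names are all hypothetical and only model the key shape; this is not the `CachedKafkaConsumer` code.

```python
def get_consumer(cache, topic, partition, thread_id=None):
    """Return the cached consumer for this key, creating it on first use."""
    if thread_id is None:
        key = (topic, partition)
    else:
        key = (topic, partition, thread_id)
    if key not in cache:
        # A string stands in for an expensive Kafka consumer object.
        cache[key] = "consumer-%d" % len(cache)
    return cache[key]


topics = ["t1", "t2", "t3"]
thread_ids = range(4)  # pretend a pool of four threads

shared = {}      # key without thread id
per_thread = {}  # key with thread id
for tid in thread_ids:
    for topic in topics:
        get_consumer(shared, topic, 0)
        get_consumer(per_thread, topic, 0, thread_id=tid)

counts = (len(shared), len(per_thread))
```

With three topics and four threads, the cache without a thread id ends up with three consumers, while the cache keyed by thread id ends up with twelve.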


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88400/
Test FAILed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20433
  
**[Test build #88400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88400/testReport)**
 for PR 20433 at commit 
[`e780fd2`](https://github.com/apache/spark/commit/e780fd2ae562cd7f9ade80cc28e0ca44f6b1cf7d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
Ah, 
https://github.com/apache/spark/commit/d6632d185e147fcbe6724545488ad80dce20277e 
added the util to master only.


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r175657704
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
 ---
@@ -178,81 +171,76 @@ object InterpretedUnsafeProjection extends 
UnsafeProjectionCreator {
 
   case StructType(fields) =>
 val numFields = fields.length
-val rowWriter = new UnsafeRowWriter(bufferHolder, numFields)
-val structWriter = generateStructWriter(bufferHolder, rowWriter, 
fields)
+val rowWriter = new UnsafeRowWriter(writer, numFields)
+val structWriter = generateStructWriter(rowWriter, fields)
 (v, i) => {
-  val tmpCursor = bufferHolder.cursor
+  rowWriter.markCursor()
   v.getStruct(i, fields.length) match {
 case row: UnsafeRow =>
   writeUnsafeData(
-bufferHolder,
+rowWriter,
 row.getBaseObject,
 row.getBaseOffset,
 row.getSizeInBytes)
 case row =>
   // Nested struct. We don't know where this will start 
because a row can be
   // variable length, so we need to update the offsets and 
zero out the bit mask.
-  rowWriter.reset()
+  rowWriter.resetRowWriter()
   structWriter.apply(row)
   }
-  writer.setOffsetAndSize(i, tmpCursor, bufferHolder.cursor - 
tmpCursor)
+  writer.setOffsetAndSizeFromMark(i)
 }
 
   case ArrayType(elementType, containsNull) =>
-val arrayWriter = new UnsafeArrayWriter
-val elementSize = getElementSize(elementType)
+val arrayWriter = new UnsafeArrayWriter(writer, 
getElementSize(elementType))
 val elementWriter = generateFieldWriter(
-  bufferHolder,
   arrayWriter,
   elementType,
   containsNull)
 (v, i) => {
-  val tmpCursor = bufferHolder.cursor
-  writeArray(bufferHolder, arrayWriter, elementWriter, 
v.getArray(i), elementSize)
-  writer.setOffsetAndSize(i, tmpCursor, bufferHolder.cursor - 
tmpCursor)
+  arrayWriter.markCursor()
--- End diff --

From a performance point of view, this abstraction may have a larger 
impact, since we move a temporary value from the local frame into [one on the Java 
stack](https://github.com/apache/spark/pull/20850/files#diff-e68c5a074209b9a20ee2aa42936571ceR103):
```
arrayWriter.markCursor()
writeArray(arrayWriter, elementWriter, v.getArray(i))
writer.setOffsetAndSizeFromMark(i)
```

Does this implementation strike a good enough balance between performance 
and abstraction? Or would it be better to do it like this?
```
val mark = arrayWriter.cursor()
writeArray(arrayWriter, elementWriter, v.getArray(i))
writer.setOffsetAndSizeFromMark(i, mark)
```

@maropo @hvanhovell WDYT? 
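
The two API shapes discussed above can be compared in a small Python sketch. `Holder` and `Writer` are hypothetical analogues, simplified to a single writer, and the optional `mark` argument is an illustration device; none of this is the Spark implementation.

```python
class Holder:
    """Hypothetical shared buffer; only the write cursor is modelled."""

    def __init__(self):
        self.cursor = 0


class Writer:
    def __init__(self, holder):
        self.holder = holder
        self.mark = 0    # used only by the field-based variant
        self.slots = {}  # ordinal -> (offset, size)

    def write(self, num_bytes):
        self.holder.cursor += num_bytes

    def mark_cursor(self):
        # Variant 1: remember the start position in a field on the writer.
        self.mark = self.holder.cursor

    def set_offset_and_size_from_mark(self, ordinal, mark=None):
        # Variant 2 passes the mark in explicitly; variant 1 reads the field.
        start = self.mark if mark is None else mark
        self.slots[ordinal] = (start, self.holder.cursor - start)


holder = Holder()
writer = Writer(holder)
writer.write(16)

# Variant 1: mark kept in the writer.
writer.mark_cursor()
writer.write(24)
writer.set_offset_and_size_from_mark(0)

# Variant 2: mark kept in a local variable by the caller.
mark = holder.cursor
writer.write(8)
writer.set_offset_and_size_from_mark(1, mark)

slots = writer.slots
```

Both variants record the same kind of (offset, size) pair; they differ only in where the start position lives between the two calls, which is the trade-off in question.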


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-19 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r17565
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -86,11 +88,39 @@ public void grow(int neededSize) {
 }
   }
 
-  public void reset() {
+  byte[] buffer() {
+return buffer;
+  }
+
+  int getCursor() {
+return cursor;
+  }
+
+  void incrementCursor(int val) {
+cursor += val;
+  }
+
+  int pushCursor() {
--- End diff --

Since one `BufferHolder` is shared by multiple `UnsafeWriter`s, it seems 
simpler to store the cursors in the `BufferHolder`.
WDYT?
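
As a rough illustration of keeping the saved cursors in the shared holder, here is a hypothetical Python analogue that uses a stack. Only `pushCursor` is visible in the diff above; `pop_cursor` and its return value are assumptions made for this sketch.

```python
class BufferHolder:
    """Hypothetical Python analogue; only cursor bookkeeping is modelled."""

    def __init__(self):
        self.cursor = 0
        self._marks = []  # saved cursors, innermost last

    def increment_cursor(self, val):
        self.cursor += val

    def push_cursor(self):
        # Save the current position so a nested writer can start from it.
        self._marks.append(self.cursor)
        return self.cursor

    def pop_cursor(self):
        """Return (offset, size) of everything written since the matching push."""
        start = self._marks.pop()
        return start, self.cursor - start


holder = BufferHolder()
holder.increment_cursor(8)
holder.push_cursor()         # outer struct starts at offset 8
holder.increment_cursor(16)
holder.push_cursor()         # nested array starts at offset 24
holder.increment_cursor(32)
inner = holder.pop_cursor()
outer = holder.pop_cursor()
```

Because every writer goes through the same holder, nested writes pair up naturally: the inner pop yields the nested array's range and the outer pop yields the whole struct's range.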


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
Sure, will open a PR soon.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20830
  
Hmm, it looks like the conflict is just one block with group agg tests, 
probably not a big deal - you want to take a look?


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
But I am willing to do this if you think it's better to do this. No 
objection.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
Yup, that was exactly what I thought. I think it's also fine not to bother 
backporting, since it has conflicts.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20830
  
The cherry-pick to branch-2.3 did have some conflicts.  Just to check the 
reason for backporting: even though this isn't a bug fix, it's pretty safe and 
will help keep things in line, so there are fewer conflicts in future backports?


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20830
  
Thanks for reviewing and merging this @ueshin, @felixcheung, @BryanCutler 
and @dongjoon-hyun.

(Just FYI, I usually resolve JIRAs manually when I accidentally fail to 
take an action with the merge script. I think that's fine.)


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88396/
Test PASSed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20579
  
**[Test build #88396 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88396/testReport)**
 for PR 20579 at commit 
[`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpark tests...

2018-03-19 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20830
  
Merged to master!  (I think it went ok..)  Thanks @HyukjinKwon !!


---




[GitHub] spark pull request #20830: [SPARK-23691][PYTHON] Use sql_conf util in PySpar...

2018-03-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20830


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175651639
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -72,6 +82,14 @@ private[parquet] object ParquetFilters {
   (n: String, v: Any) => FilterApi.notEq(
 binaryColumn(n),
 Option(v).map(b => 
Binary.fromReusedByteArray(v.asInstanceOf[Array[Byte]])).orNull)
+case DateType if SQLConf.get.parquetFilterPushDownDate =>
+  (n: String, v: Any) => {
--- End diff --

nit:

```
(n: String, v: Any) => FilterApi.notEq(
  intColumn(n),
  Option(v).map { d =>
    DateTimeUtils.fromJavaDate(d.asInstanceOf[java.sql.Date]).asInstanceOf[Integer]
  }.orNull)
```
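
For context, Parquet's DATE type stores an int32 count of days since the Unix epoch, which is why the filter converts the date to an `Integer` and compares against an int column. A minimal Python sketch of that encoding follows; the helper names are hypothetical and time-zone handling is ignored, so this shows the idea rather than what `DateTimeUtils.fromJavaDate` does internally.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def date_to_days(d):
    """Encode a calendar date as days since 1970-01-01."""
    return (d - EPOCH).days


def days_to_date(days):
    """Decode days since 1970-01-01 back into a calendar date."""
    return EPOCH + timedelta(days=days)


def make_not_eq(column, value):
    """Build a (column, op, int-or-None) tuple; None models a null literal."""
    encoded = None if value is None else date_to_days(value)
    return (column, "notEq", encoded)


predicate = make_not_eq("d", date(2018, 3, 19))
null_predicate = make_not_eq("d", None)
```

The `None` branch mirrors the `Option(v)...orNull` handling in the snippet above: a null filter value stays null instead of being converted.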


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175649724
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -76,7 +77,9 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
   expected: Seq[Row]): Unit = {
 val output = predicate.collect { case a: Attribute => a }.distinct
 
-withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true") {
+withSQLConf(
+  SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
+  SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true") {
--- End diff --

nit: 
```scala
withSQLConf(
  SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
  SQLConf.PARQUET_FILTER_PUSHDOWN_DATE_ENABLED.key -> "true",
  SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> "false") {
```


---




[GitHub] spark pull request #20851: [SPARK-23727][SQL] Support for pushing down filte...

2018-03-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20851#discussion_r175650228
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala
 ---
@@ -313,6 +316,36 @@ class ParquetFilterSuite extends QueryTest with 
ParquetTest with SharedSQLContex
 }
   }
 
+  test("filter pushdown - date") {
+implicit class IntToDate(int: Int) {
--- End diff --

Shall we pass a string here and convert it into a date?
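
A sketch of what that suggestion amounts to, written in Python for brevity: the test data is spelled as readable date strings and converted once by a small helper. The helper name and the sample dates are hypothetical and not taken from the PR.

```python
from datetime import datetime


def to_date(s):
    """Parse a 'yyyy-MM-dd' string into a date object."""
    return datetime.strptime(s, "%Y-%m-%d").date()


# Readable literals instead of integer offsets converted by an implicit class.
dates = [to_date(s) for s in ["2018-03-18", "2018-03-19", "2018-03-20", "2018-03-21"]]
```

String literals make each expected value obvious when reading the test, at the cost of one parse call per value.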


---




[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20862
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1629/
Test PASSed.


---




[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20862
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20862: [SPARK-23744][CORE]Fix memory leak in ReadableChannelFil...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20862
  
**[Test build #88402 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88402/testReport)**
 for PR 20862 at commit 
[`5b58c57`](https://github.com/apache/spark/commit/5b58c57607551328c893a3857717e4b159ecf841).


---




[GitHub] spark pull request #20862: [SPARK-23744][CORE]Fix memory leak in ReadableCha...

2018-03-19 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/20862

[SPARK-23744][CORE]Fix memory leak in ReadableChannelFileRegion

## What changes were proposed in this pull request?
In the class `ReadableChannelFileRegion`, the `buffer` is direct memory, so 
we should modify `deallocate` to free it. `deallocate` is called by 
`release`.
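
The relationship between `release` and `deallocate` follows the usual reference-counting pattern, sketched below in Python. The class names and the `bytearray` standing in for direct memory are hypothetical; this is not the Netty or Spark code.

```python
class RefCounted:
    """Minimal reference-counted base: the last release() triggers deallocate()."""

    def __init__(self):
        self._ref_cnt = 1

    def retain(self):
        if self._ref_cnt <= 0:
            raise RuntimeError("already released")
        self._ref_cnt += 1
        return self

    def release(self):
        if self._ref_cnt <= 0:
            raise RuntimeError("already released")
        self._ref_cnt -= 1
        if self._ref_cnt == 0:
            self.deallocate()
            return True
        return False

    def deallocate(self):
        raise NotImplementedError


class FileRegion(RefCounted):
    def __init__(self, size):
        super().__init__()
        self.buffer = bytearray(size)  # stands in for a direct buffer

    def deallocate(self):
        # If this method did nothing, the buffer would outlive the last
        # release, which is the kind of leak described above.
        self.buffer = None


region = FileRegion(64)
region.retain()
first = region.release()   # count 2 -> 1, buffer still alive
second = region.release()  # count 1 -> 0, deallocate() runs
freed = region.buffer is None
```

Only the final `release` returns `True` and frees the buffer, so any cleanup that must happen exactly once belongs in `deallocate`.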

## How was this patch tested?
Existing unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark leakmem

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20862


commit 5b58c57607551328c893a3857717e4b159ecf841
Author: liuxian 
Date:   2018-03-20T03:19:59Z

fix




---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20827
  
**[Test build #88401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88401/testReport)**
 for PR 20827 at commit 
[`cd3dcc6`](https://github.com/apache/spark/commit/cd3dcc6299888b8119e2185fb6f79e8445631bca).


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20860
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1628/
Test PASSed.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20860
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88397/
Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20860
  
**[Test build #88397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88397/testReport)**
 for PR 20860 at commit 
[`192ce30`](https://github.com/apache/spark/commit/192ce305f05d4280c5c35b94a3666d313dab2733).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17272: [SPARK-19724][SQL]create a managed table with an existed...

2018-03-19 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/17272
  
@gatorsmile I will take it over :)


---




[GitHub] spark issue #20757: [SPARK-23595][SQL] ValidateExternalType should support i...

2018-03-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/20757
  
ping


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20433
  
**[Test build #88400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88400/testReport)**
 for PR 20433 at commit 
[`e780fd2`](https://github.com/apache/spark/commit/e780fd2ae562cd7f9ade80cc28e0ca44f6b1cf7d).


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20433: [SPARK-23264][SQL] Make INTERVAL keyword optional in INT...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20433
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1627/
Test PASSed.


---




[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20861
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20861
  
**[Test build #88399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88399/testReport)**
 for PR 20861 at commit 
[`306dbe8`](https://github.com/apache/spark/commit/306dbe8e26f2045b0d133e07455dedae058c0311).


---




[GitHub] spark issue #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expre...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20861
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1626/
Test PASSed.


---




[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...

2018-03-19 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/20861

[SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expression

## What changes were proposed in this pull request?

As stated in Jira, the current `Uuid` expression is problematic because it 
uses `java.util.UUID.randomUUID` for UUID generation.

This patch uses the newly added `RandomUUIDGenerator` for UUID generation, 
so that `Uuid` is deterministic between retries.
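
To illustrate why a seeded generator makes the result repeatable across retries, here is a minimal sketch. It does not reproduce Spark's `RandomUUIDGenerator`; `SeededUuid` is a hypothetical name, `scala.util.Random` is only a stand-in PRNG, and how the seed is chosen per task is left out as an assumption.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical helper, not the Spark class: builds a version-4-style UUID
// from a seeded PRNG so the same seed always yields the same UUID.
object SeededUuid {
  def generate(seed: Long): UUID = {
    val rng = new Random(seed)
    // Clear the version nibble and set it to 4 (random UUID).
    val msb = (rng.nextLong() & 0xFFFFFFFFFFFF0FFFL) | 0x0000000000004000L
    // Clear the two variant bits and set them to binary 10 (IETF variant).
    val lsb = (rng.nextLong() & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L
    new UUID(msb, lsb)
  }
}
```

Calling `SeededUuid.generate` twice with the same seed returns equal values, whereas `java.util.UUID.randomUUID` would return a new value on each retry.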

## How was this patch tested?

Added unit tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-23599-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20861


commit 306dbe8e26f2045b0d133e07455dedae058c0311
Author: Liang-Chi Hsieh 
Date:   2018-03-20T03:11:33Z

Use new UUID generator in Uuid expression.




---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20796
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20796
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88394/
Test FAILed.


---




[GitHub] spark issue #20796: [SPARK-23649][SQL] Skipping chars disallowed in UTF-8

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20796
  
**[Test build #88394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88394/testReport)**
 for PR 20796 at commit 
[`5557a80`](https://github.com/apache/spark/commit/5557a80d4674e929332d9441342e5b90e314eb45).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2018-03-19 Thread tengpeng
Github user tengpeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/19666#discussion_r175646373
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala ---
@@ -152,15 +152,13 @@ private[spark] object DecisionTreeMetadata extends Logging {
 // TODO(SPARK-9957): Handle this properly by filtering out those features.
 if (numCategories > 1) {
   // Decide if some categorical features should be treated as unordered features,
--- End diff --

Change , to .


---




[GitHub] spark pull request #19666: [SPARK-22451][ML] Reduce decision tree aggregate ...

2018-03-19 Thread tengpeng
Github user tengpeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/19666#discussion_r175646335
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/DecisionTreeMetadata.scala ---
@@ -152,15 +152,13 @@ private[spark] object DecisionTreeMetadata extends Logging {
 // TODO(SPARK-9957): Handle this properly by filtering out those features.
 if (numCategories > 1) {
   // Decide if some categorical features should be treated as unordered features,
--- End diff --

Change , to . 


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20853
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88389/
Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20853
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20853: [SPARK-23729][CORE] Respect URI fragment when resolving ...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20853
  
**[Test build #88389 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88389/testReport)**
 for PR 20853 at commit 
[`8a12452`](https://github.com/apache/spark/commit/8a124522519ed4f8fb750555f1a596c9f97b6947).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20827
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88393/
Test PASSed.


---




[GitHub] spark issue #20827: [SPARK-23666][SQL] Do not display exprIds of Alias in us...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20827
  
**[Test build #88393 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88393/testReport)**
 for PR 20827 at commit 
[`bee3711`](https://github.com/apache/spark/commit/bee3711074a7d34cf19e8794f837b70eddaffbe0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class PrettyAttribute(`


---




[GitHub] spark pull request #20818: [SPARK-23675][WEB-UI]Title add spark logo, use sp...

2018-03-19 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20818#discussion_r175644012
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -224,6 +224,7 @@ private[spark] object UIUtils extends Logging {
 {commonHeaderNodes}
 {if (showVisualization) vizHeaderNodes else Seq.empty}
 {if (useDataTables) dataTablesHeaderNodes else Seq.empty}
+
--- End diff --

Seems it should be `prependBaseUri("/static/spark-logo-77x50px-hd.png")`
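
For context, a rough sketch of what prepending a base URI buys: the resource path keeps working when the UI is served under a proxy prefix. The object name, the `uiRoot` parameter, and the two-argument signature below are assumptions for illustration and are not the actual `UIUtils` API.

```scala
// Hypothetical stand-in, not Spark's UIUtils: joins a UI root prefix
// (for example a reverse-proxy path) with an absolute resource path.
object BaseUriSketch {
  def prependBaseUri(uiRoot: String, resource: String): String =
    uiRoot.stripSuffix("/") + resource
}
```

With an empty root the resource path is returned unchanged; with a root such as `/proxy/app-1` the logo would resolve under that prefix instead of the server root.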


---




[GitHub] spark pull request #20818: [SPARK-23675][WEB-UI]Title add spark logo, use sp...

2018-03-19 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20818#discussion_r175644032
  
--- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
@@ -265,6 +266,7 @@ private[spark] object UIUtils extends Logging {
   
 {commonHeaderNodes}
 {if (useDataTables) dataTablesHeaderNodes else Seq.empty}
+
--- End diff --

ditto


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20687
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88391/
Test PASSed.


---




[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20687
  
**[Test build #88391 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88391/testReport)**
 for PR 20687 at commit 
[`5926301`](https://github.com/apache/spark/commit/592630148af19adbb72703dd1ff49f82c33087d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18666
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1625/
Test PASSed.


---




[GitHub] spark pull request #19001: [SPARK-19256][SQL] Hive bucketing support

2018-03-19 Thread chrysan
Github user chrysan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19001#discussion_r175640958
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -184,6 +189,43 @@ case class InsertIntoHadoopFsRelationCommand(
     Seq.empty[Row]
   }
 
+  private def getBucketIdExpression(dataColumns: Seq[Attribute]): Option[Expression] = {
+    bucketSpec.map { spec =>
+      val bucketColumns = spec.bucketColumnNames.map(c => dataColumns.find(_.name == c).get)
+      // Use `HashPartitioning.partitionIdExpression` as our bucket id expression, so that we can
+      // guarantee the data distribution is same between shuffle and bucketed data source, which
+      // enables us to only shuffle one side when join a bucketed table and a normal one.
+      HashPartitioning(
+        bucketColumns,
+        spec.numBuckets,
+        classOf[Murmur3Hash]
+      ).partitionIdExpression
+    }
+  }
+
+  /**
+   * How is `requiredOrdering` determined ?
--- End diff --

Why does the definition of `requiredOrdering` here differ from that in 
`InsertIntoHiveTable`?
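
As background for the diff above, the idea of deriving a bucket id from a hash of the bucket columns can be sketched as a non-negative modulus over the hash. This is only an illustration: `BucketIdSketch` is a hypothetical name, and `scala.util.hashing.MurmurHash3` is a stand-in, so the ids will not match those produced by Spark's `Murmur3Hash` expression.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch: maps a row's bucket-column values to a bucket id.
object BucketIdSketch {
  // Non-negative modulus, so negative hashes still land in [0, n).
  def pmod(a: Int, n: Int): Int = {
    val r = a % n
    if (r < 0) r + n else r
  }

  // Stand-in hash; Spark uses its own Murmur3 implementation and seed.
  def bucketId(bucketValues: Seq[Any], numBuckets: Int): Int =
    pmod(MurmurHash3.seqHash(bucketValues), numBuckets)
}
```

Because the same function is applied on the write side and the shuffle side, rows with equal bucket-column values end up with the same id, which is what allows shuffling only one side of a join.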


---




[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2018-03-19 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18666
  
@gatorsmile would you please take a look at this?

This PR mainly aims to close HiveSessionState explicitly, so that the 
following directories are deleted: `hive.downloaded.resources.dir`, which 
points to `"${system:java.io.tmpdir}" + File.separator + 
"${hive.session.id}_resources"` by default; `hive.exec.local.scratchdir`, 
which points to `"${system:java.io.tmpdir}" + File.separator + 
"${system:user.name}"` by default; and some other directories that are used 
only by Hive and have no delete hook on shutdown.

The code below shows how HiveSessionState creates 
`hive.downloaded.resources.dir`; note that `isCleanUp` is set to `false`.

```java
// 3. Download resources dir
path = new Path(HiveConf.getVar(conf, HiveConf.ConfVars.DOWNLOADED_RESOURCES_DIR));
createPath(conf, path, scratchDirPermission, true, /* isCleanUp = */ false);
```

Plenty of unused directories are left behind after submitting a lot of 
Hive-enabled Spark applications.
![popo_2018-03-20 
10-28-34](https://user-images.githubusercontent.com/8326978/37632505-7eacbec2-2c29-11e8-94b5-229ba193339f.jpg)



---




[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19108
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88398/
Test FAILed.


---




[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19108
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19108
  
**[Test build #88398 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88398/testReport)**
 for PR 19108 at commit 
[`ccd22f5`](https://github.com/apache/spark/commit/ccd22f553a37ba166dd2881cb965edc19ff653fc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class JavaKolmogorovSmirnovTestSuite extends SharedSparkSession`


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20829
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88395/
Test PASSed.


---




[GitHub] spark issue #20829: [SPARK-23690][ML] Add handleinvalid to VectorAssembler

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20829
  
**[Test build #88395 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88395/testReport)**
 for PR 20829 at commit 
[`ab91545`](https://github.com/apache/spark/commit/ab91545bebc6e1d0c5c3cb7c15156d546ad48f81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20657: [SPARK-23361][yarn] Allow AM to restart after initial to...

2018-03-19 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20657
  
LGTM, just one small comment.


---




[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18666
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1624/
Test PASSed.


---




[GitHub] spark pull request #20657: [SPARK-23361][yarn] Allow AM to restart after ini...

2018-03-19 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20657#discussion_r175638637
  
--- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosHadoopDelegationTokenManager.scala ---
@@ -105,7 +105,8 @@ private[spark] class MesosHadoopDelegationTokenManager(
 case e: Exception =>
   // Log the error and try to write new tokens back in an hour
   logWarning("Couldn't broadcast tokens, trying again in an hour", e)
--- End diff --

Shall we update the log message to reflect the configured waiting time?
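
One possible shape for such a message, as a sketch only: the object name and the `retryIntervalMs` parameter are hypothetical, and the actual configuration key and its units are not shown in this thread.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical sketch: builds the warning text from a configured retry
// interval instead of hard-coding "an hour".
object RetryLogSketch {
  def retryMessage(retryIntervalMs: Long): String = {
    val hours = TimeUnit.MILLISECONDS.toHours(retryIntervalMs)
    s"Couldn't broadcast tokens, trying again in $hours hour(s)"
  }
}
```

The resulting string could then be passed to `logWarning` together with the exception, as in the existing call.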


---




[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2018-03-19 Thread yaooqinn
Github user yaooqinn commented on the issue:

https://github.com/apache/spark/pull/18666
  
@samartinucci thanks for the reminder; I have fixed the conflicts.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20847
  
@mgaido91 this has already been merged to branch 2.3. Please close this PR 
if it is not closed automatically.


---




[GitHub] spark issue #20847: [SPARK-23644][CORE][UI][BACKPORT-2.3] Use absolute path ...

2018-03-19 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20847
  
Thanks, merging to branch 2.3.


---




[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19108
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19108
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1623/
Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20860
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20860
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1622/
Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20860
  
**[Test build #88397 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88397/testReport)**
 for PR 20860 at commit 
[`192ce30`](https://github.com/apache/spark/commit/192ce305f05d4280c5c35b94a3666d313dab2733).


---




[GitHub] spark issue #19108: [SPARK-21898][ML] Feature parity for KolmogorovSmirnovTe...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19108
  
**[Test build #88398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88398/testReport)**
 for PR 19108 at commit 
[`ccd22f5`](https://github.com/apache/spark/commit/ccd22f553a37ba166dd2881cb965edc19ff653fc).


---




[GitHub] spark pull request #20860: [SPARK-23743][SQL] Changed a comparison logic fro...

2018-03-19 Thread jongyoul
GitHub user jongyoul opened a pull request:

https://github.com/apache/spark/pull/20860

[SPARK-23743][SQL] Changed a comparison logic from containing 'slf4j' to 
starting with 'org.slf4j'

## What changes were proposed in this pull request?
`isSharedClass` returns whether a class can or should be shared. It checks 
whether the class name contains certain keywords or starts with certain 
prefixes. With this logic, unintended behavior can occur when a custom 
package or class name contains `slf4j`. The original intention seems to be 
to identify classes under `org.slf4j`, so it would be better to change the 
comparison logic to `name.startsWith("org.slf4j")`.
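
The difference between the two checks can be shown with a small sketch. The object name and the example class name `com.example.proto.slf4j.Event` are made up for illustration; only the `contains("slf4j")` and `startsWith("org.slf4j")` comparisons come from the description above.

```scala
// Hypothetical sketch of the two predicates being compared.
object SharedClassSketch {
  // Old behaviour: any name that merely contains "slf4j" matches.
  def sharedByContains(name: String): Boolean = name.contains("slf4j")

  // Proposed behaviour: only names under the org.slf4j package match.
  def sharedByPrefix(name: String): Boolean = name.startsWith("org.slf4j")
}
```

Both predicates accept `org.slf4j.Logger`, but only the substring check accepts a user class such as `com.example.proto.slf4j.Event`, which is the unintended match.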

## How was this patch tested?
This patch should pass all of the current tests and keep all of the current 
behaviors. In my case, I'm using ProtobufDeserializer to get a table schema 
from Hive tables, and some Protobuf packages and class names contain `slf4j`. 
Without this patch, they cannot be resolved because of a ClassCastException 
caused by different classloaders.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jongyoul/spark SPARK-23743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20860


commit 192ce305f05d4280c5c35b94a3666d313dab2733
Author: Jongyoul Lee 
Date:   2018-03-20T01:45:44Z

Changed a comparison logic from containing 'slf4j' to starting with 
'org.slf4j'




---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20579
  
**[Test build #88396 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88396/testReport)**
 for PR 20579 at commit 
[`3392305`](https://github.com/apache/spark/commit/339230570eab374619477f7c0d68f3451d7ff90b).


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1621/
Test PASSed.


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20579
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20844: [SPARK-23707][SQL] Fresh 'initRange' name to avoi...

2018-03-19 Thread ConeyLiu
Github user ConeyLiu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20844#discussion_r175634287
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala ---
@@ -396,9 +396,11 @@ case class RangeExec(range: org.apache.spark.sql.catalyst.plans.logical.Range)
 // The default size of a batch, which must be positive integer
 val batchSize = 1000
 
-val initRangeFuncName = ctx.addNewFunction("initRange",
+val initRange = ctx.freshName("initRange")
+
+val initRangeFuncName = ctx.addNewFunction(initRange,
   s"""
-| private void initRange(int idx) {
+| private void ${initRange}(int idx) {

OK, I can just add some comments and keep the code unchanged. I changed it 
here only to make the code more robust.
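
For readers unfamiliar with the fresh-name idea in the diff above, here is a minimal sketch of how unique names avoid clashes between generated functions. `NameAllocator` and its naming scheme are hypothetical and do not reproduce the codegen context's `freshName`.

```scala
// Hypothetical sketch: hands out names that never repeat by appending
// a counter shared across all base names.
class NameAllocator {
  private var next = 0

  def freshName(base: String): String = {
    val name = s"${base}_$next"
    next += 1
    name
  }

  // Embeds a fresh name in a generated Java method stub.
  def methodStub(base: String): String =
    s"private void ${freshName(base)}(int idx) { }"
}
```

Two operators that both ask for `initRange` receive distinct names, so their generated methods can live in the same class without colliding.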


---




[GitHub] spark issue #20579: [SPARK-23372][SQL] Writing empty struct in parquet fails...

2018-03-19 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/20579
  
retest this please


---



