[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19598
  
I see. It makes sense for debuggability. On the other hand, the nullable 
property in the schema is very important for optimizations that achieve better 
performance.

One idea is to add a debug-mode conf that enforces insertion of a null 
check when reading data from an iterator or a column vector.
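
A minimal sketch of what such a debug conf could look like (the conf key and all names below are purely illustrative assumptions, not an actual Spark configuration or API):

```scala
// Sketch only: models the decision "does generated code need a null check?".
// The conf key below is hypothetical, not a real Spark conf.
object NullCheckSketch {
  val DebugForceNullCheckKey = "spark.sql.codegen.debug.forceNullCheck" // hypothetical

  // A null check is needed if the schema says the value is nullable,
  // or if the debug conf forces it regardless of the declared schema.
  def needNullCheck(schemaNullable: Boolean, debugForceNullCheck: Boolean): Boolean =
    schemaNullable || debugForceNullCheck

  // Fragment of generated Java for reading element `i` of column vector `values`.
  def genRead(schemaNullable: Boolean, debugForceNullCheck: Boolean): String =
    if (needNullCheck(schemaNullable, debugForceNullCheck))
      "values.isNullAt(i) ? null : values.get(i)"
    else
      "values.get(i)"
}
```

With the flag off, non-nullable schemas keep the optimized read path; turning it on restores the defensive check for debugging bad nullability metadata.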
 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/19658
  
Thanks a lot!! @gatorsmile @srowen 


---




[GitHub] spark pull request #19658: [SPARK-22443][SQL]add implementation of quoteIden...

2017-11-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19658


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19658
  
Thanks! Merged to master.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19658
  
LGTM


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19656
  
**[Test build #83459 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83459/testReport)** for PR 19656 at commit [`35c38d0`](https://github.com/apache/spark/commit/35c38d04a1ed7fbc0f637a54c797fc3b26e871a4).


---




[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

2017-11-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19598


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19656
  
retest this please


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19598
  
Thanks! Merged to master.


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19647
  
Thanks! Merged to master/2.2


---




[GitHub] spark pull request #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pus...

2017-11-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19647


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83458/
Test PASSed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83458 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83458/testReport)** for PR 19657 at commit [`0ea7c9b`](https://github.com/apache/spark/commit/0ea7c9b1c26c604296c35bc1588a6a5606a10cb2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19657
  
**[Test build #83458 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83458/testReport)** for PR 19657 at commit [`0ea7c9b`](https://github.com/apache/spark/commit/0ea7c9b1c26c604296c35bc1588a6a5606a10cb2).


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19657
  
retest this please


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19657
  
retest this please


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83457/
Test FAILed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/19657
  
retest this please


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83456/
Test FAILed.


---




[GitHub] spark issue #19657: [SPARK-22344][SPARKR] clean up install dir if running te...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19657
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19658
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83455/
Test PASSed.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19658
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19658
  
**[Test build #83455 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83455/testReport)** for PR 19658 at commit [`ed6ed37`](https://github.com/apache/spark/commit/ed6ed37c2d4c42c26f2c7a48cf82692f4138a54f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19647
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19647
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83454/
Test PASSed.


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19647
  
**[Test build #83454 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83454/testReport)** for PR 19647 at commit [`0e0123b`](https://github.com/apache/spark/commit/0e0123b0887aa33c382a63ee7843be5f64d38b41).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19598
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19598
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83453/
Test PASSed.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19598
  
**[Test build #83453 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83453/testReport)** for PR 19598 at commit [`0f798cb`](https://github.com/apache/spark/commit/0f798cb46d6982e9f643af04c4d9bc665164c699).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19427: [SPARK-22294][Deploy] Reset spark.driver.bindAddr...

2017-11-04 Thread ssaavedra
Github user ssaavedra commented on a diff in the pull request:

https://github.com/apache/spark/pull/19427#discussion_r148945009
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala ---
@@ -62,6 +63,7 @@ class Checkpoint(ssc: StreamingContext, val 
checkpointTime: Time)
 
 val newSparkConf = new SparkConf(loadDefaults = 
false).setAll(sparkConfPairs)
   .remove("spark.driver.host")
+  .remove("spark.driver.bindAddress")
--- End diff --

Yes. If it is not set in the new run, the old value is meaningless 
anyway; the property only matters for subsequent calls to spark-submit. 
Resuming a checkpoint means we are re-submitting work, but it may run under a 
different cluster configuration, so we may want to change the bindAddress, or 
the new configuration may prefer to fall back to the `spark.driver.host` 
setting. In any case, keeping the old value makes no sense. Even with a static 
configuration, removing it is harmless, since the command line that re-launches 
the job can re-populate the property if it should stay the same.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19658
  
**[Test build #83455 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83455/testReport)** for PR 19658 at commit [`ed6ed37`](https://github.com/apache/spark/commit/ed6ed37c2d4c42c26f2c7a48cf82692f4138a54f).


---




[GitHub] spark pull request #19658: [SPARK-22443][SQL]add implementation of quoteIden...

2017-11-04 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19658#discussion_r148944273
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala ---
@@ -42,6 +42,18 @@ private class AggregatedDialect(dialects: 
List[JdbcDialect]) extends JdbcDialect
 dialects.flatMap(_.getJDBCType(dt)).headOption
   }
 
+  override def quoteIdentifier(colName: String): String = {
+dialects.map(_.quoteIdentifier(colName)).head
--- End diff --

Thanks @srowen for the comment. I will change. 
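
srowen's exact suggestion is not quoted in this digest, so the following is an assumption: a plausible simplification is to delegate to the first dialect directly rather than quoting with every dialect and discarding all but the head. A self-contained sketch (the trait here is a stripped-down stand-in for Spark's `JdbcDialect`):

```scala
// Minimal stand-in for Spark's JdbcDialect, for illustration only.
trait JdbcDialect {
  def quoteIdentifier(colName: String): String
}

class AggregatedDialect(dialects: List[JdbcDialect]) extends JdbcDialect {
  require(dialects.nonEmpty, "AggregatedDialect requires at least one dialect")

  // Equivalent to dialects.map(_.quoteIdentifier(colName)).head, but avoids
  // calling quoteIdentifier on every dialect just to keep the first result.
  override def quoteIdentifier(colName: String): String =
    dialects.head.quoteIdentifier(colName)
}
```

This mirrors the aggregation pattern already used in the file: when only the first dialect's answer is kept, there is no need to consult the rest.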


---




[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19635
  
I am starting to worry about the behavior change introduced by this PR.


---




[GitHub] spark pull request #19635: [SPARK-22413][SQL] Type coercion for IN is not co...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19635#discussion_r148943932
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -422,7 +422,7 @@ object TypeCoercion {
 
 val commonTypes = lhs.zip(rhs).flatMap { case (l, r) =>
   findCommonTypeForBinaryComparison(l.dataType, r.dataType)
-.orElse(findTightestCommonType(l.dataType, r.dataType))
+.orElse(findWiderTypeForTwo(l.dataType, r.dataType))
--- End diff --

Why do we make this change?


---




[GitHub] spark pull request #19635: [SPARK-22413][SQL] Type coercion for IN is not co...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19635#discussion_r148943889
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -451,7 +451,10 @@ object TypeCoercion {
 }
 
   case i @ In(a, b) if b.exists(_.dataType != a.dataType) =>
-findWiderCommonType(i.children.map(_.dataType)) match {
+findWiderCommonType(b.map(_.dataType)).flatMap { listDataType =>
+  findCommonTypeForBinaryComparison(listDataType, a.dataType)
+.orElse(findWiderTypeForTwo(listDataType, a.dataType))
--- End diff --

What is the reason we need to call `findWiderTypeForTwo`?


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19647
  
**[Test build #83454 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83454/testReport)** for PR 19647 at commit [`0e0123b`](https://github.com/apache/spark/commit/0e0123b0887aa33c382a63ee7843be5f64d38b41).


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19647
  
test this please


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19647
  
ok to test


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19598
  
**[Test build #83453 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83453/testReport)** for PR 19598 at commit [`0f798cb`](https://github.com/apache/spark/commit/0f798cb46d6982e9f643af04c4d9bc665164c699).


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83452/
Test FAILed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19598
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19598
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83448/
Test PASSed.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19598
  
**[Test build #83448 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83448/testReport)** for PR 19598 at commit [`4262983`](https://github.com/apache/spark/commit/4262983f1c984c5e54881c7e087874ca753ea97c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19634: [SPARK-22412][SQL] Fix incorrect comment in DataS...

2017-11-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19634


---




[GitHub] spark issue #19634: [SPARK-22412][SQL] Fix incorrect comment in DataSourceSc...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19634
  
Thanks! Merged to master.


---




[GitHub] spark issue #19634: [SPARK-22412][SQL] Fix incorrect comment in DataSourceSc...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19634
  
LGTM


---




[GitHub] spark pull request #19634: [SPARK-22412][SQL] Fix incorrect comment in DataS...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19634#discussion_r148937385
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -469,7 +469,7 @@ case class FileSourceScanExec(
   currentSize = 0
 }
 
-// Assign files to partitions using "First Fit Decreasing" (FFD)
+// Assign files to partitions using "Next Fit Decreasing"
--- End diff --

This is correct. 
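
For readers unfamiliar with the distinction the corrected comment draws, here is a minimal sketch of Next Fit Decreasing packing, simplified from the strategy described above; `maxSize` and plain `Long` sizes are illustrative stand-ins for the actual file model in `FileSourceScanExec`:

```scala
import scala.collection.mutable.ArrayBuffer

// "Next Fit Decreasing": sort items largest-first, then pack each into the
// current bin, opening a new bin when it does not fit. Unlike First Fit
// Decreasing, earlier (closed) bins are never revisited for leftover space.
def packNextFitDecreasing(sizes: Seq[Long], maxSize: Long): Seq[Seq[Long]] = {
  val partitions = ArrayBuffer(ArrayBuffer.empty[Long])
  var currentSize = 0L
  for (size <- sizes.sortBy(-_)) {          // decreasing order
    if (currentSize + size > maxSize && partitions.last.nonEmpty) {
      partitions += ArrayBuffer.empty[Long] // close current, open next
      currentSize = 0L
    }
    partitions.last += size
    currentSize += size
  }
  partitions.map(_.toSeq).toSeq
}
```

For sizes 5, 4, 3, 2, 1 with `maxSize = 6`, this yields partitions [5], [4], [3, 2, 1]: once the bin holding 5 is closed, the 1 never goes back to fill it, which is exactly the "next fit" behavior.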


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19658
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83451/
Test FAILed.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19658
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19655: [SPARK-22444][core] Spark History Server missing /enviro...

2017-11-04 Thread ambud
Github user ambud commented on the issue:

https://github.com/apache/spark/pull/19655
  
ok; @vanzin 


---




[GitHub] spark pull request #19655: [SPARK-22444][core] Spark History Server missing ...

2017-11-04 Thread ambud
Github user ambud closed the pull request at:

https://github.com/apache/spark/pull/19655


---




[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19652
  
Thanks for working on it. Will review it next week.


---




[GitHub] spark issue #19658: [SPARK-22443][SQL]add implementation of quoteIdentifier,...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19658
  
ok to test


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19598
  
LGTM

BTW, more and more optimizations are now added based on the nullable 
property in the schema. Normally, a database enforces null checking when 
inserting/updating data, but we do not enforce it because the data is not 
managed by us. If users provide wrong nullability info in the schema, these 
runtime issues might be harder for us to debug.


---




[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19598#discussion_r148936939
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
 ---
@@ -309,6 +318,9 @@ case class GetMapValue(child: Expression, key: 
Expression)
 val found = ctx.freshName("found")
 val key = ctx.freshName("key")
 val values = ctx.freshName("values")
+val nullCheck = if 
(child.dataType.asInstanceOf[MapType].valueContainsNull) {
+  s" || $values.isNullAt($index)"
+} else ""
--- End diff --

The same here


---




[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19598#discussion_r148936905
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
 ---
@@ -186,6 +186,14 @@ case class GetArrayStructFields(
   val values = ctx.freshName("values")
   val j = ctx.freshName("j")
   val row = ctx.freshName("row")
+  val nullSafeEval = if (field.nullable) {
+s"""
+ if ($row.isNullAt($ordinal)) {
+   $values[$j] = null;
+ } else
+"""
+  } else ""
--- End diff --

Nit:
```Scala
} else {
  ""
}
```


---




[GitHub] spark issue #19647: [SPARK-22211][SQL] Remove incorrect FOJ limit pushdown

2017-11-04 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19647
  
retest this please



---




[GitHub] spark issue #19631: [SPARK-22372][core, yarn] Make cluster submission use Sp...

2017-11-04 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/19631
  
That's an expected message from the build and has no impact on it. I'll
change the build config to suppress the message when I'm back from holiday
late next week.

On Nov 1, 2017 6:11 PM, "Marcelo Vanzin"  wrote:

@shaneknapp  any idea what's going on?

chmod: cannot access `target/*': No such file or directory




---




[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19652
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83446/
Test PASSed.


---




[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19652
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19652
  
**[Test build #83446 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83446/testReport)**
 for PR 19652 at commit 
[`0d706ff`](https://github.com/apache/spark/commit/0d706fffb133c1d685e4aaa0b62758d0826bad62).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19656
  
**[Test build #83450 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83450/testReport)**
 for PR 19656 at commit 
[`7ccc3ea`](https://github.com/apache/spark/commit/7ccc3ea3207a89b8c56d224094e926ffeb800f20).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83450/
Test FAILed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19656
  
**[Test build #83450 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83450/testReport)**
 for PR 19656 at commit 
[`7ccc3ea`](https://github.com/apache/spark/commit/7ccc3ea3207a89b8c56d224094e926ffeb800f20).


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19656
  
retest this please


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935645
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 ---
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.io.IOException
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io._
+import org.apache.orc.{OrcFile, TypeDescription}
+import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
+import org.apache.orc.storage.common.`type`.HiveDecimal
+import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+object OrcUtils extends Logging {
+
+  def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {
+val origPath = new Path(pathStr)
+val fs = origPath.getFileSystem(conf)
+val paths = SparkHadoopUtil.get.listLeafStatuses(fs, origPath)
+  .filterNot(_.isDirectory)
+  .map(_.getPath)
+  .filterNot(_.getName.startsWith("_"))
+  .filterNot(_.getName.startsWith("."))
+paths
+  }
+
+  private[orc] def readSchema(file: Path, conf: Configuration): 
Option[TypeDescription] = {
+try {
+  val fs = file.getFileSystem(conf)
+  val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
+  val reader = OrcFile.createReader(file, readerOptions)
+  val schema = reader.getSchema
+  if (schema.getFieldNames.size == 0) {
+None
+  } else {
+Some(schema)
+  }
+} catch {
+  case _: IOException => None
+}
+  }
+
+  private[orc] def readSchema(sparkSession: SparkSession, files: 
Seq[FileStatus])
+  : Option[StructType] = {
+val conf = sparkSession.sparkContext.hadoopConfiguration
+files.map(_.getPath).flatMap(readSchema(_, conf)).headOption.map { 
schema =>
+  logDebug(s"Reading schema from file $files, got Hive schema string: 
$schema")
+  
CatalystSqlParser.parseDataType(schema.toString).asInstanceOf[StructType]
+}
+  }
+
+  private[orc] def getSchemaString(schema: StructType): String = {
+schema.fields.map(f => 
s"${f.name}:${f.dataType.catalogString}").mkString("struct<", ",", ">")
+  }
+
+  private[orc] def getTypeDescription(dataType: DataType) = dataType match 
{
+case st: StructType => TypeDescription.fromString(getSchemaString(st))
+case _ => TypeDescription.fromString(dataType.catalogString)
+  }
+
+  /**
+   * Return a missing schema in a given ORC file.
+   */
+  private[orc] def getMissingSchema(
+  resolver: Resolver,
+  dataSchema: StructType,
+  partitionSchema: StructType,
+  file: Path,
+  conf: Configuration): Option[StructType] = {
+try {
+  val fs = file.getFileSystem(conf)
+  val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
+  val reader = OrcFile.createReader(file, readerOptions)
+  val schema = reader.getSchema
+  if (schema.getFieldNames.size == 0) {
+None
+  } else {
+val orcSchema = if 
(schema.getFieldNames.asScala.forall(_.startsWith("_col"))) {
+  

[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935592
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala ---
@@ -22,9 +22,10 @@ import org.apache.hadoop.fs.Path
 import org.apache.hadoop.hive.ql.io.orc.{OrcFile, Reader}
 import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
 
-import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.execution.datasources.orc.OrcUtils
+import org.apache.spark.sql.hive.HiveShim
 import org.apache.spark.sql.types.StructType
 
 private[hive] object OrcFileOperator extends Logging {
--- End diff --

shall we merge this class to `OrcUtils`?


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935499
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOutputWriter.scala
 ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.NullWritable
+import org.apache.hadoop.mapreduce._
+import org.apache.orc.mapred.OrcStruct
+import org.apache.orc.mapreduce.OrcOutputFormat
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources.OutputWriter
+import org.apache.spark.sql.types.StructType
+
+private[orc] class OrcOutputWriter(
+path: String,
+dataSchema: StructType,
+context: TaskAttemptContext)
+  extends OutputWriter {
+  private lazy val orcStruct: OrcStruct =
+OrcUtils.createOrcValue(dataSchema).asInstanceOf[OrcStruct]
+
+  private[this] val writableWrappers =
+dataSchema.fields.map(f => OrcUtils.getWritableWrapper(f.dataType))
+
+  private val recordWriter = {
+new OrcOutputFormat[OrcStruct]() {
+  override def getDefaultWorkFile(context: TaskAttemptContext, 
extension: String): Path = {
+new Path(path)
+  }
+}.getRecordWriter(context)
+  }
+
+  override def write(row: InternalRow): Unit = {
+recordWriter.write(
+  NullWritable.get,
+  OrcUtils.convertInternalRowToOrcStruct(
--- End diff --

ideally we should make it into a function and use it in `write`, like the old 
`OrcOutputWriter` did.


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935413
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
 ---
@@ -67,4 +67,11 @@ object OrcOptions {
 "snappy" -> "SNAPPY",
 "zlib" -> "ZLIB",
 "lzo" -> "LZO")
+
+  // The extensions for ORC compression codecs
+  val extensionsForCompressionCodecNames = Map(
--- End diff --

this doesn't belong to `OrcOptions`, maybe `OrcUtils`?


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935402
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument, 
SearchArgumentFactory}
+import org.apache.orc.storage.ql.io.sarg.SearchArgument.Builder
+import org.apache.orc.storage.serde2.io.HiveDecimalWritable
+
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.types._
+
+/**
+ * Utility functions to convert Spark data source filters to ORC filters.
+ */
+private[orc] object OrcFilters {
+
+  /**
+   * Create ORC filter as a SearchArgument instance.
+   */
+  def createFilter(schema: StructType, filters: Seq[Filter]): 
Option[SearchArgument] = {
+val dataTypeMap = schema.map(f => f.name -> f.dataType).toMap
+
+val convertibleFilters = for {
+  filter <- filters
+  _ <- buildSearchArgument(dataTypeMap, filter, 
SearchArgumentFactory.newBuilder())
--- End diff --

why call this function inside a loop? Can we put it at the beginning?
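For context, the quoted for-comprehension calls `buildSearchArgument` once per filter as a trial conversion, so that a single unconvertible filter does not discard the rest. A minimal standalone sketch of that pattern, using hypothetical stand-in types rather than the real `OrcFilters` API:

```scala
// Sketch of the per-filter trial-conversion pattern from the quoted diff.
// `tryConvert` is a hypothetical stand-in for buildSearchArgument.
object FilterSelection {
  sealed trait Filter
  case class EqualTo(attr: String, value: Any) extends Filter
  case class Unsupported(name: String) extends Filter

  // Succeeds only for filters we know how to convert.
  def tryConvert(f: Filter): Option[String] = f match {
    case EqualTo(a, v) => Some(s"leaf(= $a $v)")
    case _             => None
  }

  // Keep exactly the filters whose trial conversion succeeds.
  def convertibleFilters(filters: Seq[Filter]): Seq[Filter] =
    for {
      filter <- filters
      _ <- tryConvert(filter) // per-filter call: result discarded, used as a guard
    } yield filter

  def main(args: Array[String]): Unit = {
    val fs = Seq(EqualTo("a", 1), Unsupported("udf"), EqualTo("b", 2))
    assert(convertibleFilters(fs) == Seq(EqualTo("a", 1), EqualTo("b", 2)))
    println("ok")
  }
}
```

The point of the loop is isolation: converting each filter on its own fresh builder means one unsupported predicate only drops itself, not the whole set.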


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935336
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala
 ---
@@ -39,3 +58,134 @@ private[sql] object OrcFileFormat {
 names.foreach(checkFieldName)
   }
 }
+
+class DefaultSource extends OrcFileFormat
+
+/**
+ * New ORC File Format based on Apache ORC 1.4.1 and above.
+ */
+class OrcFileFormat
+  extends FileFormat
+  with DataSourceRegister
+  with Serializable {
+
+  override def shortName(): String = "orc"
+
+  override def toString: String = "ORC_1.4"
+
+  override def hashCode(): Int = getClass.hashCode()
+
+  override def equals(other: Any): Boolean = 
other.isInstanceOf[OrcFileFormat]
+
+  override def inferSchema(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  files: Seq[FileStatus]): Option[StructType] = {
+OrcUtils.readSchema(sparkSession, files)
+  }
+
+  override def prepareWrite(
+  sparkSession: SparkSession,
+  job: Job,
+  options: Map[String, String],
+  dataSchema: StructType): OutputWriterFactory = {
+val orcOptions = new OrcOptions(options, 
sparkSession.sessionState.conf)
+
+val conf = job.getConfiguration
+
+conf.set(MAPRED_OUTPUT_SCHEMA.getAttribute, 
OrcUtils.getSchemaString(dataSchema))
+
+conf.set(COMPRESS.getAttribute, orcOptions.compressionCodec)
+
+conf.asInstanceOf[JobConf]
+  
.setOutputFormat(classOf[org.apache.orc.mapred.OrcOutputFormat[OrcStruct]])
+
+new OutputWriterFactory {
+  override def newInstance(
+  path: String,
+  dataSchema: StructType,
+  context: TaskAttemptContext): OutputWriter = {
+new OrcOutputWriter(path, dataSchema, context)
+  }
+
+  override def getFileExtension(context: TaskAttemptContext): String = 
{
+val compressionExtension: String = {
+  val name = context.getConfiguration.get(COMPRESS.getAttribute)
+  OrcOptions.extensionsForCompressionCodecNames.getOrElse(name, "")
+}
+
+compressionExtension + ".orc"
+  }
+}
+  }
+
+  override def isSplitable(
+  sparkSession: SparkSession,
+  options: Map[String, String],
+  path: Path): Boolean = {
+true
+  }
+
+  override def buildReaderWithPartitionValues(
--- End diff --

we should override `buildReader` and return `GenericInternalRow` here. Then 
the parent class will merge the partition values and output `UnsafeRow`. This 
is what the current `OrcFileFormat` does, so let's keep that behavior for now.


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935146
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 ---
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.io.IOException
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io._
+import org.apache.orc.{OrcFile, TypeDescription}
+import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
+import org.apache.orc.storage.common.`type`.HiveDecimal
+import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+object OrcUtils extends Logging {
+
+  def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {
+val origPath = new Path(pathStr)
+val fs = origPath.getFileSystem(conf)
+val paths = SparkHadoopUtil.get.listLeafStatuses(fs, origPath)
+  .filterNot(_.isDirectory)
+  .map(_.getPath)
+  .filterNot(_.getName.startsWith("_"))
+  .filterNot(_.getName.startsWith("."))
+paths
+  }
+
+  private[orc] def readSchema(file: Path, conf: Configuration): 
Option[TypeDescription] = {
+try {
+  val fs = file.getFileSystem(conf)
+  val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
+  val reader = OrcFile.createReader(file, readerOptions)
+  val schema = reader.getSchema
+  if (schema.getFieldNames.size == 0) {
+None
+  } else {
+Some(schema)
+  }
+} catch {
+  case _: IOException => None
+}
+  }
+
+  private[orc] def readSchema(sparkSession: SparkSession, files: 
Seq[FileStatus])
+  : Option[StructType] = {
+val conf = sparkSession.sparkContext.hadoopConfiguration
+files.map(_.getPath).flatMap(readSchema(_, conf)).headOption.map { 
schema =>
--- End diff --

shouldn't we do schema merging?


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148935133
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 ---
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.io.IOException
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io._
+import org.apache.orc.{OrcFile, TypeDescription}
+import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
+import org.apache.orc.storage.common.`type`.HiveDecimal
+import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+object OrcUtils extends Logging {
+
+  def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {
+val origPath = new Path(pathStr)
+val fs = origPath.getFileSystem(conf)
+val paths = SparkHadoopUtil.get.listLeafStatuses(fs, origPath)
+  .filterNot(_.isDirectory)
+  .map(_.getPath)
+  .filterNot(_.getName.startsWith("_"))
+  .filterNot(_.getName.startsWith("."))
+paths
+  }
+
+  private[orc] def readSchema(file: Path, conf: Configuration): 
Option[TypeDescription] = {
+try {
+  val fs = file.getFileSystem(conf)
+  val readerOptions = OrcFile.readerOptions(conf).filesystem(fs)
+  val reader = OrcFile.createReader(file, readerOptions)
+  val schema = reader.getSchema
+  if (schema.getFieldNames.size == 0) {
+None
+  } else {
+Some(schema)
+  }
+} catch {
+  case _: IOException => None
+}
+  }
+
+  private[orc] def readSchema(sparkSession: SparkSession, files: 
Seq[FileStatus])
+  : Option[StructType] = {
+val conf = sparkSession.sparkContext.hadoopConfiguration
--- End diff --

`sparkSession.sessionState.newHadoopConf`


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83449/
Test FAILed.


---




[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19635
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19635
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83445/
Test PASSed.


---




[GitHub] spark issue #19635: [SPARK-22413][SQL] Type coercion for IN is not coherent ...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19635
  
**[Test build #83445 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83445/testReport)**
 for PR 19635 at commit 
[`c973bb7`](https://github.com/apache/spark/commit/c973bb7fc9df7cfed2cb00b0a1168056d02b6289).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19656: [SPARK-22445][SQL] move CodegenContext.copyResult...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19656#discussion_r148934478
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -213,19 +213,32 @@ trait CodegenSupport extends SparkPlan {
   }
 
   /**
-   * For optimization to suppress shouldStop() in a loop of 
WholeStageCodegen.
-   * Returning true means we need to insert shouldStop() into the loop 
producing rows, if any.
+   * Whether or not the result rows of this operator should be copied 
before putting into a buffer.
+   *
+   * If any operator inside WholeStageCodegen generates multiple rows from 
a single row (for
+   * example, Join), this should be true.
+   *
+   * If an operator starts a new pipeline, this should be false.
*/
-  def isShouldStopRequired: Boolean = {
-return shouldStopRequired && (this.parent == null || 
this.parent.isShouldStopRequired)
+  def needCopyResult: Boolean = {
+if (children.isEmpty) {
+  false
+} else if (children.length == 1) {
+  children.head.asInstanceOf[CodegenSupport].needStopCheck
--- End diff --

oops
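For reference, a standalone sketch of the propagation the quoted diff intends, with the flagged line corrected to call `needCopyResult`; the trait below is a toy model, not Spark's actual `CodegenSupport`:

```scala
// Toy model of needCopyResult propagation, with the fix applied:
// a unary operator inherits the copy requirement from its child.
trait CodegenSupport {
  def children: Seq[CodegenSupport]

  def needCopyResult: Boolean = {
    if (children.isEmpty) {
      false // a leaf starts a new pipeline, so no copy is needed
    } else if (children.length == 1) {
      children.head.needCopyResult // the diff mistakenly called needStopCheck here
    } else {
      throw new UnsupportedOperationException("multi-child operators must override this")
    }
  }
}

object NeedCopyDemo {
  case object Leaf extends CodegenSupport { def children = Nil }
  case class Unary(child: CodegenSupport) extends CodegenSupport {
    def children = Seq(child)
  }
  case class Join(child: CodegenSupport) extends CodegenSupport {
    def children = Seq(child)
    // A join can emit several result rows per input row sharing one UnsafeRow.
    override def needCopyResult = true
  }

  def main(args: Array[String]): Unit = {
    assert(!Leaf.needCopyResult)
    assert(!Unary(Leaf).needCopyResult)
    assert(Unary(Join(Leaf)).needCopyResult) // the requirement propagates up the pipeline
    println("ok")
  }
}
```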


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19598
  
**[Test build #83448 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83448/testReport)**
 for PR 19598 at commit 
[`4262983`](https://github.com/apache/spark/commit/4262983f1c984c5e54881c7e087874ca753ea97c).


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19598
  
Jenkins, retest this please


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19598
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19598
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83447/
Test FAILed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19656
  
LGTM except for two comments.


---




[GitHub] spark pull request #19656: [SPARK-22445][SQL] move CodegenContext.copyResult...

2017-11-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19656#discussion_r148933507
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
 ---
@@ -76,6 +76,20 @@ case class BroadcastHashJoinExec(
 streamedPlan.asInstanceOf[CodegenSupport].inputRDDs()
   }
 
+  override def needCopyResult: Boolean = joinType match {
+case _: InnerLike | LeftOuter | RightOuter =>
+  // For inner and outer joins, one row from the streamed side may 
produce multiple result rows,
+  // if the build side has duplicated keys. Then we need to copy the 
result rows before putting
+  // them in a buffer, because these result rows share one UnsafeRow 
instance. Note that here
+  // we wait for the broadcast to be finished, which is a no-op 
because it's already finished
+  // when we wait for it in `doProduce`.
+  buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique
--- End diff --

Seems to be 
`!buildPlan.executeBroadcast[HashedRelation]().value.keyIsUnique`?
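The logic behind the suggested negation, as a minimal hypothetical sketch (not the real Spark API): copying is needed exactly when build-side keys can repeat, because only then does one streamed row fan out into multiple result rows.

```scala
// Hypothetical helper capturing the suggested condition: copy result rows
// only when the build side may contain duplicate join keys.
object CopyRule {
  def needCopyResult(keyIsUnique: Boolean): Boolean = !keyIsUnique

  def main(args: Array[String]): Unit = {
    assert(needCopyResult(keyIsUnique = false)) // duplicate keys: one row -> many result rows
    assert(!needCopyResult(keyIsUnique = true)) // unique keys: at most one match per row
    println("ok")
  }
}
```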


---




[GitHub] spark pull request #19656: [SPARK-22445][SQL] move CodegenContext.copyResult...

2017-11-04 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19656#discussion_r148933391
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -213,19 +213,32 @@ trait CodegenSupport extends SparkPlan {
   }
 
   /**
-   * For optimization to suppress shouldStop() in a loop of 
WholeStageCodegen.
-   * Returning true means we need to insert shouldStop() into the loop 
producing rows, if any.
+   * Whether or not the result rows of this operator should be copied 
before putting into a buffer.
+   *
+   * If any operator inside WholeStageCodegen generate multiple rows from 
a single row (for
+   * example, Join), this should be true.
+   *
+   * If an operator starts a new pipeline, this should be false.
*/
-  def isShouldStopRequired: Boolean = {
-return shouldStopRequired && (this.parent == null || 
this.parent.isShouldStopRequired)
+  def needCopyResult: Boolean = {
+if (children.isEmpty) {
+  false
+} else if (children.length == 1) {
+  children.head.asInstanceOf[CodegenSupport].needStopCheck
--- End diff --

`needCopyResult` here?
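The propagation the reviewer is checking can be modeled as below; this is a hypothetical, simplified trait (names like `MiniCodegenSupport`, `Leaf`, `Expand` are illustrative), and it reads the child's `needCopyResult` in the unary case, which is what the quoted diff accidentally replaced with `needStopCheck`.

```scala
trait MiniCodegenSupport {
  def children: Seq[MiniCodegenSupport]
  def needCopyResult: Boolean =
    if (children.isEmpty) false // a leaf starts a new pipeline: no copy needed
    else if (children.length == 1) children.head.needCopyResult // inherit from child
    else throw new UnsupportedOperationException(
      "operators with several children must override needCopyResult")
}

// A source operator: starts a pipeline.
case class Leaf() extends MiniCodegenSupport { val children = Nil }

// An operator that can emit multiple rows per input row (e.g. a join).
case class Expand(child: MiniCodegenSupport) extends MiniCodegenSupport {
  val children = Seq(child)
  override def needCopyResult: Boolean = true
}

// A one-to-one operator: simply propagates its child's flag.
case class PassThrough(child: MiniCodegenSupport) extends MiniCodegenSupport {
  val children = Seq(child)
}
```

A `PassThrough` above an `Expand` therefore reports `needCopyResult = true`, while one directly above a `Leaf` reports `false`.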


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19656
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83444/
Test PASSed.


---




[GitHub] spark issue #19656: [SPARK-22445][SQL] move CodegenContext.copyResult to Cod...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19656
  
**[Test build #83444 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83444/testReport)**
 for PR 19656 at commit 
[`d1de300`](https://github.com/apache/spark/commit/d1de3002f0fad391cdea44011ce1458ca0348956).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148932552
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcOptions.scala
 ---
@@ -67,4 +67,11 @@ object OrcOptions {
 "snappy" -> "SNAPPY",
 "zlib" -> "ZLIB",
 "lzo" -> "LZO")
+
+  // The extensions for ORC compression codecs
+  val extensionsForCompressionCodecNames = Map(
--- End diff --

This is moved from [object 
ORCFileFormat](https://github.com/apache/spark/pull/19651/files#diff-01999ccbf13e95a0ea2d223f69d8ae23L265)
 in `sql/hive`.


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148932599
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala ---
@@ -87,15 +88,10 @@ private[hive] object OrcFileOperator extends Logging {
 getFileReader(path, 
conf).map(_.getObjectInspector.asInstanceOf[StructObjectInspector])
   }
 
-  def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {
-// TODO: Check if the paths coming in are already qualified and 
simplify.
-val origPath = new Path(pathStr)
-val fs = origPath.getFileSystem(conf)
-val paths = SparkHadoopUtil.get.listLeafStatuses(fs, origPath)
-  .filterNot(_.isDirectory)
-  .map(_.getPath)
-  .filterNot(_.getName.startsWith("_"))
-  .filterNot(_.getName.startsWith("."))
-paths
+  def setRequiredColumns(
--- End diff --

This is moved from [object ORCFileFormat](https://github.com/apache/spark/pull/19651/files#diff-01999ccbf13e95a0ea2d223f69d8ae23L265) in `sql/hive`.


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r148932512
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
 ---
@@ -0,0 +1,370 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import java.io.IOException
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.io._
+import org.apache.orc.{OrcFile, TypeDescription}
+import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
+import org.apache.orc.storage.common.`type`.HiveDecimal
+import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.Resolver
+import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
+import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+object OrcUtils extends Logging {
+
+  def listOrcFiles(pathStr: String, conf: Configuration): Seq[Path] = {
--- End diff --

This is moved from 
[OrcFileOperator](https://github.com/apache/spark/pull/19651/files#diff-7888d134b0898e6ec7914b2fc9c8a1b6L90)
 in `sql/hive`.


---




[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-11-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19651
  
Hi, @cloud-fan and @gatorsmile.
According to the decision at #19571, I made an ORCFileFormat under `sql/core` again.
Could you review this PR?


---




[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19652
  
**[Test build #83446 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83446/testReport)**
 for PR 19652 at commit 
[`0d706ff`](https://github.com/apache/spark/commit/0d706fffb133c1d685e4aaa0b62758d0826bad62).


---




[GitHub] spark issue #19598: [SPARK-22378][SQL] Eliminate redundant null check in gen...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19598
  
LGTM


---




[GitHub] spark pull request #19598: [SPARK-22378][SQL] Eliminate redundant null check...

2017-11-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19598#discussion_r148931423
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
 ---
@@ -242,9 +243,11 @@ case class GetArrayItem(child: Expression, ordinal: 
Expression)
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
 nullSafeCodeGen(ctx, ev, (eval1, eval2) => {
   val index = ctx.freshName("index")
+  val nullCheck = if 
(!child.dataType.asInstanceOf[ArrayType].containsNull) ""
+else s" || $eval1.isNullAt($index)"
--- End diff --

nit:
```
if {

} else {

}
```
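The reviewed snippet, reshaped into the multi-line if/else the reviewer asks for, can be sketched like this; it is a standalone model (the `eval1` and `index` parameters stand in for the generated variable names), showing that when the array's element type cannot contain nulls the per-element `isNullAt` probe is elided entirely, which is the point of SPARK-22378.

```scala
// Build the null-check fragment for generated GetArrayItem code:
// empty when elements are provably non-null, otherwise an isNullAt probe.
def nullCheck(containsNull: Boolean, eval1: String, index: String): String =
  if (containsNull) {
    s" || $eval1.isNullAt($index)"
  } else {
    ""
  }
```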


---



