[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82533/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82533/testReport)**
 for PR 19452 at commit 
[`8c2a39f`](https://github.com/apache/spark/commit/8c2a39fcb3e425a91d25505ae9d29ba8ac670e0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  case class JoinConditionSplitPredicates(`


---




[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19438
  
**[Test build #82534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82534/testReport)**
 for PR 19438 at commit 
[`49262d1`](https://github.com/apache/spark/commit/49262d1bc2bc30c635e727952eda2dc5612b887c).


---




[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

2017-10-06 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19438#discussion_r143321930
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala
 ---
@@ -58,7 +58,7 @@ class QuantileSummariesSuite extends SparkFunSuite {
 if (data.nonEmpty) {
   val approx = summary.query(quant).get
   // The rank of the approximation.
-  val rank = data.count(_ < approx) // has to be <, not <= to be exact
+  val rank = data.count(_ <= approx)
--- End diff --

@srowen @jiangxb1987 @thunterdb I'm not sure I made the right change; is the 
previous comment ("has to be <, not <= to be exact") still correct? Similar 
changes were made in `DataFrameStatSuite`.


---
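To make the `<` vs `<=` question above concrete, here is a minimal, self-contained sketch (not Spark's `QuantileSummaries`; the helper names are hypothetical) showing how the computed rank of an approximate quantile answer differs under strict and non-strict counting when the answer ties with data points:

```python
# Illustration only: rank of an approximate quantile answer under the two
# comparison choices discussed in the review. With ties in the data, strict
# (<) and non-strict (<=) counting can give very different ranks.
def rank_strict(data, approx):
    # Number of elements strictly below the approximation.
    return sum(1 for x in data if x < approx)

def rank_non_strict(data, approx):
    # Number of elements at or below the approximation.
    return sum(1 for x in data if x <= approx)

data = [1, 2, 2, 2, 3, 4]
approx = 2  # a returned quantile that equals several data points

print(rank_strict(data, approx))      # 1
print(rank_non_strict(data, approx))  # 4
```

With duplicates, the two definitions bracket the true rank range of the tied value, which is why the original comment insisted on `<` for an exact bound.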




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/19294
  
Thanks for the fix @szhem, great work!
Merged to master and 2.2.1


---




[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19294


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82532/
Test PASSed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82532/testReport)**
 for PR 19394 at commit 
[`56089f5`](https://github.com/apache/spark/commit/56089f5ba65f1d7d9e11b76673bcde3df37cd240).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143321383
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -274,19 +274,26 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 val byteArrayRdd = getByteArrayRdd()
 
 val results = ArrayBuffer[InternalRow]()
-byteArrayRdd.collect().foreach { bytes =>
-  decodeUnsafeRows(bytes).foreach(results.+=)
+byteArrayRdd.collect().foreach { rdd =>
+  decodeUnsafeRows(rdd._2).foreach(results.+=)
 }
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
+val total = countsAndBytes.map(_._1).sum
+val rows = countsAndBytes.iterator.flatMap(rdd => 
decodeUnsafeRows(rdd._2))
--- End diff --

nit: the name `rdd` may be confusing here; each collected element is a (count, bytes) pair, not an RDD.


---




[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-10-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18931#discussion_r143321347
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -181,7 +181,7 @@ class WholeStageCodegenSuite extends SparkPlanTest with 
SharedSQLContext {
 val codeWithShortFunctions = genGroupByCode(3)
 val (_, maxCodeSize1) = CodeGenerator.compile(codeWithShortFunctions)
 assert(maxCodeSize1 < 
SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.defaultValue.get)
-val codeWithLongFunctions = genGroupByCode(20)
+val codeWithLongFunctions = genGroupByCode(50)
--- End diff --

In my PR, I changed the code to just check that the long functions yield the 
larger max code size:

https://github.com/apache/spark/pull/19082/files#diff-0314224342bb8c30143ab784b3805d19R185
But just increasing the value seems better.


---




[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18931
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143321349
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -274,19 +274,26 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 val byteArrayRdd = getByteArrayRdd()
 
 val results = ArrayBuffer[InternalRow]()
-byteArrayRdd.collect().foreach { bytes =>
-  decodeUnsafeRows(bytes).foreach(results.+=)
+byteArrayRdd.collect().foreach { rdd =>
+  decodeUnsafeRows(rdd._2).foreach(results.+=)
 }
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
+val total = countsAndBytes.map(_._1).sum
+val rows = countsAndBytes.iterator.flatMap(rdd => 
decodeUnsafeRows(rdd._2))
+(total, rows)
+  }
+
   /**
* Runs this query returning the result as an iterator of InternalRow.
*
* @note Triggers multiple jobs (one for each partition).
*/
   def executeToIterator(): Iterator[InternalRow] = {
-getByteArrayRdd().toLocalIterator.flatMap(decodeUnsafeRows)
+getByteArrayRdd().toLocalIterator.flatMap(rdd => 
decodeUnsafeRows(rdd._2))
--- End diff --

We don't need to collect the counts back to the driver. Besides, the value 
bound in the `flatMap` is not an RDD but the serialized bytes.

Maybe:

```scala
getByteArrayRdd().map(_._2).toLocalIterator.flatMap(decodeUnsafeRows(_))
```



---
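The pattern under review, collecting (row_count, encoded_bytes) pairs per partition, summing the counts, and lazily flat-mapping a decoder over the bytes, can be sketched in a language-agnostic way. This is an illustration of the shape of `executeCollectIterator`, not Spark code; `decode` is a hypothetical stand-in for `decodeUnsafeRows`:

```python
# Sketch of the (count, bytes) collect pattern: sum the per-partition row
# counts eagerly, decode the byte blobs lazily via a generator.
def decode(blob):
    # Hypothetical decoder: yields one "row" (here, one byte value) per byte.
    return list(blob)

def collect_iterator(counts_and_bytes):
    total = sum(count for count, _ in counts_and_bytes)
    # Lazy: blobs are decoded only as the iterator is consumed.
    rows = (row for _, blob in counts_and_bytes for row in decode(blob))
    return total, rows

partitions = [(2, b"ab"), (3, b"cde")]  # two partitions, 5 rows total
total, rows = collect_iterator(partitions)
print(total)       # 5
print(list(rows))  # [97, 98, 99, 100, 101]
```

Naming the flat-map variable `blob` (or `bytes`) rather than `rdd` matches the reviewers' point: by the time `collect()` has run, the elements are plain pairs.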




[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18931
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82531/
Test PASSed.


---




[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18931
  
**[Test build #82531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82531/testReport)**
 for PR 18931 at commit 
[`601c225`](https://github.com/apache/spark/commit/601c2251c397b30f2ea9a42f6a23e3636129d5bc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-10-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18931#discussion_r143321131
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ---
@@ -181,7 +181,7 @@ class WholeStageCodegenSuite extends SparkPlanTest with 
SharedSQLContext {
 val codeWithShortFunctions = genGroupByCode(3)
 val (_, maxCodeSize1) = CodeGenerator.compile(codeWithShortFunctions)
 assert(maxCodeSize1 < 
SQLConf.WHOLESTAGE_HUGE_METHOD_LIMIT.defaultValue.get)
-val codeWithLongFunctions = genGroupByCode(20)
+val codeWithLongFunctions = genGroupByCode(50)
--- End diff --

We reduced the length of the generated code, so to keep this test working we 
increase the number of expressions.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82533/testReport)**
 for PR 19452 at commit 
[`8c2a39f`](https://github.com/apache/spark/commit/8c2a39fcb3e425a91d25505ae9d29ba8ac670e0e).


---




[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19447
  
@gatorsmile I am happy to take it. We have [one test 
case](https://github.com/apache/spark/pull/16648/files#diff-c7f041fda7fd9aa1a5225f86bab4b1b0R69)
 that causes the error `JVM limit of 0x`. I am not sure whether this new option 
helps it or not. I will try it this long weekend (Monday is a holiday in 
Japan).

Ideally, we want to find a better metric than the string length of the 
class file. 
Since constant pool entries are used for [various 
purposes](https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html#jvms-4.4),
 I imagine the majority in this case is `CONSTANT_Fieldref` (i.e. 
instance variables). It may be helpful to watch the number of instance variables 
in `CodegenContext.mutableStates`. cc: @rednaxelafx



@maropu it would be good to make `currClassSize` more precise.


---
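For scale on the limit being discussed: the JVM constant pool count is a 16-bit value, so a class file is capped at 65535 entries. A back-of-envelope sketch (assumptions only, not Spark code: `entries_per_field` and `overhead` are hypothetical parameters) of how many generated instance variables fit before the pool overflows:

```python
# The constant pool count is stored as an unsigned 16-bit value in the class
# file, hence at most 65535 entries per class.
JVM_CONSTANT_POOL_LIMIT = 65535

def max_fields(entries_per_field, overhead=500):
    """Rough upper bound on generated fields before constant-pool overflow.

    entries_per_field: assumed pool entries per field (Fieldref, NameAndType,
                       Utf8 name/descriptor); overhead: assumed entries used
                       by the class itself (methods, strings, ...).
    """
    return (JVM_CONSTANT_POOL_LIMIT - overhead) // entries_per_field

print(max_fields(4))  # 16258
```

This is only an order-of-magnitude estimate; the real entry cost per field varies, which is exactly why a better metric than class-file string length is being sought.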




[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19447
  
How about making a separate function for checking the threshold and then 
testing it like #18810: 
https://github.com/apache/spark/pull/18810/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cR363


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19294
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19294
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82529/
Test PASSed.


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19294
  
**[Test build #82529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82529/testReport)**
 for PR 19294 at commit 
[`f55b7c2`](https://github.com/apache/spark/commit/f55b7c2e14ea9cb49255359b294e6d275cda4380).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19090: [SPARK-21877][DEPLOY, WINDOWS] Handle quotes in Windows ...

2017-10-06 Thread minixalpha
Github user minixalpha commented on the issue:

https://github.com/apache/spark/pull/19090
  
Thanks, @HyukjinKwon @jsnowacki @felixcheung 


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82530/
Test FAILed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82530/testReport)**
 for PR 19394 at commit 
[`5b92fe2`](https://github.com/apache/spark/commit/5b92fe252530f4c8dab4d41772011e86ac308f80).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143318406
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

Thank you for your comments. Is it better to pass `SQLConf` to the 
constructor of `CodegenContext`?


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82532/testReport)**
 for PR 19394 at commit 
[`56089f5`](https://github.com/apache/spark/commit/56089f5ba65f1d7d9e11b76673bcde3df37cd240).


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143317742
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
@@ -58,7 +58,7 @@ class ConfigBehaviorSuite extends QueryTest with 
SharedSQLContext {
   withSQLConf(SQLConf.RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION.key -> 
"1") {
 // If we only sample one point, the range boundaries will be 
pretty bad and the
 // chi-sq value would be very high.
-assert(computeChiSquareTest() > 1000)
+assert(computeChiSquareTest() > 300)
--- End diff --

Updating the SparkPlan code to avoid creating unnecessary RDDs fixed this 
test because the same random seed is used. But, I think we should keep this 
change so we don't have to track down this problem again in the future. This 
bound is perfectly safe because the balanced chi-sq value is 10, while the 
unbalanced is much larger.


---
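The statistic being bounded above is a plain chi-squared goodness-of-fit value over partition sizes against a uniform expectation. A minimal sketch (illustrative numbers, not the Spark test's data) shows why a balanced split sits near the baseline while a skewed split blows up:

```python
# Chi-squared statistic of observed partition sizes vs. a uniform split:
# sum over partitions of (observed - expected)^2 / expected.
def chi_square(observed):
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

balanced = [100, 100, 100, 100]  # perfectly even partitions
skewed = [10, 10, 10, 370]       # one partition holds most rows

print(chi_square(balanced))  # 0.0
print(chi_square(skewed))    # 972.0
```

With the gap between balanced (near zero, around 10 in the test's setup) and badly unbalanced (orders of magnitude larger) values this wide, loosening the assertion bound from 1000 to 300 still cleanly separates the two regimes.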




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143317686
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
+val total = countsAndBytes.map(_._1).sum
+val rows = countsAndBytes.iterator.map(_._2).flatMap(decodeUnsafeRows)
+(total, rows)
+  }
+
   /**
* Runs this query returning the result as an iterator of InternalRow.
*
* @note Triggers multiple jobs (one for each partition).
*/
   def executeToIterator(): Iterator[InternalRow] = {
-getByteArrayRdd().toLocalIterator.flatMap(decodeUnsafeRows)
+getByteArrayRdd().toLocalIterator.map(_._2).flatMap(decodeUnsafeRows)
--- End diff --

Fixed.


---




[GitHub] spark issue #18931: [SPARK-21717][SQL] Decouple consume functions of physica...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18931
  
**[Test build #82531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82531/testReport)**
 for PR 18931 at commit 
[`601c225`](https://github.com/apache/spark/commit/601c2251c397b30f2ea9a42f6a23e3636129d5bc).


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread rdblue
Github user rdblue commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143317518
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
+val total = countsAndBytes.map(_._1).sum
+val rows = countsAndBytes.iterator.map(_._2).flatMap(decodeUnsafeRows)
+(total, rows)
+  }
+
   /**
* Runs this query returning the result as an iterator of InternalRow.
*
* @note Triggers multiple jobs (one for each partition).
*/
   def executeToIterator(): Iterator[InternalRow] = {
-getByteArrayRdd().toLocalIterator.flatMap(decodeUnsafeRows)
+getByteArrayRdd().toLocalIterator.map(_._2).flatMap(decodeUnsafeRows)
--- End diff --

Good point. I guess from the tests that this will create a new RDD, which 
isn't what we want.


---




[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-06 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19082
  
Sure, I totally agree. We need to know the advantages and possible 
impacts, if any, of merging this PR and #18931. It would be good if @kiszk and 
@rednaxelafx can help review this PR and #18931.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82528/
Test PASSed.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82528/testReport)**
 for PR 19452 at commit 
[`026a33b`](https://github.com/apache/spark/commit/026a33bdd01c28327543b416d4716e53fb259181).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-06 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/19448
  
+1

I completely agree that using a ParquetOutputCommitter should be optional.


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143315153
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
+val total = countsAndBytes.map(_._1).sum
+val rows = countsAndBytes.iterator.map(_._2).flatMap(decodeUnsafeRows)
+(total, rows)
+  }
+
   /**
* Runs this query returning the result as an iterator of InternalRow.
*
* @note Triggers multiple jobs (one for each partition).
*/
   def executeToIterator(): Iterator[InternalRow] = {
-getByteArrayRdd().toLocalIterator.flatMap(decodeUnsafeRows)
+getByteArrayRdd().toLocalIterator.map(_._2).flatMap(decodeUnsafeRows)
--- End diff --

The same here.


---




[GitHub] spark pull request #19394: [SPARK-22170][SQL] Reduce memory consumption in b...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19394#discussion_r143315127
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -280,13 +280,20 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] 
with Logging with Serializ
 results.toArray
   }
 
+  private[spark] def executeCollectIterator(): (Long, 
Iterator[InternalRow]) = {
+val countsAndBytes = getByteArrayRdd().collect()
+val total = countsAndBytes.map(_._1).sum
+val rows = countsAndBytes.iterator.map(_._2).flatMap(decodeUnsafeRows)
--- End diff --

-> `flatMap(o => decodeUnsafeRows(o._2))`


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82527/
Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18460
  
**[Test build #82527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82527/testReport)**
 for PR 18460 at commit 
[`67a037c`](https://github.com/apache/spark/commit/67a037c053e596432f83dc0e2383bad895e9ce21).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-06 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18732#discussion_r143313284
  
--- Diff: python/pyspark/sql/group.py ---
@@ -192,7 +193,69 @@ def pivot(self, pivot_col, values=None):
 jgd = self._jgd.pivot(pivot_col)
 else:
 jgd = self._jgd.pivot(pivot_col, values)
-return GroupedData(jgd, self.sql_ctx)
+return GroupedData(jgd, self._df)
+
+@since(2.3)
+def apply(self, udf):
+"""
+Maps each group of the current :class:`DataFrame` using a pandas 
udf and returns the result
+as a :class:`DataFrame`.
+
+The user-defined function should take a `pandas.DataFrame` and 
return another
+`pandas.DataFrame`. For each group, all columns are passed 
together as a `pandas.DataFrame`
+to the user-function and the returned `pandas.DataFrame` are 
combined as a
+:class:`DataFrame`. The returned `pandas.DataFrame` can be 
arbitrary length and its schema
+must match the returnType of the pandas udf.
+
+:param udf: A wrapped udf function returned by 
:meth:`pyspark.sql.functions.pandas_udf`
+
+>>> from pyspark.sql.functions import pandas_udf
+>>> df = spark.createDataFrame(
+... [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)],
+... ("id", "v"))
+>>> @pandas_udf(returnType=df.schema)
+... def normalize(pdf):
+... v = pdf.v
+... return pdf.assign(v=(v - v.mean()) / v.std())
+>>> df.groupby('id').apply(normalize).show()  # doctest: +SKIP
++---+---+
+| id|  v|
++---+---+
+|  1|-0.7071067811865475|
+|  1| 0.7071067811865475|
+|  2|-0.8320502943378437|
+|  2|-0.2773500981126146|
+|  2| 1.1094003924504583|
++---+---+
+
+.. seealso:: :meth:`pyspark.sql.functions.pandas_udf`
+
+"""
+from pyspark.sql.functions import pandas_udf
+
+# Columns are special because hasattr always returns True
+if isinstance(udf, Column) or not hasattr(udf, 'func') or not 
udf.vectorized:
+raise ValueError("The argument to apply must be a pandas_udf")
+if not isinstance(udf.returnType, StructType):
+raise ValueError("The returnType of the pandas_udf must be a 
StructType")
+
+df = self._df
+func = udf.func
+returnType = udf.returnType
--- End diff --

Is it necessary to make all these copies? I could understand maybe copying 
`func` and `columns` because they are used in the wrapped function, but I'm not 
sure whether `df` and `returnType` need to be copied.
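For context, the pattern under discussion — copying attributes into locals before defining the inner function — is a common Python idiom: the closure then captures only the locals it uses, not the whole `udf` (or `self`) object. A minimal sketch, with assumed names rather than the actual PySpark internals:

```python
class FakeUdf:
    """Stand-in for the wrapped pandas_udf object (hypothetical)."""
    def __init__(self, func):
        self.func = func

def make_wrapped(udf):
    # Copy the attribute into a local so the closure below references only
    # `func`, not the whole `udf` object (a smaller capture if serialized).
    func = udf.func
    def wrapped(value):
        return func(value)
    return wrapped
```

Whether `df` and `returnType` also need local copies is exactly the reviewer's question; the trick only matters for objects that would otherwise be captured by the closure and shipped around.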


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82526/
Test PASSed.


---




[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-06 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/19437#discussion_r143312903
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala
 ---
@@ -17,10 +17,14 @@
 
 package org.apache.spark.scheduler.cluster.mesos
 
-import org.apache.mesos.Protos.{ContainerInfo, Image, NetworkInfo, 
Parameter, Volume}
+import org.apache.mesos.Protos._
--- End diff --

avoid wild card imports if possible.


---




[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-06 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/19437#discussion_r143312857
  
--- Diff: 
resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala
 ---
@@ -122,7 +126,7 @@ private[mesos] object MesosSchedulerBackendUtil extends 
Logging {
 .toList
   }
 
-  def containerInfo(conf: SparkConf): ContainerInfo.Builder = {
+  def containerInfo(conf: SparkConf, secretConfig: MesosSecretConfig): 
ContainerInfo.Builder = {
--- End diff --

IMHO a better name would be createContainerInfo or buildContainerInfo


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82530/testReport)**
 for PR 19394 at commit 
[`5b92fe2`](https://github.com/apache/spark/commit/5b92fe252530f4c8dab4d41772011e86ac308f80).


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19452
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19394
  
retest this please


---




[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143312237
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

There are places in the codebase that do `SQLConf.get` (especially 
`CodeGenerator.scala`, which is being modified in this PR). Are you suggesting 
changing that everywhere, as in #18568? I think that would be a good thing to do.


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82526/testReport)**
 for PR 19452 at commit 
[`c90fb3c`](https://github.com/apache/spark/commit/c90fb3cb1e112edeacb4be2604a7b628f55697f4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18460
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82523/
Test PASSed.


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18460
  
**[Test build #82523 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82523/testReport)**
 for PR 18460 at commit 
[`c72aa18`](https://github.com/apache/spark/commit/c72aa18ccbf23053a06faa35deda77ff1135b818).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/19394
  
Anyone have a clue what the Python error could be? It doesn't look related.


---




[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143307261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

See the PR: https://github.com/apache/spark/pull/18568


---




[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-06 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/19267
  
Sorry, I haven't had time to look at anything around here lately. I 
recommend you make some noise on the mailing list or ping other people if you 
want to get attention to this.


---




[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143306861
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

`SQLConf` is based on the active Spark session, and the active Spark session is 
not always correctly set. Thus, we do not encourage the community to use 
`SQLConf`. 
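The concern above is that an "active session" lookup is effectively thread-local state. A minimal Python sketch — not Spark code, names are illustrative — of how such a lookup can silently fall back to defaults on another thread:

```python
import threading

_active = threading.local()  # mimics an "active session" registry

def set_active_conf(conf):
    _active.conf = conf

def get_active_conf(default=None):
    # Falls back silently when the calling thread never set a session.
    return getattr(_active, "conf", default)
```

A config read through `get_active_conf` on a worker thread sees the default, not the configuration the caller set on the main thread — the same failure mode as an incorrectly set active Spark session.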


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82524/
Test FAILed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82524 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82524/testReport)**
 for PR 19394 at commit 
[`f6613c2`](https://github.com/apache/spark/commit/f6613c2651691ca149390ea2407856ba390590cb).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19267: [WIP][SPARK-20628][CORE] Blacklist nodes when they trans...

2017-10-06 Thread juanrh
Github user juanrh commented on the issue:

https://github.com/apache/spark/pull/19267
  
Hi @vanzin, do you have any comments on the design document attached above? 

Thanks


---




[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread szhem
Github user szhem commented on a diff in the pull request:

https://github.com/apache/spark/pull/19294#discussion_r143306215
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -57,6 +60,15 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: 
String)
*/
   private def absPathStagingDir: Path = new Path(path, "_temporary-" + 
jobId)
 
+  /**
+   * Checks whether there are files to be committed to an absolute output 
location.
+   *
+   * As the committing and aborting the job occurs on driver where 
`addedAbsPathFiles` is always
+   * null, it is necessary to check whether the output path is specified, 
that may not be the case
+   * for committers not writing to distributed file systems.
--- End diff --

Thanks a lot, guys! I've just updated the comment 


---




[GitHub] spark issue #19294: [SPARK-21549][CORE] Respect OutputFormats with no output...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19294
  
**[Test build #82529 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82529/testReport)**
 for PR 19294 at commit 
[`f55b7c2`](https://github.com/apache/spark/commit/f55b7c2e14ea9cb49255359b294e6d275cda4380).


---




[GitHub] spark pull request #19294: [SPARK-21549][CORE] Respect OutputFormats with no...

2017-10-06 Thread szhem
Github user szhem commented on a diff in the pull request:

https://github.com/apache/spark/pull/19294#discussion_r143305853
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala
 ---
@@ -57,6 +60,15 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: 
String)
*/
   private def absPathStagingDir: Path = new Path(path, "_temporary-" + 
jobId)
 
+  /**
+   * Checks whether there are files to be committed to an absolute output 
location.
+   *
+   * As the committing and aborting the job occurs on driver where 
`addedAbsPathFiles` is always
--- End diff --

done


---




[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19270
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82522/
Test PASSed.


---




[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19270
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19270
  
**[Test build #82522 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82522/testReport)**
 for PR 19270 at commit 
[`25b5215`](https://github.com/apache/spark/commit/25b5215cd6dbbc11b5e5d8d56bff6743578de2b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19450: [SPARK-22218] spark shuffle services fails to update sec...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19450
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82521/
Test PASSed.


---




[GitHub] spark issue #19450: [SPARK-22218] spark shuffle services fails to update sec...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19450
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19450: [SPARK-22218] spark shuffle services fails to update sec...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19450
  
**[Test build #82521 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82521/testReport)**
 for PR 19450 at commit 
[`5a9ef13`](https://github.com/apache/spark/commit/5a9ef1396a28b535042e336a2f43d80104ce95e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19394
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82525/
Test FAILed.


---




[GitHub] spark issue #19394: [SPARK-22170][SQL] Reduce memory consumption in broadcas...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19394
  
**[Test build #82525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82525/testReport)**
 for PR 19394 at commit 
[`5b92fe2`](https://github.com/apache/spark/commit/5b92fe252530f4c8dab4d41772011e86ac308f80).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18664
  
Made [SPARK-1](https://issues.apache.org/jira/browse/SPARK-1) for the 
user doc; once we decide what to do with timestamps, it can be completed.


---




[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread icexelloss
Github user icexelloss commented on the issue:

https://github.com/apache/spark/pull/18664
  
Bryan, I haven't created it. Go ahead!



---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82528 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82528/testReport)**
 for PR 19452 at commit 
[`026a33b`](https://github.com/apache/spark/commit/026a33bdd01c28327543b416d4716e53fb259181).


---




[GitHub] spark issue #19372: [SPARK-22156][MLLIB] Fix update equation of learning rat...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19372
  
**[Test build #3943 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3943/testReport)**
 for PR 19372 at commit 
[`2ea3f18`](https://github.com/apache/spark/commit/2ea3f18bb002c38f387e01c31c6fb3ce2cb37231).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-06 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/18664
  
Thanks all for the discussion. I think there are a lot of subtleties at 
play here, and what may or may not be considered a bug can depend on the user's 
intent. Regardless, I agree that there needs to be user-facing documentation 
that will address these details, such as the questions posed by @gatorsmile 
above. I can create a JIRA for this if @icexelloss hasn't already.

> I am okay with proceeding separately for dealing with timezone, and 
matching the behaviour with Arrow to the existing behaviour without Arrow here 
with respect to timezone.

@HyukjinKwon so you are suggesting to not use a timezone for Arrow 
timestamps?  We discussed that earlier in this PR, but maybe that is the best 
solution for now


---




[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18460
  
**[Test build #82527 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82527/testReport)**
 for PR 18460 at commit 
[`67a037c`](https://github.com/apache/spark/commit/67a037c053e596432f83dc0e2383bad895e9ce21).


---




[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r143295197
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -100,6 +101,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if 
t1.sameType(t2) =>
+  Some(StructType(fields1.zip(fields2).map { case (f1, f2) =>
+// Since `t1.sameType(t2)` is true, two StructTypes have the same 
DataType
+// except `name` (in case of `spark.sql.caseSensitive=false`) and 
`nullable`.
+// - Different names: use a lower case name because 
findTightestCommonType is commutative.
+// - Different nullabilities: `nullable` is true iff one of them 
is nullable.
+val name = if (f1.name == f2.name) f1.name else 
f1.name.toLowerCase(Locale.ROOT)
--- End diff --

Yes. It works. I updated the [test 
cases](https://github.com/apache/spark/pull/18460/files#diff-5a2e7f03d14856c8769fd3ddea8742bdR2657).
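The field-merge rule quoted in the diff can be sketched outside Spark as a tiny function. This is only an illustration of the commented rule, not the Scala implementation; the field model is simplified to `(name, nullable)` pairs:

```python
def merge_field(f1, f2):
    name1, nullable1 = f1
    name2, nullable2 = f2
    # Equal names are kept as-is; case-insensitive matches fall back to the
    # lower-case spelling so the merge is commutative.
    name = name1 if name1 == name2 else name1.lower()
    # The merged field is nullable iff either input is nullable.
    return (name, nullable1 or nullable2)
```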


---




[GitHub] spark pull request #19450: [SPARK-22218] spark shuffle services fails to upd...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19450#discussion_r143294931
  
--- Diff: 
common/network-shuffle/src/test/java/org/apache/spark/network/sasl/ShuffleSecretManagerSuite.java
 ---
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl;
+
+import java.nio.ByteBuffer;
+
+import org.junit.Test;
+import static org.junit.Assert.*;
+
+public class ShuffleSecretManagerSuite {
+  static String app1 = "app1";
--- End diff --

`private final` for all of these?


---




[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2017-10-06 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/19449
  
LGTM. There are already multiple places in codegen where `SQLConf.get` is 
being used, so this will make things consistent.


---




[GitHub] spark pull request #19449: [SPARK-22219][SQL] Refactor code to get a value f...

2017-10-06 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/19449#discussion_r143291724
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -929,7 +929,7 @@ class CodegenContext {
 // be extremely expensive in certain cases, such as deeply-nested 
expressions which operate over
 // inputs with wide schemas. For more details on the performance 
issues that motivated this
 // flat, see SPARK-15680.
-if (SparkEnv.get != null && 
SparkEnv.get.conf.getBoolean("spark.sql.codegen.comments", false)) {
--- End diff --

note to other reviewers: for historical context why this was done, see 
https://github.com/apache/spark/pull/13421/files#r65268674


---




[GitHub] spark issue #19452: [SPARK-22136][SS] Evaluate one-sided conditions early in...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19452
  
**[Test build #82526 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82526/testReport)**
 for PR 19452 at commit 
[`c90fb3c`](https://github.com/apache/spark/commit/c90fb3cb1e112edeacb4be2604a7b628f55697f4).


---




[GitHub] spark pull request #19452: [SPARK-22136][SS] Evaluate one-sided conditions e...

2017-10-06 Thread joseph-torres
GitHub user joseph-torres opened a pull request:

https://github.com/apache/spark/pull/19452

[SPARK-22136][SS] Evaluate one-sided conditions early in stream-stream 
joins.

## What changes were proposed in this pull request?

Evaluate one-sided conditions early in stream-stream joins.

This is in addition to normal filter pushdown, because integrating it with 
the join logic allows it to take place in outer join scenarios. This means that 
rows which can never satisfy the join condition won't clog up the state. 

## How was this patch tested?
new unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/joseph-torres/spark SPARK-22136

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19452.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19452


commit c90fb3cb1e112edeacb4be2604a7b628f55697f4
Author: Jose Torres 
Date:   2017-10-06T20:44:20Z

Evaluate one-sided conditions early in stream-stream joins.




---




[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19449
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2017-10-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19449
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82520/
Test PASSed.


---




[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...

2017-10-06 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19449
  
**[Test build #82520 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82520/testReport)**
 for PR 19449 at commit 
[`be4220d`](https://github.com/apache/spark/commit/be4220dcb218366f81f4015af56bee9fade90066).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-06 Thread sathiyapk
Github user sathiyapk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19451#discussion_r143289752
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1242,6 +1243,53 @@ object ReplaceIntersectWithSemiJoin extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * If one or both of the datasets in the logical [[Except]] operator are 
purely transformed using
+ * [[Filter]], this rule will replace logical [[Except]] operator with a 
[[Filter]] operator by
+ * flipping the filter condition of the right child.
+ * {{{
+ *   SELECT a1, a2 FROM Tab1 WHERE a2 = 12 EXCEPT SELECT a1, a2 FROM Tab1 
WHERE a1 = 5
+ *   ==>  SELECT a1, a2 FROM Tab1 WHERE a2 = 12 AND a1 <> 5
--- End diff --

@gatorsmile Thanks for the comment. Sorry, I didn't pay attention; I'll 
update the query example. The code itself, though, produces the correct behaviour for 
the Except operator. Do you have any comments on the code? 
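
The subtlety behind the example fix is NULL handling. A minimal sketch of the null-safe flip such a rewrite needs (plain Java with illustrative names; the actual rule operates on Catalyst expressions, not scalars):

```java
public class ExceptFlip {
    // SQL three-valued logic: null models SQL NULL.
    static Boolean eq(Integer value, int constant) {
        return value == null ? null : value == constant;
    }

    // coalesce(b, false)
    static boolean isTrue(Boolean b) {
        return b != null && b;
    }

    // For SELECT ... WHERE a2 = 12 EXCEPT SELECT ... WHERE a1 = 5, a left row
    // survives unless the right-side filter is *definitely* true. The flipped
    // condition is therefore NOT coalesce(a1 = 5, false), not a plain a1 <> 5,
    // which would wrongly drop rows where a1 IS NULL.
    static boolean keeps(Integer a1, Integer a2) {
        return isTrue(eq(a2, 12)) && !isTrue(eq(a1, 5));
    }
}
```

With this flip, a row with `a1 = NULL` and `a2 = 12` is kept, matching Except semantics, whereas `a1 <> 5` would evaluate to NULL and filter it out.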


---




[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r143289255
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -100,6 +101,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if 
t1.sameType(t2) =>
+  Some(StructType(fields1.zip(fields2).map { case (f1, f2) =>
+// Since `t1.sameType(t2)` is true, two StructTypes have the same 
DataType
+// except `name` (in case of `spark.sql.caseSensitive=false`) and 
`nullable`.
+// - Different names: use a lower case name because 
findTightestCommonType is commutative.
+// - Different nullabilities: `nullable` is true iff one of them 
is nullable.
+val name = if (f1.name == f2.name) f1.name else 
f1.name.toLowerCase(Locale.ROOT)
--- End diff --

> I mean, does the case sensitivity conf work in nested column name resolution?

Oh, I see. I thought so. Let me check that again.


---




[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r143288998
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -100,6 +101,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if 
t1.sameType(t2) =>
+  Some(StructType(fields1.zip(fields2).map { case (f1, f2) =>
+// Since `t1.sameType(t2)` is true, two StructTypes have the same 
DataType
+// except `name` (in case of `spark.sql.caseSensitive=false`) and 
`nullable`.
+// - Different names: use a lower case name because 
findTightestCommonType is commutative.
+// - Different nullabilities: `nullable` is true iff one of them 
is nullable.
+val name = if (f1.name == f2.name) f1.name else 
f1.name.toLowerCase(Locale.ROOT)
--- End diff --

For Hive, your example works like the following.
```
hive> CREATE TABLE S1 AS SELECT named_struct('A',1) Col1;
hive> SELECT S1.Col1.a, S1.Col1.A FROM S1;
OK
1   1
```


---




[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r143288861
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -100,6 +101,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if 
t1.sameType(t2) =>
+  Some(StructType(fields1.zip(fields2).map { case (f1, f2) =>
+// Since `t1.sameType(t2)` is true, two StructTypes have the same 
DataType
+// except `name` (in case of `spark.sql.caseSensitive=false`) and 
`nullable`.
+// - Different names: use a lower case name because 
findTightestCommonType is commutative.
+// - Different nullabilities: `nullable` is true iff one of them 
is nullable.
+val name = if (f1.name == f2.name) f1.name else 
f1.name.toLowerCase(Locale.ROOT)
--- End diff --

I mean, does the case sensitivity conf work in nested column name resolution?


---




[GitHub] spark pull request #19250: [SPARK-12297] Table timezone correction for Times...

2017-10-06 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/19250#discussion_r143288711
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -266,6 +267,10 @@ final class DataFrameWriter[T] private[sql](ds: 
Dataset[T]) {
* @since 1.4.0
*/
   def insertInto(tableName: String): Unit = {
+extraOptions.get(TimestampTableTimeZone.TIMEZONE_PROPERTY).foreach { 
tz =>
--- End diff --

Hmm. I tried a couple of things, and while it may be possible to remove 
some of these checks and replace them with a check in 
`DateTimeUtils.computeTimeZone`, that doesn't cover all cases. For example, you 
could use "ALTER TABLE" with an invalid time zone and that wouldn't trigger the 
check.

So given the spec I'm inclined to leave the checks as is, unless @zivanfi 
is open to making the spec more lax in that area. 
(`TimeZone.getTimeZone(invalidId)` actually returns the UTC time zone, as 
unexpected as that behavior may be, so things won't necessarily break without 
the checks.)
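
The fallback behaviour mentioned above is easy to demonstrate, and a stricter check can be sketched like this (plain Java; `validate` is illustrative, not Spark's API):

```java
import java.util.Arrays;
import java.util.TimeZone;

public class TimeZoneCheck {
    // TimeZone.getTimeZone falls back to GMT for unknown IDs instead of
    // failing, so actually rejecting bad input requires an explicit
    // membership check against the known zone IDs.
    static void validate(String id) {
        if (!Arrays.asList(TimeZone.getAvailableIDs()).contains(id)) {
            throw new IllegalArgumentException("Invalid time zone: " + id);
        }
    }
}
```

This is why skipping the checks "won't necessarily break" anything: an invalid ID silently degrades to GMT rather than throwing.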


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143288308
  
--- Diff: docs/sql-programming-guide.md ---
@@ -479,6 +481,47 @@ source type can be converted into other types using 
this syntax.
 
 
 
+To load a csv file you can use:
+
+
+
+{% include_example manual_load_options_csv 
scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala %}
+
+
+
+{% include_example manual_load_options_csv 
java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java %}
+
+
+
+{% include_example manual_load_options_csv python/sql/datasource.py %}
+
+
+
+{% include_example manual_load_options_csv r/RSparkSQLExample.R %}
+
+
+
+To load a csv file you can use:
--- End diff --

This is also a duplicate. 


---




[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r143288219
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -100,6 +101,17 @@ object TypeCoercion {
 case (_: TimestampType, _: DateType) | (_: DateType, _: TimestampType) 
=>
   Some(TimestampType)
 
+case (t1 @ StructType(fields1), t2 @ StructType(fields2)) if 
t1.sameType(t2) =>
+  Some(StructType(fields1.zip(fields2).map { case (f1, f2) =>
+// Since `t1.sameType(t2)` is true, two StructTypes have the same 
DataType
+// except `name` (in case of `spark.sql.caseSensitive=false`) and 
`nullable`.
+// - Different names: use a lower case name because 
findTightestCommonType is commutative.
+// - Different nullabilities: `nullable` is true iff one of them 
is nullable.
+val name = if (f1.name == f2.name) f1.name else 
f1.name.toLowerCase(Locale.ROOT)
--- End diff --

When case sensitivity is off, Spark considers them in lower case.
For example, in [the test 
case](https://github.com/apache/spark/pull/18460/files#diff-5a2e7f03d14856c8769fd3ddea8742bdR2657),
 we need the table names `struct1` and `struct2`. In the case of `a = A`, it raises 
ambiguous column exceptions.
```
checkAnswer(sql("SELECT * FROM struct1, struct2 WHERE struct1.a = 
struct2.A"), Seq.empty)
```
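
The merging rule spelled out in the quoted comment can be sketched outside Spark like this (plain Java; `Field` is a simplified stand-in for `StructField`, and `sameType` is assumed to have already verified the names match case-insensitively):

```java
import java.util.Locale;

public class FieldMerge {
    // Stand-in for StructField: just a name and nullability.
    record Field(String name, boolean nullable) {}

    // - Differing names (only possible case-insensitively here): lower-case,
    //   which keeps the merge commutative.
    // - nullable is true iff either input field is nullable.
    static Field merge(Field f1, Field f2) {
        String name = f1.name().equals(f2.name())
            ? f1.name()
            : f1.name().toLowerCase(Locale.ROOT);
        return new Field(name, f1.nullable() || f2.nullable());
    }
}
```

Commutativity is the point of the lower-casing: `merge(A, a)` and `merge(a, A)` both yield the field name `a`.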



---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143288202
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/sql/JavaSQLDataSourceExample.java
 ---
@@ -115,7 +115,20 @@ private static void 
runBasicDataSourceExample(SparkSession spark) {
 Dataset<Row> peopleDF =
   spark.read().format("json").load("examples/src/main/resources/people.json");
 peopleDF.select("name", "age").write().format("parquet").save("namesAndAges.parquet");
-// $example off:manual_load_options$
+// $example on:manual_load_options_csv$
+Dataset<Row> peopleDFCsv = spark.read().format("csv")
+  .option("sep", ";")
+  .option("inferSchema", "true")
+  .option("header", "true")
+  .load("examples/src/main/resources/people.csv");
+// $example off:manual_load_options_csv$
+// $example on:manual_load_options_csv$
+Dataset<Row> peopleDFCsv = spark.read().format("csv")
+  .option("sep", ";")
+  .option("inferSchema", "true")
+  .option("header", "true")
+  .load("examples/src/main/resources/people.csv");
+// $example off:manual_load_options_csv$
--- End diff --

Lines 125-131 are a duplicate. 
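
For reference, a Spark-free sketch of what the `sep` and `header` options in the quoted example do, using the `people.csv` contents from this thread (illustrative only; not Spark's CSV parser, which also handles quoting, escaping, and `inferSchema`):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CsvSketch {
    // `sep` is the field separator (a regex here, for simplicity); with
    // `header` the first line names the columns, otherwise synthetic
    // _c0, _c1, ... names are generated, mirroring Spark's defaults.
    static List<Map<String, String>> parse(String text, String sep, boolean header) {
        String[] lines = text.strip().split("\n");
        String[] first = lines[0].split(sep);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = header ? 1 : 0; i < lines.length; i++) {
            String[] vals = lines[i].split(sep);
            Map<String, String> row = new LinkedHashMap<>();
            for (int c = 0; c < vals.length; c++) {
                row.put(header ? first[c] : "_c" + c, vals[c]);
            }
            rows.add(row);
        }
        return rows;
    }
}
```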


---




[GitHub] spark pull request #19429: [SPARK-20055] [Docs] Added documentation for load...

2017-10-06 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19429#discussion_r143287943
  
--- Diff: examples/src/main/resources/people.csv ---
@@ -0,0 +1,3 @@
+name;age;job
+Jorge;30;Developer
+Bob;32;Developer
--- End diff --

Add a newline at the end of the file.


---




[GitHub] spark pull request #19270: [SPARK-21809] : Change Stage Page to use datatabl...

2017-10-06 Thread ajbozarth
Github user ajbozarth commented on a diff in the pull request:

https://github.com/apache/spark/pull/19270#discussion_r143287894
  
--- Diff: core/src/main/resources/org/apache/spark/ui/static/utils.js ---
@@ -46,3 +46,64 @@ function formatBytes(bytes, type) {
 var i = Math.floor(Math.log(bytes) / Math.log(k));
 return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + 
sizes[i];
 }
+
+function formatLogsCells(execLogs, type) {
+if (type !== 'display') return Object.keys(execLogs);
+if (!execLogs) return;
+var result = '';
+$.each(execLogs, function (logName, logUrl) {
+result += '<a href="' + logUrl + '">' + logName + '</a>';
+});
+return result;
+}
+
+function getStandAloneAppId(cb) {
--- End diff --

Exactly, I'm still not sure why it was implemented this way in the first 
place.


---



