date:20180725

[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21758
  
**[Test build #93580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93580/testReport)**
 for PR 21758 at commit 
[`c7600c2`](https://github.com/apache/spark/commit/c7600c24221d29fde31dca921d9d5863af2666e9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread jiangxb1987

Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21758
  
> What's the failure mode if there are not enough slots for the barrier 
mode? We should throw an exception right?

Yes, as mentioned in 
https://github.com/apache/spark/pull/21758/files/c16a47f0d15998133b9d61d8df5310f1f66b11b0#diff-d4000438827afe3a185ae75b24987a61R372
 , we shall fail the job on submit if there is no enough slots for the barrier 
stage. I'll submit another PR to add this check (tracked by SPARK-24819).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21867
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93574/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...

2018-07-25 Thread zuotingbing

Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/21789
  
@HyukjinKwon could you help to merge this to master branch? Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21867
  
**[Test build #93574 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93574/testReport)**
 for PR 21867 at commit 
[`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21867
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21857#discussion_r205325587
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper {
   }
 
   protected def mapColumnInSetOperation(plan: LogicalPlan): 
Option[Attribute] = plan match {
-case _: Intersect | _: Except | _: Distinct =>
+case _: Intersect | _: ExceptBase | _: Distinct =>
--- End diff --

I am fine about that. Please make a change and avoid introducing a new 
LogicalPlan node. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21102
  
I agree with @ueshin's. I wouldn't make a guarantee of returning order here 
in documentation yet though.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93576/testReport)**
 for PR 21852 at commit 
[`0b67e2e`](https://github.com/apache/spark/commit/0b67e2efcb6f827248ee11fffe9eca44a86fceaa).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93577/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)**
 for PR 21878 at commit 
[`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1340/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20861: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uui...

2018-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20861#discussion_r205337069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1994,6 +1996,20 @@ class Analyzer(
 }
   }
 
+  /**
+   * Set the seed for random number generation in Uuid expressions.
+   */
+  object ResolvedUuidExpressions extends Rule[LogicalPlan] {
+private lazy val random = new Random()
+
+override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
+  case p if p.resolved => p
+  case p => p transformExpressionsUp {
+case Uuid(None) => Uuid(Some(random.nextLong()))
--- End diff --

what's the current behavior for rand in streaming?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21320
  
do we have comments other than code style issues? Generally we should not 
block a PR just for code style issues, as long as the PR passes the style check.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFr...

2018-07-25 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21821


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread ajacques

Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21320
  
Hey @mallman, I want to thank you for your work on this so far. I've been 
watching this pull request hoping this would get merged into 2.4 since it would 
be a benefit to me, but can see how it might be frustrating.

Unfortunately, I've only been following the comments and not the 
code/architecture itself, so I can't take over effectively, but I did try to 
make the minor comments as requested hopefully to help out. I've started in 
7ee616076f93d6cfd55b6646314f3d4a6d1530d3. This may not be super helpful right 
now, but if these were the only blockers for getting this change into mainline 
in time for 2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-25 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21758#discussion_r205318258
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -359,20 +366,55 @@ private[spark] class TaskSchedulerImpl(
 // of locality levels so that it gets a chance to launch local tasks 
on all of them.
 // NOTE: the preferredLocality order: PROCESS_LOCAL, NODE_LOCAL, 
NO_PREF, RACK_LOCAL, ANY
 for (taskSet <- sortedTaskSets) {
-  var launchedAnyTask = false
-  var launchedTaskAtCurrentMaxLocality = false
-  for (currentMaxLocality <- taskSet.myLocalityLevels) {
-do {
-  launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(
-taskSet, currentMaxLocality, shuffledOffers, availableCpus, 
tasks)
-  launchedAnyTask |= launchedTaskAtCurrentMaxLocality
-} while (launchedTaskAtCurrentMaxLocality)
-  }
-  if (!launchedAnyTask) {
-taskSet.abortIfCompletelyBlacklisted(hostToExecutors)
+  // Skip the barrier taskSet if the available slots are less than the 
number of pending tasks.
+  if (taskSet.isBarrier && availableSlots < taskSet.numTasks) {
--- End diff --

We plan to fail the job on submit if it requires more slots than available. 
Are there other scenarios we shall fail fast with dynamic allocation? IIUC the 
barrier tasks that have not get launched are still counted into the number of 
pending tasks, so dynamic resource allocation shall still be able to compute a 
correct expected number of executors.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21875
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21875
  
**[Test build #93579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93579/testReport)**
 for PR 21875 at commit 
[`027b6c4`](https://github.com/apache/spark/commit/027b6c43f8c448d3231d19b21c64ab8306881fde).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21789: [SPARK-24829][STS]In Spark Thrift Server, CAST AS FLOAT ...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21789
  
Let me leave this open for few days in case some reviewers have more 
comments on this.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93578/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93582 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93582/testReport)**
 for PR 21103 at commit 
[`cf76c1f`](https://github.com/apache/spark/commit/cf76c1f4a2a41ec88fcd744470a113321e897a71).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205336425
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
--- End diff --

nit: we can simplify the code if we do `override def dataType: ArrayType = 
child.dataType.asInstanceOf[ArrayType]`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21830: [SPARK-24878][SQL] Fix reverse function for array type o...

2018-07-25 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21830
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205338428
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
+  s"""
+ |if ($childName.isNullAt($i)) {
+ |  $arrayData.setNullAt($j);
+ |} else {
+ |  $arrayData.$setFunc($j, $getValue);
+ |}
+   """.stripMargin
 } else {
-  s"${ev.value}.update(k, ${CodeGenerator.getValue(childName, 
elementType, "l")});"
+  s"$arrayData.$setFunc($j, $getValue);"
 }
 
 s"""
-   |final int $length = $childName.numElements();
-   |${ev.value} = $initialization;
-   |for(int k = 0; k < $numberOfIterations; k++) {
-   |  int l = $length - k - 1;
-   |  $swapAssigments
+   |final int $numElements = $childName.numElements();
+   |$initialization
+   |for (int $i = 0; $i < $numElements; $i++) {
+   |  int $j = $numElements - $i - 1;
--- End diff --

We still need to calculate the index of the opposite side?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread maryannxue

Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/21875
  
Programming guide updated. Thank you, @dilipbiswal and @HyukjinKwon!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1338/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93576/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread dilipbiswal

Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21875
  
@maryannxue It looks good to me. As a minor comment, could we state the 
default value for this parameter as well ? For some of the other parameters, we 
specify the default value.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...

2018-07-25 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21403
  
@maryannxue as I said my initial proposal was like that. I think that this 
has the advantage of avoiding some code duplication as the same logic which is 
added in ResolveInValues has to be spread over all the places where a In is 
build and avoiding to change the In signature, so that if a user is using In 
directly in his/her code we don't break it. On the other side, I agree with you 
that the approach having a `Seq[Expression]` is cleaner IMO (that's why it was 
my original proposal). @cloud-fan @gatorsmile what do you think about this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93578 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)**
 for PR 21852 at commit 
[`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1339/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #93581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)**
 for PR 21306 at commit 
[`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #93581 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93581/testReport)**
 for PR 21306 at commit 
[`f95800c`](https://github.com/apache/spark/commit/f95800c737f160255122da6bbe336309a4e1532e).
 * This patch **fails Java style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21306: [SPARK-24252][SQL] Add catalog registration and table ca...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93581/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread ajacques

Github user ajacques commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205329769
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala
 ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that builds a 
[[org.apache.spark.sql.types.StructField]] from a Catalyst
+ * complex type extractor. For example, consider a relation with the 
following schema:
+ *
+ *   {{{
+ *   root
+ *|-- name: struct (nullable = true)
+ *||-- first: string (nullable = true)
+ *||-- last: string (nullable = true)
+ *}}}
+ *
+ * Further, suppose we take the select expression `name.first`. This will 
parse into an
+ * `Alias(child, "first")`. Ignoring the alias, `child` matches the 
following pattern:
+ *
+ *   {{{
+ *   GetStructFieldObject(
+ * AttributeReference("name", StructType(_), _, _),
+ * StructField("first", StringType, _, _))
+ *   }}}
+ *
+ * [[SelectedField]] converts that expression into
+ *
+ *   {{{
+ *   StructField("name", StructType(Array(StructField("first", 
StringType
+ *   }}}
+ *
+ * by mapping each complex type extractor to a 
[[org.apache.spark.sql.types.StructField]] with the
+ * same name as its child (or "parent" going right to left in the select 
expression) and a data
+ * type appropriate to the complex type extractor. In our example, the 
name of the child expression
+ * is "name" and its data type is a 
[[org.apache.spark.sql.types.StructType]] with a single string
+ * field named "first".
+ *
+ * @param expr the top-level complex type extractor
+ */
+object SelectedField {
+  def unapply(expr: Expression): Option[StructField] = {
--- End diff --

```
Error:(61, 12) constructor cannot be instantiated to expected type;
 found   : org.apache.spark.sql.catalyst.expressions.Alias
 required: org.apache.spark.sql.catalyst.expressions.ExtractValue
  case Alias(child, _) => child
```

Alias takes: `Alias(child: Expression, name: String)`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread ajacques

Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21320
  
@HyukjinKwon, I'm not totally familiar with Spark internals yet, so to be 
honest I don't feel confident making big changes and hopefully can keep them 
simple at first.

I've gone through the code review comments and made as many changes as 
possible 
[here](https://github.com/apache/spark/compare/master...ajacques:spark-4502-parquet_column_pruning-foundation).
 If this PR is mostly feature complete and it's just small things, then I can 
push forward.

If the feedback comments push past simple refactoring level right now I 
would prefer to let someone else take over, but feel free to use what I've done.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread ajacques

Github user ajacques commented on a diff in the pull request:

https://github.com/apache/spark/pull/21320#discussion_r205329633
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala
 ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.planning
+
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.types._
+
+/**
+ * A Scala extractor that projects an expression over a given schema. Data 
types,
+ * field indexes and field counts of complex type extractors and attributes
+ * are adjusted to fit the schema. All other expressions are left as-is. 
This
+ * class is motivated by columnar nested schema pruning.
+ */
+case class ProjectionOverSchema(schema: StructType) {
--- End diff --

We can move this to `sql.execution` if we move all three classes: 
`ProjectionOverSchema`, `GetStructFieldObject`, and `SelectedField`. Is there a 
difference in the catalyst.planning vs the execution packages?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21857#discussion_r205331401
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -52,7 +52,7 @@ trait CheckAnalysis extends PredicateHelper {
   }
 
   protected def mapColumnInSetOperation(plan: LogicalPlan): 
Option[Attribute] = plan match {
-case _: Intersect | _: Except | _: Distinct =>
+case _: Intersect | _: ExceptBase | _: Distinct =>
--- End diff --

@gatorsmile @maropu OK


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205336334
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
+  s"""
+ |if ($childName.isNullAt($i)) {
+ |  $arrayData.setNullAt($j);
+ |} else {
+ |  $arrayData.$setFunc($j, $getValue);
+ |}
+   """.stripMargin
 } else {
-  s"${ev.value}.update(k, ${CodeGenerator.getValue(childName, 
elementType, "l")});"
+  s"$arrayData.$setFunc($j, $getValue);"
 }
 
 s"""
-   |final int $length = $childName.numElements();
-   |${ev.value} = $initialization;
-   |for(int k = 0; k < $numberOfIterations; k++) {
-   |  int l = $length - k - 1;
-   |  $swapAssigments
+   |final int $numElements = $childName.numElements();
+   |$initialization
+   |for (int $i = 0; $i < $numElements; $i++) {
+   |  int $j = $numElements - $i - 1;
--- End diff --

we don't need `j` if we do
```
for (int i = numElements - 1; i >=0; i--)
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21830: [SPARK-24878][SQL] Fix reverse function for array...

2018-07-25 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21830#discussion_r205338930
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -1244,46 +1244,50 @@ case class Reverse(child: Expression) extends 
UnaryExpression with ImplicitCastI
   }
 
   private def arrayCodeGen(ctx: CodegenContext, ev: ExprCode, childName: 
String): String = {
-val length = ctx.freshName("length")
-val javaElementType = CodeGenerator.javaType(elementType)
+
 val isPrimitiveType = CodeGenerator.isPrimitiveType(elementType)
 
+val numElements = ctx.freshName("numElements")
+val arrayData = ctx.freshName("arrayData")
+
 val initialization = if (isPrimitiveType) {
-  s"$childName.copy()"
+  ctx.createUnsafeArray(arrayData, numElements, elementType, s" 
$prettyName failed.")
 } else {
-  s"new ${classOf[GenericArrayData].getName()}(new Object[$length])"
-}
-
-val numberOfIterations = if (isPrimitiveType) s"$length / 2" else 
length
-
-val swapAssigments = if (isPrimitiveType) {
-  val setFunc = "set" + CodeGenerator.primitiveTypeName(elementType)
-  val getCall = (index: String) => CodeGenerator.getValue(ev.value, 
elementType, index)
-  s"""|boolean isNullAtK = ${ev.value}.isNullAt(k);
-  |boolean isNullAtL = ${ev.value}.isNullAt(l);
-  |if(!isNullAtK) {
-  |  $javaElementType el = ${getCall("k")};
-  |  if(!isNullAtL) {
-  |${ev.value}.$setFunc(k, ${getCall("l")});
-  |  } else {
-  |${ev.value}.setNullAt(k);
-  |  }
-  |  ${ev.value}.$setFunc(l, el);
-  |} else if (!isNullAtL) {
-  |  ${ev.value}.$setFunc(k, ${getCall("l")});
-  |  ${ev.value}.setNullAt(l);
-  |}""".stripMargin
+  val arrayDataClass = classOf[GenericArrayData].getName
+  s"$arrayDataClass $arrayData = new $arrayDataClass(new 
Object[$numElements]);"
+}
+
+val i = ctx.freshName("i")
+val j = ctx.freshName("j")
+
+val getValue = CodeGenerator.getValue(childName, elementType, i)
+
+val setFunc = if (isPrimitiveType) {
+  s"set${CodeGenerator.primitiveTypeName(elementType)}"
+} else {
+  "update"
+}
+
+val assignment = if (isPrimitiveType && 
dataType.asInstanceOf[ArrayType].containsNull) {
--- End diff --

We can't override `dataType` only for `ArrayType` because `Reverse` is also 
used for `StringType`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...

2018-07-25 Thread TomaszGaweda

Github user TomaszGaweda commented on a diff in the pull request:

https://github.com/apache/spark/pull/21875#discussion_r205268370
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
 ---
@@ -183,6 +183,9 @@ class JDBCOptions(
 }
   // An option to execute custom SQL before fetching data from the remote 
DB
   val sessionInitStatement = parameters.get(JDBC_SESSION_INIT_STATEMENT)
+
+  // An option to allow/disallow pushing down predicate into JDBC data 
source
+  val pushDownPredicate = parameters.getOrElse(JDBC_PUSHDOWN_PREDICATE, 
"true").toBoolean
--- End diff --

Yeah, consistency is a very good argument :) Indeed it will be plural or 
not, depending from which side we are looking at it


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread icexelloss

Github user icexelloss commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205268767
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -94,36 +95,94 @@ object ExtractPythonUDFFromAggregate extends 
Rule[LogicalPlan] {
  */
 object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper {
 
-  private def hasPythonUDF(e: Expression): Boolean = {
+  private case class LazyEvalType(var evalType: Int = -1) {
+
+def isSet: Boolean = evalType >= 0
+
+def set(evalType: Int): Unit = {
+  if (isSet) {
+throw new IllegalStateException("Eval type has already been set")
+  } else {
+this.evalType = evalType
+  }
+}
+
+def get(): Int = {
+  if (!isSet) {
+throw new IllegalStateException("Eval type is not set")
+  } else {
+evalType
+  }
+}
+  }
+
+  private def hasScalarPythonUDF(e: Expression): Boolean = {
 e.find(PythonUDF.isScalarPythonUDF).isDefined
   }
 
-  private def canEvaluateInPython(e: PythonUDF): Boolean = {
-e.children match {
-  // single PythonUDF child could be chained and evaluated in Python
-  case Seq(u: PythonUDF) => canEvaluateInPython(u)
-  // Python UDF can't be evaluated directly in JVM
-  case children => !children.exists(hasPythonUDF)
+  /**
+   * Check whether a PythonUDF expression can be evaluated in Python.
+   *
+   * If the lazy eval type is not set, this method checks for either 
Batched Python UDF and Scalar
+   * Pandas UDF. If the lazy eval type is set, this method checks for the 
expression of the
+   * specified eval type.
+   *
+   * This method will also set the lazy eval type to be the type of the 
first evaluable expression,
+   * i.e., if lazy eval type is not set and we find a evaluable Python UDF 
expression, lazy eval
+   * type will be set to the eval type of the expression.
+   *
+   */
+  private def canEvaluateInPython(e: PythonUDF, lazyEvalType: 
LazyEvalType): Boolean = {
--- End diff --

Yes it's in the most recent commit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...

2018-07-25 Thread maryannxue

Github user maryannxue commented on a diff in the pull request:

https://github.com/apache/spark/pull/21875#discussion_r205268701
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 ---
@@ -172,7 +172,11 @@ private[sql] case class JDBCRelation(
 
   // Check if JDBCRDD.compileFilter can accept input filters
   override def unhandledFilters(filters: Array[Filter]): Array[Filter] = {
-filters.filter(JDBCRDD.compileFilter(_, 
JdbcDialects.get(jdbcOptions.url)).isEmpty)
+if (jdbcOptions.pushDownPredicate) {
--- End diff --

No. I share your opinion actually. It is confusing here... maybe we should 
change the parameter names at some point.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...

2018-07-25 Thread TomaszGaweda

Github user TomaszGaweda commented on a diff in the pull request:

https://github.com/apache/spark/pull/21875#discussion_r205269684
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 ---
@@ -172,7 +172,11 @@ private[sql] case class JDBCRelation(
 
   // Check if JDBCRDD.compileFilter can accept input filters
   override def unhandledFilters(filters: Array[Filter]): Array[Filter] = {
-filters.filter(JDBCRDD.compileFilter(_, 
JdbcDialects.get(jdbcOptions.url)).isEmpty)
+if (jdbcOptions.pushDownPredicate) {
--- End diff --

Indeed it's confusing. `buildScan` argument may be named `pushedFilters`, 
variables also, then code will be self-describing


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21875: [SPARK-24288][SQL] Add a JDBC Option to enable pr...

2018-07-25 Thread TomaszGaweda

Github user TomaszGaweda commented on a diff in the pull request:

https://github.com/apache/spark/pull/21875#discussion_r205269695
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 ---
@@ -172,7 +172,11 @@ private[sql] case class JDBCRelation(
 
   // Check if JDBCRDD.compileFilter can accept input filters
   override def unhandledFilters(filters: Array[Filter]): Array[Filter] = {
-filters.filter(JDBCRDD.compileFilter(_, 
JdbcDialects.get(jdbcOptions.url)).isEmpty)
+if (jdbcOptions.pushDownPredicate) {
--- End diff --

Indeed it's confusing. `buildScan` argument may be named `pushedFilters`, 
variables also, then code will be self-describing


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config fo...

2018-07-25 Thread maryannxue

GitHub user maryannxue opened a pull request:

https://github.com/apache/spark/pull/21876

[SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optimization Rule 
Exclusion

## What changes were proposed in this pull request?

This is an extension to the original PR, in which rule exclusion did not 
work for classes derived from Optimizer, e.g., SparkOptimizer.
To solve this issue, Optimizer and its derived classes will define/override 
`defaultBatches` and `nonExcludableRules` in order to define its default rule 
set as well as rules that cannot be excluded by the SQL config. In the 
meantime, Optimizer's `batches` method is dedicated to the rule exclusion logic 
and is defined "final".

## How was this patch tested?

Added UT.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maryannxue/spark rule-exclusion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21876.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21876


commit eaec2f5f2b4e3193de41655b84a1dc936b0e50a3
Author: maryannxue 
Date:   2018-07-13T21:32:01Z

[SPARK-24802] Optimization Rule Exclusion

commit 84f1a6b5cba08df8684179e9d7195545be655e76
Author: maryannxue 
Date:   2018-07-18T06:13:50Z

Address review comments

commit ff23edf81a4a78d1589ed582a1802b94a8ebf4c6
Author: maryannxue 
Date:   2018-07-21T02:37:54Z

Address review comments

commit b154979236e211dc7185ca8e450493f0c6b0f469
Author: maryannxue 
Date:   2018-07-21T02:41:21Z

change test name

commit 87afe4fbcaf71d303b07612f9ceb9ad25dd3dcda
Author: maryannxue 
Date:   2018-07-23T00:35:01Z

address review comments

commit 39b6ce9548c99363e81cb246b4cbe5534d710f3e
Author: maryannxue 
Date:   2018-07-23T04:28:00Z

address review comments

commit a2161ef1f333f2cc039df0ecc8c96e5ec27e00ff
Author: maryannxue 
Date:   2018-07-25T18:52:13Z

Merge remote-tracking branch 'origin/master' into rule-exclusion

commit 3730053d7386188042b2f2d4bd6784c3de722df6
Author: maryannxue 
Date:   2018-07-25T20:08:19Z

Extend rule-exclusion to Optimizer sub-classes, esp. SparkOptimizer




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21876
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21876
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1329/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21876: [SPARK-24802][SQL][FOLLOW-UP] Add a new config for Optim...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21876
  
**[Test build #93571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93571/testReport)**
 for PR 21876 at commit 
[`3730053`](https://github.com/apache/spark/commit/3730053d7386188042b2f2d4bd6784c3de722df6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-25 Thread squito

Github user squito commented on the issue:

https://github.com/apache/spark/pull/21451
  
fyi, I did finally run my scale tests again on a cluster, and shuffles, 
remote reads, and replication worked for blocks over 2gb (sorry got sidetracked 
with a few other things in the meantime)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21821
  
**[Test build #93555 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93555/testReport)**
 for PR 21821 at commit 
[`ddbd9f7`](https://github.com/apache/spark/commit/ddbd9f7c796e8bedfbae3141c9c7098370c217ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21821
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93555/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21821
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20414
  
**[Test build #93558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93558/testReport)**
 for PR 20414 at commit 
[`6910ed6`](https://github.com/apache/spark/commit/6910ed62c272bedfa251cab589bb52bed36be3ed).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20414
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93558/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD could l...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20414
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman

Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21320
  
> Few comments like #21320 (comment) or ^ are not minor or nits. I leave 
hard -1 if they are not addressed.

I'm sorry to say I'm very close to hanging up this PR. I put a lot of care, 
time and effort into my work. After more than two years of off and on review, 
discussion/debate, nitpicking, commits, steps forward and backwards, to have 
someone swoop in at this time with a new raft of nitpicking and stylistic 
issues that set the review back again further is beyond maddening. And that's 
why you have not received much cooperation from me. Perhaps I should have 
stated that up front, but I've lost patience defending myself. Contributing to 
this PR is a tax on what is completely voluntary, unpaid time. I have no 
professional responsibility to this effort. Maybe it's better off done by 
someone who does.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20405
  
**[Test build #93559 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93559/testReport)**
 for PR 20405 at commit 
[`47bb245`](https://github.com/apache/spark/commit/47bb245353202208f2c41634c3796c8e4d2be663).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20405
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93559/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20405
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21650: [SPARK-24624][SQL][PYTHON] Support mixture of Pyt...

2018-07-25 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/21650#discussion_r205275406
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -94,36 +95,94 @@ object ExtractPythonUDFFromAggregate extends 
Rule[LogicalPlan] {
  */
 object ExtractPythonUDFs extends Rule[SparkPlan] with PredicateHelper {
 
-  private def hasPythonUDF(e: Expression): Boolean = {
+  private case class LazyEvalType(var evalType: Int = -1) {
+
+def isSet: Boolean = evalType >= 0
+
+def set(evalType: Int): Unit = {
+  if (isSet) {
+throw new IllegalStateException("Eval type has already been set")
+  } else {
+this.evalType = evalType
+  }
+}
+
+def get(): Int = {
+  if (!isSet) {
+throw new IllegalStateException("Eval type is not set")
+  } else {
+evalType
+  }
+}
+  }
+
+  private def hasScalarPythonUDF(e: Expression): Boolean = {
 e.find(PythonUDF.isScalarPythonUDF).isDefined
   }
 
-  private def canEvaluateInPython(e: PythonUDF): Boolean = {
-e.children match {
-  // single PythonUDF child could be chained and evaluated in Python
-  case Seq(u: PythonUDF) => canEvaluateInPython(u)
-  // Python UDF can't be evaluated directly in JVM
-  case children => !children.exists(hasPythonUDF)
+  /**
+   * Check whether a PythonUDF expression can be evaluated in Python.
+   *
+   * If the lazy eval type is not set, this method checks for either 
Batched Python UDF and Scalar
+   * Pandas UDF. If the lazy eval type is set, this method checks for the 
expression of the
+   * specified eval type.
+   *
+   * This method will also set the lazy eval type to be the type of the 
first evaluable expression,
+   * i.e., if lazy eval type is not set and we find a evaluable Python UDF 
expression, lazy eval
+   * type will be set to the eval type of the expression.
+   *
+   */
+  private def canEvaluateInPython(e: PythonUDF, lazyEvalType: 
LazyEvalType): Boolean = {
--- End diff --

Ok, I think I see the problem. Since there was a map over 
`plan.expressions`, a new `FirstEvalType` object was being created for each 
expression.  Changing this to the following corrected the failure:
```
val setEvalType = new FirstEvalType
val udfs = plan.expressions.flatMap(collectEvaluableUDFs(_, setEvalType))
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21868#discussion_r205278202
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -381,6 +381,26 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val IS_PARQUET_PARTITION_ADAPTIVE_ENABLED = 
buildConf("spark.sql.parquet.adaptiveFileSplit")
+.doc("For columnar file format (e.g., Parquet), it's possible that 
only few (not all) " +
+  "columns are needed. So, it's better to make sure that the total 
size of the selected " +
+  "columns is about 128 MB "
+)
+.booleanConf
+.createWithDefault(false)
+
+  val PARQUET_STRUCT_LENGTH = buildConf("spark.sql.parquet.struct.length")
+.intConf
+.createWithDefault(StructType.defaultConcreteType.defaultSize)
+
+  val PARQUET_MAP_LENGTH = buildConf("spark.sql.parquet.map.length")
+.intConf
+.createWithDefault(MapType.defaultConcreteType.defaultSize)
+
+  val PARQUET_ARRAY_LENGTH = buildConf("spark.sql.parquet.array.length")
+.intConf
+.createWithDefault(ArrayType.defaultConcreteType.defaultSize)
--- End diff --

`ArrayType.defaultConcreteType` is `ArrayType(NullType, containsNull = 
true)`. I think using this you won't get a reasonable number.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21877
  
**[Test build #93572 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93572/testReport)**
 for PR 21877 at commit 
[`d308d3c`](https://github.com/apache/spark/commit/d308d3c75f78242c822eab6d11fb651d94f10aa6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class ReplaceTableAsSelect(`
  * `case class TableV2Relation(`
  * `case class AppendDataExec(`
  * `case class CreateTableAsSelectExec(`
  * `case class ReplaceTableAsSelectExec(`
  * `case class WriteToDataSourceV2Exec(`
  * `abstract case class V2TableWriteExec(`
  * `  implicit class CatalogHelper(catalog: CatalogProvider) `
  * `  implicit class TableHelper(table: Table) `
  * `  implicit class SourceHelper(source: DataSourceV2) `
  * `  implicit class OptionsHelper(options: Map[String, String]) `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21877
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93572/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS ...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21877
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21867
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21867
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21852
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1334/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93577/testReport)**
 for PR 21878 at commit 
[`d2759cc`](https://github.com/apache/spark/commit/d2759cce48eb9a85145e90d8a126fb83351d0fda).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1335/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21221
  
**[Test build #93570 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93570/testReport)**
 for PR 21221 at commit 
[`8905d23`](https://github.com/apache/spark/commit/8905d231c3a959f70266223d3546b17a655cee39).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
> After more than two years of off and on review, discussion/debate, 
nitpicking, commits, steps forward and backwards, to have someone swoop in at 
this time with a new raft of nitpicking and stylistic issues that set the 
review back again further is beyond maddening. 

I think that's primarily because the change looks incomplete but the 
feature itself sounds good to have. I think that's why people try to take a 
look a lot.

Stepping forward and backwards is bad. That's why I am sticking with this 
PR to get this change in and help you address other people's comments and 
prevent such forward and backward.

Stylistic issues are virtually based upon 
https://github.com/databricks/scala-style-guide .

Nitpicking from me is basically from referring other codes or PRs in Spark, 
or other committer's preference so that we can get through this. I guess nits 
are still good to fix if you happen to push more changes. I guess it would take 
few seconds to address them. If that's not, please ignore my nit or minor 
comments. They don't block the PR usually.

For clarification, few comments mentioned in 
https://github.com/apache/spark/pull/21320#issuecomment-407714036 are pretty 
reject comments in general in other PRs too.

> Contributing to this PR is a tax on what is completely voluntary, unpaid 
time.

FWIW, all my works have been unpaid and completely voluntary to me more 
than 3 years in the past except the recent half 6 months (which basically means 
until I became a committer). To be honest, I believe I still work on Spark like 
when I worked individually before.

> I have no professional responsibility to this effort. Maybe it's better 
off done by someone who does.

I completely agree. There should be no professional responsibility like a 
task to do in an open source in general. I think no one has that professional 
responsibility to take this and here we should be transparent on this. If 
anyone interested in this finds that you want someone else to take over, this 
might be taken over _voluntarily_ with a comment saying I want to take over 
this.

I might cc some people who might be interested in this in order to inform 
them but it doesn't mean I hand it off to someone else.

I am sorry if you felt I am pushing or rushing you - was trying to get this 
change in since people find it's a good feature to have. That's why I 
prioritized this and stick to this PR.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function

2018-07-25 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21103#discussion_r205310335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3805,3 +3799,330 @@ object ArrayUnion {
 new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in the intersect of x and y, without 
duplicates
+ */
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array1, array2) - Returns an array of the elements in array1 but 
not in array2,
+without duplicates.
+  """,
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+   array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends 
ArraySetLike {
+  override def dataType: DataType = left.dataType
+
+  var hsInt: OpenHashSet[Int] = _
+  var hsLong: OpenHashSet[Long] = _
+
+  def assignInt(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getInt(idx)
+if (!hsInt.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setInt(pos, elem)
+  }
+  hsInt.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def assignLong(array: ArrayData, idx: Int, resultArray: ArrayData, pos: 
Int): Boolean = {
+val elem = array.getLong(idx)
+if (!hsLong.contains(elem)) {
+  if (resultArray != null) {
+resultArray.setLong(pos, elem)
+  }
+  hsLong.add(elem)
+  true
+} else {
+  false
+}
+  }
+
+  def evalIntLongPrimitiveType(
+  array1: ArrayData,
+  array2: ArrayData,
+  resultArray: ArrayData,
+  isLongType: Boolean): Int = {
+// store elements into resultArray
+var notFoundNullElement = true
+var i = 0
+while (i < array2.numElements()) {
+  if (array2.isNullAt(i)) {
+notFoundNullElement = false
+  } else {
+val assigned = if (!isLongType) {
+  hsInt.add(array2.getInt(i))
+} else {
+  hsLong.add(array2.getLong(i))
+}
+  }
+  i += 1
+}
+var pos = 0
+i = 0
+while (i < array1.numElements()) {
+  if (array1.isNullAt(i)) {
+if (notFoundNullElement) {
+  if (resultArray != null) {
+resultArray.setNullAt(pos)
+  }
+  pos += 1
+  notFoundNullElement = false
+}
+  } else {
+val assigned = if (!isLongType) {
+  assignInt(array1, i, resultArray, pos)
+} else {
+  assignLong(array1, i, resultArray, pos)
+}
+if (assigned) {
+  pos += 1
+}
+  }
+  i += 1
+}
+pos
+  }
+
+  override def nullSafeEval(input1: Any, input2: Any): Any = {
+val array1 = input1.asInstanceOf[ArrayData]
+val array2 = input2.asInstanceOf[ArrayData]
+
+if (elementTypeSupportEquals) {
+  elementType match {
+case IntegerType =>
+  // avoid boxing of primitive int array elements
+  // calculate result array size
+  hsInt = new OpenHashSet[Int]
+  val elements = evalIntLongPrimitiveType(array1, array2, null, 
false)
+  // allocate result array
+  hsInt = new OpenHashSet[Int]
+  val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData(
+IntegerType.defaultSize, elements)) {
+new GenericArrayData(new Array[Any](elements))
+  } else {
+UnsafeArrayData.forPrimitiveArray(
+  Platform.INT_ARRAY_OFFSET, elements, IntegerType.defaultSize)
+  }
+  // assign elements into the result array
+  evalIntLongPrimitiveType(array1, array2, resultArray, false)
+  resultArray
+case LongType =>
+  // avoid boxing of primitive long array elements
+  // calculate result array size
+  hsLong = new OpenHashSet[Long]
+  val elements = evalIntLongPrimitiveType(array1, array2, null, 
true)
+  // allocate result array
+  hsLong = new OpenHashSet[Long]
+  val resultArray = if (UnsafeArrayData.shouldUseGenericArrayData(
+LongType.defaultSize, elements)) {
+new GenericArrayData(new Array[Any](elements))
+  } else {
+UnsafeArrayData.forPrimitiveArray(
+  Platform.LONG_ARRAY_OFFSET, elements, LongType.defaultSize)
+  }
+

[GitHub] spark issue #21875: [SPARK-24288][SQL] Add a JDBC Option to enable preventin...

2018-07-25 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21875
  
which is here 
https://github.com/apache/spark/blob/master/docs/sql-programming-guide.md#jdbc-to-other-databases


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21867: [SPARK-24307][CORE] Add conf to revert to old code.

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21867
  
**[Test build #93566 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93566/testReport)**
 for PR 21867 at commit 
[`a5b00b8`](https://github.com/apache/spark/commit/a5b00b8a05538a6adb3a4525c2fecc1e15575f7c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21868: [SPARK-24906][SQL] Adaptively enlarge split / par...

2018-07-25 Thread habren

Github user habren commented on a diff in the pull request:

https://github.com/apache/spark/pull/21868#discussion_r205287123
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -381,6 +381,26 @@ object SQLConf {
   .booleanConf
   .createWithDefault(true)
 
+  val IS_PARQUET_PARTITION_ADAPTIVE_ENABLED = 
buildConf("spark.sql.parquet.adaptiveFileSplit")
+.doc("For columnar file format (e.g., Parquet), it's possible that 
only few (not all) " +
+  "columns are needed. So, it's better to make sure that the total 
size of the selected " +
+  "columns is about 128 MB "
+)
+.booleanConf
+.createWithDefault(false)
+
+  val PARQUET_STRUCT_LENGTH = buildConf("spark.sql.parquet.struct.length")
+.intConf
+.createWithDefault(StructType.defaultConcreteType.defaultSize)
+
+  val PARQUET_MAP_LENGTH = buildConf("spark.sql.parquet.map.length")
+.intConf
+.createWithDefault(MapType.defaultConcreteType.defaultSize)
+
+  val PARQUET_ARRAY_LENGTH = buildConf("spark.sql.parquet.array.length")
+.intConf
+.createWithDefault(ArrayType.defaultConcreteType.defaultSize)
--- End diff --

Thanks for your comments. I set the default value to StringType.defaultSize 
(8). It's default size, use should configure it according to the real data


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21877: [SPARK-24923][SQL][WIP] Add unpartitioned CTAS an...

2018-07-25 Thread rdblue

GitHub user rdblue opened a pull request:

https://github.com/apache/spark/pull/21877

[SPARK-24923][SQL][WIP] Add unpartitioned CTAS and RTAS support for 
DataSourceV2

## What changes were proposed in this pull request?

* Remove extends from `ReadSupport` and `WriteSupport` classes for use with 
`Table`
* Add CTAS and RTAS logical plans
* Refactor physical write plans so AppendData, CTAS, and RTAS use the same 
base class
* Add support for `TableCatalog` to `DataFrameReader` and `DataFrameWriter`
* Add `TableV2Relation` for tables that are loaded by `TableCatalog` and 
have no `DataSource` instance
* Move implicit helpers into `DataSourceV2Implicits` to avoid future churn

Note that this doesn't handle `partitionBy` in `DataFrameWriter`. Adding 
support for partitioned tables will require validation rules.

This is based on unmerged work and includes the commits from #21306 and 
#21305.

## How was this patch tested?

Adding unit tests for CTAS and RTAS.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rdblue/spark add-ctas-rtas-v2-plans

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21877.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21877


commit 8372f5bb47a0d6269bb16b3dc16f6f3278d2f5fd
Author: Ryan Blue 
Date:   2018-05-05T01:13:01Z

SPARK-24252: Add v2 data source mix-in for catalog support.

commit 1238af73872b0105d0c5dfbbd8da5c8f18afe408
Author: Ryan Blue 
Date:   2018-05-07T15:54:37Z

SPARK-24251: Add AppendData logical plan.

This adds a new logical plan, AppendData, that was proposed in
SPARK-23521. This also adds an analyzer rule to validate data written
with AppendData against the target table. DataFrameWriter is also
updated so that v2 writes use the new AppendData logical plan.

commit d308d3c75f78242c822eab6d11fb651d94f10aa6
Author: Ryan Blue 
Date:   2018-07-25T18:11:45Z

Add CTAS and RTAS support.

This uses the catalog API introduced in SPARK-24252 to implement CTAS
and RTAS plans.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-25 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21857#discussion_r205289050
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -1275,6 +1276,64 @@ object ReplaceExceptWithAntiJoin extends 
Rule[LogicalPlan] {
   }
 }
 
+/**
+ * Replaces logical [[ExceptAll]] operator using a combination of Union, 
Aggregate
+ * and Generate operator.
+ *
+ * Input Query :
+ * {{{
+ *SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2
+ * }}}
+ *
+ * Rewritten Query:
+ * {{{
+ *   SELECT c1
+ *   FROM (
+ * SELECT replicate_rows(sum_val, c1) AS (sum_val, c1)
+ *   FROM (
+ * SELECT c1, cnt, sum_val
--- End diff --

@ueshin Thanks.. will change.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21878
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)**
 for PR 21878 at commit 
[`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21596: [SPARK-24601] Bump Jackson version

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21596
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93569/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21821: [SPARK-24867] [SQL] Add AnalysisBarrier to DataFrameWrit...

2018-07-25 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21821
  
Thanks! Merged to master/2.3


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205305691
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/SimplifyConditionalSuite.scala
 ---
@@ -122,4 +126,25 @@ class SimplifyConditionalSuite extends PlanTest with 
PredicateHelper {
 None),
   CaseWhen(normalBranch :: trueBranch :: Nil, None))
   }
+
+  test("remove entire CaseWhen if all the outputs are semantic 
equivalence") {
--- End diff --

Yes, I plan to add couple more tests tonight.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if all th...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21852
  
**[Test build #93578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93578/testReport)**
 for PR 21852 at commit 
[`4acda6f`](https://github.com/apache/spark/commit/4acda6fbf4fb5b1be30a0ad213cd5369b64b02b5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93570/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...

2018-07-25 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21221
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21852: [SPARK-24893] [SQL] Remove the entire CaseWhen if...

2018-07-25 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21852#discussion_r205309619
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -416,6 +416,21 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case e @ CaseWhen(branches, Some(elseValue)) if {
+val list = branches.map(_._2) :+ elseValue
+list.tail.forall(list.head.semanticEquals)
+  } =>
+// For non-deterministic conditions with side effect, we can not 
remove it.
+// Since the output of all the branches are semantic equivalence, 
`elseValue`
+// is picked for all the branches.
+val newBranches = 
branches.map(_._1).filter(!_.deterministic).map(cond => (cond, elseValue))
--- End diff --

All conds must be deterministic, otherwise a non deterministic one not run 
before can be run after this rule.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21867: [SPARK-24307][CORE] Add conf to revert to old cod...

2018-07-25 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21867#discussion_r205312971
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -731,7 +731,14 @@ private[spark] class BlockManager(
   }
 
   if (data != null) {
-return Some(ChunkedByteBuffer.fromManagedBuffer(data, chunkSize))
+// SPARK-24307 undocumented "escape-hatch" in case there are any 
issues in converting to
+// to ChunkedByteBuffer, to go back to old code-path.  Can be 
removed post Spark 2.4 if
+// new path is stable.
+if (conf.getBoolean("spark.fetchToNioBuffer", false)) {
--- End diff --

Maybe we'd better to rename that one "spark.maxRemoteBlockSizeFetchToMem" 
also ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21878: [SPARK-24924][SQL] Add mapping for built-in Avro data so...

2018-07-25 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21878
  
**[Test build #93575 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93575/testReport)**
 for PR 21878 at commit 
[`d95ba40`](https://github.com/apache/spark/commit/d95ba4081ac1188515b7e6363640700d56f2c93f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 >

1 - 100 of 503 matches

Mail list logo