[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94539/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Merged build finished. Test FAILed.





[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20637
  
**[Test build #94539 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94539/testReport)**
 for PR 20637 at commit 
[`0f7ae11`](https://github.com/apache/spark/commit/0f7ae11df7e53cdcf4576b7f558f3135e951dcb7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22066
  
**[Test build #94543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94543/testReport)**
 for PR 22066 at commit 
[`8ee56bb`](https://github.com/apache/spark/commit/8ee56bbfaacdd64b1712d72650a39939ca3b13f2).





[GitHub] spark pull request #21994: [SPARK-24529][Build][test-maven][follow-up] Add s...

2018-08-09 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21994#discussion_r209147728
  
--- Diff: pom.xml ---
@@ -2609,6 +2609,28 @@
   
 
   
+  
+com.github.spotbugs
+spotbugs-maven-plugin
--- End diff --

Let me check the elapsed time in my environment. +1 for holding off for now.





[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22066
  
Can one of the admins verify this patch?






[GitHub] spark pull request #22066: [SPARK-25084][SQL] "distribute by" on multiple co...

2018-08-09 Thread yucai
GitHub user yucai opened a pull request:

https://github.com/apache/spark/pull/22066

[SPARK-25084][SQL] "distribute by" on multiple columns may lead to codegen issue

## What changes were proposed in this pull request?

"distribute by" on multiple columns may lead to a codegen issue.

## How was this patch tested?

UTs.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yucai/spark SPARK-25084

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22066.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22066


commit 8ee56bbfaacdd64b1712d72650a39939ca3b13f2
Author: yucai 
Date:   2018-08-10T05:19:43Z

"distribute by" on multiple columns may lead to codegen issue







[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22011
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94530/
Test PASSed.





[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22011
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22011
  
**[Test build #94530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94530/testReport)**
 for PR 22011 at commit 
[`d508fc5`](https://github.com/apache/spark/commit/d508fc5df6680a8f30ce4c17004a1677a96d91eb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22060
  
Merged build finished. Test FAILed.





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94540/
Test FAILed.





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22060
  
**[Test build #94540 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94540/testReport)**
 for PR 22060 at commit 
[`3236568`](https://github.com/apache/spark/commit/323656872799b8dd636061220f3ed139379c9c79).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #21994: [SPARK-24529][Build][test-maven][follow-up] Add s...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21994#discussion_r209145351
  
--- Diff: pom.xml ---
@@ -2609,6 +2609,28 @@
   
 
   
+  
+com.github.spotbugs
+spotbugs-maven-plugin
--- End diff --

Yeah, this slows the build down by about 16 minutes, which was my concern in the first place. Currently, it only affects the Maven build, though. +1 for holding off for now.





[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21320
  
> @mallman, can we close this PR? Are you willing to update here or not?

I pushed an update less than a day ago, and I intend to continue pushing 
updates as needed.





[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21732
  
**[Test build #94542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94542/testReport)**
 for PR 21732 at commit 
[`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d).





[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21732
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2030/
Test PASSed.





[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders

2018-08-09 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21732
  
retest this please.





[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...

2018-08-09 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21732#discussion_r209144900
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
@@ -43,20 +43,17 @@ import org.apache.spark.util.Utils
  *to the name `value`.
  */
 object ExpressionEncoder {
-  def apply[T : TypeTag](): ExpressionEncoder[T] = {
+  // Constructs an encoder for top-level row.
+  def apply[T : TypeTag](): ExpressionEncoder[T] = apply(topLevel = true)
+
+  /**
+   * @param topLevel whether the encoders to construct are for top-level row.
+   */
+  def apply[T : TypeTag](topLevel: Boolean): ExpressionEncoder[T] = {
--- End diff --

In `Aggregator`, we can call this `apply` with `topLevel = false` to avoid producing a nested struct.





[GitHub] spark issue #22062: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22062
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
I don't think this is a data correctness issue. It may cause an unexpected program abort due to a hardware memory access error.
That said, it would be good to backport it to improve stability.





[GitHub] spark issue #22062: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22062
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94529/
Test PASSed.





[GitHub] spark issue #22062: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22062
  
**[Test build #94529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94529/testReport)**
 for PR 22062 at commit 
[`54799ca`](https://github.com/apache/spark/commit/54799cae8ef0727988bbb863d326ea61b4d9ae72).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread eatoncys
Github user eatoncys commented on the issue:

https://github.com/apache/spark/pull/22053
  
@cloud-fan Unaligned accesses are not supported on the SPARC architecture, as discussed in the issue: 
https://issues.apache.org/jira/browse/SPARK-16962.
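To make the SPARC point concrete, here is a minimal Java sketch of why the width of a record's length prefix matters on platforms that reject unaligned access. It assumes the fix works by choosing the prefix width per platform (4 bytes where unaligned access is tolerated, 8 bytes otherwise) and that records start on an 8-byte boundary; `AlignmentSketch`, `prefixSize` and `isAligned` are hypothetical names, and this is not Spark's implementation.

```java
// Hypothetical illustration (not Spark code): the width of a record's length prefix
// decides whether the 8-byte field that follows it is naturally aligned.
final class AlignmentSketch {
    // Assumed policy: a 4-byte prefix is only acceptable when the CPU tolerates
    // unaligned access; otherwise the prefix is widened to 8 bytes.
    static int prefixSize(boolean platformSupportsUnaligned) {
        return platformSupportsUnaligned ? 4 : 8;
    }

    // An access of `width` bytes is naturally aligned when its offset is a multiple of `width`.
    static boolean isAligned(long offset, int width) {
        return offset % width == 0;
    }

    public static void main(String[] args) {
        long recordStart = 0; // assume records start on an 8-byte boundary
        for (boolean unaligned : new boolean[] {true, false}) {
            int prefix = prefixSize(unaligned);
            long fieldOffset = recordStart + prefix; // the 8-byte field follows the prefix
            System.out.println("prefix=" + prefix + " fieldOffset=" + fieldOffset
                + " aligned=" + isAligned(fieldOffset, 8));
        }
    }
}
```

Running it prints `prefix=4 fieldOffset=4 aligned=false` followed by `prefix=8 fieldOffset=8 aligned=true`: with a 4-byte prefix, an 8-byte read of the following field lands on a 4-byte boundary, which is exactly the kind of access a strict-alignment CPU can fault on.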





[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22065
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22065
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2029/
Test PASSed.





[GitHub] spark issue #22065: [SPARK-23992][CORE] ShuffleDependency does not need to b...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22065
  
**[Test build #94541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94541/testReport)**
 for PR 22065 at commit 
[`a99769d`](https://github.com/apache/spark/commit/a99769dd1aac779e972ed2e23aa7598e6d7c7105).





[GitHub] spark pull request #22065: [SPARK-23992][CORE] ShuffleDependency does not ne...

2018-08-09 Thread 10110346
GitHub user 10110346 opened a pull request:

https://github.com/apache/spark/pull/22065

[SPARK-23992][CORE] ShuffleDependency does not need to be deserialized every time

Within the same stage, `ShuffleDependency` does not need to be deserialized every time.

I tested this 3 times in my production environment, and it gives a small performance improvement (about 0.7%):

Before this PR:
- duration: 20189 s
- app count: 4736
- sum of time for all apps: 530470 s

After this PR:
- duration: 20035 s
- app count: 4736
- sum of time for all apps: 525220 s

## How was this patch tested?

Existing unit tests.
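As a sanity check on the figures above, the relative reductions can be computed directly from the before/after numbers. This is only arithmetic on the reported values, not a new measurement; `ImprovementSketch` and `percentReduction` are illustrative names.

```java
import java.util.Locale;

// Relative improvement computed from the before/after figures reported in the PR description.
final class ImprovementSketch {
    // Percentage reduction going from `before` to `after`.
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
        System.out.printf(Locale.ROOT, "duration: %.2f%%%n", percentReduction(20189, 20035));
        System.out.printf(Locale.ROOT, "sum of app time: %.2f%%%n", percentReduction(530470, 525220));
    }
}
```

With these inputs the program prints `duration: 0.76%` and `sum of app time: 0.99%`, so the quoted "about 0.7%" appears to track the overall duration, while the summed app time drops by roughly 1%.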

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/10110346/spark notdeserializedep22

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22065.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22065


commit a99769dd1aac779e972ed2e23aa7598e6d7c7105
Author: liuxian 
Date:   2018-08-10T04:20:54Z

fix







[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21698
  
IIUC, a streaming query always needs to specify a checkpoint location?





[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-09 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22037#discussion_r209140501
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -139,7 +152,22 @@ object SchemaConverters {
 
   case FloatType => builder.floatType()
   case DoubleType => builder.doubleType()
-  case _: DecimalType | StringType => builder.stringType()
+  case StringType => builder.stringType()
+  case d: DecimalType =>
+val avroType = LogicalTypes.decimal(d.precision, d.scale)
+val fixedSize = minBytesForPrecision(d.precision)
+// Use random name to avoid conflict in naming of fixed field.
+// Field names must start with [A-Za-z_], while the charset of Random.alphanumeric contains
+// [0-9]. So add a single character "f" to ensure the name is valid.
+val name = "f" + Random.alphanumeric.take(32).mkString("")
+if (nullable) {
+  val schema = avroType.addToSchema(
+SchemaBuilder.builder().fixed(name).size(fixedSize))
+  builder.`type`(schema)
+} else {
+  avroType.addToSchema(builder.fixed(name).size(fixedSize))
--- End diff --

Here we can add the schema to `builder` directly. If the builder is nullable, we need to create the schema with the logical type first and then add it to the nullable builder (completing the type as a `union` with `null`).
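The diff calls `minBytesForPrecision` without showing it. The Java sketch below assumes it returns the smallest `fixed` size whose signed (two's-complement) range covers every unscaled value of a decimal with the given precision, i.e. the smallest n with 2^(8n - 1) >= 10^precision, and it also mirrors the "f" + 32 random alphanumerics naming trick from the diff. `AvroDecimalSketch` and both method names are hypothetical; this is not the Spark or Avro code, and it does not touch the Avro `SchemaBuilder` API.

```java
import java.math.BigInteger;
import java.util.Random;

// Hypothetical helpers mirroring the two computations in the diff; not Spark's actual code.
final class AvroDecimalSketch {
    private static final String ALPHANUMERIC =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Smallest byte count n such that a signed n-byte integer can hold any unscaled value
    // of the given precision: the largest magnitude is 10^p - 1, and a signed n-byte value
    // reaches 2^(8n - 1) - 1, so we need 2^(8n - 1) >= 10^p. BigInteger keeps this exact.
    static int minBytesForPrecision(int precision) {
        BigInteger limit = BigInteger.TEN.pow(precision);
        int numBytes = 1;
        while (BigInteger.ONE.shiftLeft(8 * numBytes - 1).compareTo(limit) < 0) {
            numBytes++;
        }
        return numBytes;
    }

    // Random name for the fixed type. The leading "f" guarantees a valid first character,
    // because the 32 random characters may start with a digit.
    static String randomFixedName(Random rnd) {
        StringBuilder sb = new StringBuilder("f");
        for (int i = 0; i < 32; i++) {
            sb.append(ALPHANUMERIC.charAt(rnd.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }
}
```

Under these assumptions, precision 9 needs 4 bytes, precision 18 needs 8 bytes and precision 38 (Spark's maximum decimal precision) needs 16 bytes.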





[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21087
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21847





[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21087
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94528/
Test PASSed.





[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21087
  
**[Test build #94528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94528/testReport)**
 for PR 21087 at commit 
[`ebd9265`](https://github.com/apache/spark/commit/ebd926530c1d8b2f515a4a233f5963eafc17e460).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...

2018-08-09 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/21847
  
Thanks all. Merged into master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please






[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21320
  
retest this please






[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

2018-08-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22063#discussion_r209136524
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -114,6 +114,7 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
   val types = (1 to x).foldRight("RT")((i, s) => {s"A$i, $s"})
   val typeTags = (1 to x).map(i => s"A$i: TypeTag").foldLeft("RT: TypeTag")(_ + ", " + _)
   val inputTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].dataType :: $s"})
+  val nullableTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].nullable :: $s"})
--- End diff --

Yeah, that can be optimized. I'll also fix the MiMa issue by restoring a constructor.





[GitHub] spark pull request #21079: [SPARK-23992][CORE] ShuffleDependency does not ne...

2018-08-09 Thread 10110346
Github user 10110346 closed the pull request at:

https://github.com/apache/spark/pull/21079





[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22063
  
the idea LGTM





[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22063#discussion_r209135836
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala ---
@@ -114,6 +114,7 @@ class UDFRegistration private[sql] (functionRegistry: FunctionRegistry) extends
   val types = (1 to x).foldRight("RT")((i, s) => {s"A$i, $s"})
   val typeTags = (1 to x).map(i => s"A$i: TypeTag").foldLeft("RT: TypeTag")(_ + ", " + _)
   val inputTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].dataType :: $s"})
+  val nullableTypes = (1 to x).foldRight("Nil")((i, s) => {s"ScalaReflection.schemaFor[A$i].nullable :: $s"})
--- End diff --

Instead of having 2 lists, shall we just keep a `Seq[ScalaReflection.Schema]` or `Seq[(DataType, Boolean)]`?
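The suggestion amounts to generating one list whose elements each carry both `dataType` and `nullable` (as `ScalaReflection.Schema` does), instead of the two parallel `inputTypes`/`nullableTypes` strings. The Java sketch below mimics the Scala `foldRight` string generator for that single list; `UdfCodegenSketch` and `schemaList` are hypothetical names, not part of `UDFRegistration`.

```java
// Hypothetical sketch: generate one list of schema expressions instead of two parallel lists.
final class UdfCodegenSketch {
    // Java equivalent of the Scala generator
    //   (1 to x).foldRight("Nil")((i, s) => s"ScalaReflection.schemaFor[A$i] :: $s")
    // foldRight consumes the range from the right, so we prepend from i = x down to 1.
    static String schemaList(int x) {
        String s = "Nil";
        for (int i = x; i >= 1; i--) {
            s = "ScalaReflection.schemaFor[A" + i + "] :: " + s;
        }
        return s;
    }
}
```

For `x = 2` this yields `ScalaReflection.schemaFor[A1] :: ScalaReflection.schemaFor[A2] :: Nil`; the generated UDF code could then read `.dataType` and `.nullable` from the same element, so the two pieces of information cannot drift out of sync.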





[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22053
  
LGTM

Is this a data correctness issue? How far should we backport it?

cc @tgravescs 





[GitHub] spark issue #22061: [SPARK-25079][PYTHON] preparing for python 3.5 bump

2018-08-09 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/22061
  
this probably won't be able to get merged for a while.

On Thu, Aug 9, 2018 at 7:01 PM, Hyukjin Kwon wrote:

> *@HyukjinKwon* approved this pull request.
>
> Change itself LGTM. Will push this in when it's ready and if my hand is
> needed to get this in. Please let me know later @shaneknapp.






[GitHub] spark pull request #22043: [SPARK-24251][SQL] Add analysis tests for AppendD...

2018-08-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22043





[GitHub] spark issue #22043: [SPARK-24251][SQL] Add analysis tests for AppendData.

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22043
  
thanks, merging to master!





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22060
  
**[Test build #94540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94540/testReport)**
 for PR 22060 at commit 
[`3236568`](https://github.com/apache/spark/commit/323656872799b8dd636061220f3ed139379c9c79).





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22060
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2028/
Test PASSed.





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22060
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22017
  
Merged build finished. Test PASSed.





[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22017
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94526/
Test PASSed.





[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22017
  
**[Test build #94526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94526/testReport)**
 for PR 22017 at commit 
[`3c849cb`](https://github.com/apache/spark/commit/3c849cbe70922bd22029b41f2558100dfbc16d9e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #22060: [DO NOT MERGE][TEST ONLY] Add once-policy rule check

2018-08-09 Thread maryannxue
Github user maryannxue commented on the issue:

https://github.com/apache/spark/pull/22060
  
retest this please





[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r209133225
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java ---
@@ -27,10 +27,10 @@
 @InterfaceStability.Evolving
 public interface SessionConfigSupport extends DataSourceV2 {
 
-/**
- * Key prefix of the session configs to propagate. Spark will extract all session configs that
- * starts with `spark.datasource.$keyPrefix`, turn `spark.datasource.$keyPrefix.xxx -> yyy`
- * into `xxx -> yyy`, and propagate them to all data source operations in this session.
- */
-String keyPrefix();
+  /**
+   * Key prefix of the session configs to propagate. Spark will extract all session configs that
+   * starts with `spark.datasource.$keyPrefix`, turn `spark.datasource.$keyPrefix.xxx -> yyy`
--- End diff --

`datasource` is a string literal here. So the Kafka source should implement `keyPrefix` to return `kafka`, and then all configs that start with `spark.datasource.kafka` will be propagated to the Kafka source.

see the test: 
https://github.com/windjammertechnologies/spark/commit/9c289a5cb46e00cd60db4794357f070dfdf80691#diff-e6ed4ac7b1ceb2f3a25e92b031aaecbbR24
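The propagation rule described in the Javadoc can be sketched as a plain prefix strip. This is a hypothetical Java helper, assuming only keys of the form `spark.datasource.<keyPrefix>.<name>` are forwarded and that `<name>` becomes the new key; `SessionConfigSketch` and `extractSessionConfigs` are illustrative names, not Spark's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the prefix-stripping rule described in the SessionConfigSupport Javadoc.
final class SessionConfigSketch {
    static Map<String, String> extractSessionConfigs(String keyPrefix, Map<String, String> sessionConfs) {
        // "datasource" is a literal segment; keyPrefix is what the source returns (e.g. "kafka").
        // The trailing "." prevents "kafka" from also matching keys like "spark.datasource.kafkax.*".
        String prefix = "spark.datasource." + keyPrefix + ".";
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, String> e : sessionConfs.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(prefix) && key.length() > prefix.length()) {
                // spark.datasource.kafka.xxx -> yyy becomes xxx -> yyy
                result.put(key.substring(prefix.length()), e.getValue());
            }
        }
        return result;
    }
}
```

For example, with `keyPrefix = "kafka"`, a session config `spark.datasource.kafka.bootstrap.servers -> host:9092` would reach the source as `bootstrap.servers -> host:9092`, while unrelated keys such as `spark.sql.shuffle.partitions` are ignored.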


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20637
  
**[Test build #94539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94539/testReport)**
 for PR 20637 at commit 
[`0f7ae11`](https://github.com/apache/spark/commit/0f7ae11df7e53cdcf4576b7f558f3135e951dcb7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20637
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2027/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22064
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2026/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22064
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22037
  
**[Test build #94538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94538/testReport)**
 for PR 22037 at commit 
[`1c5d228`](https://github.com/apache/spark/commit/1c5d228a05af6bab0c89a432b5446647746f117d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22064: [MINOR][BUILD] Add ECCN notice required by http://www.ap...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22064
  
**[Test build #94537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94537/testReport)**
 for PR 22064 at commit 
[`878e5ca`](https://github.com/apache/spark/commit/878e5ca274a3b9e5fe37f4e0c2ed4b499bc81676).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22064: [MINOR][BUILD] Add ECCN notice required by http:/...

2018-08-09 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/22064

[MINOR][BUILD] Add ECCN notice required by http://www.apache.org/dev/crypto.html

## What changes were proposed in this pull request?

Add ECCN notice required by http://www.apache.org/dev/crypto.html
See https://issues.apache.org/jira/browse/LEGAL-398

This should probably be backported to 2.3 and 2.2, as that's when the key dependency (commons crypto) turned up. BC is actually unused, but still there.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark ECCN

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22064


commit 878e5ca274a3b9e5fe37f4e0c2ed4b499bc81676
Author: Sean Owen 
Date:   2018-08-10T02:35:32Z

Add ECCN notice required by http://www.apache.org/dev/crypto.html




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22037
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2025/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
The failure of `org.apache.spark.sql.catalyst.expressions.JsonExpressionsSuite.from_json missing fields` is caused by passing `null` for a field whose schema has `nullable=false`.

How to handle this inconsistency was agreed upon in the discussion at [SPARK-23173](https://issues.apache.org/jira/browse/SPARK-23173): `Assume that each field in schema passed to from_json is nullable, and ignore the nullability information set in the passed schema.`

When `spark.sql.fromJsonForceNullableSchema=false`, I think it is invalid for a test to declare `nullable=false` in the schema for a field that is missing from the input.
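A toy Python model of the point above (hypothetical; `Field`, `parse_with_schema` and `force_nullable` are illustrative names, not Spark's `StructField` or `from_json`). It shows that a field missing from the JSON input becomes null, which contradicts `nullable=false` unless every field is forced to be nullable, as agreed in SPARK-23173.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Toy stand-in for a schema field (not Spark's StructField)."""
    name: str
    nullable: bool


def parse_with_schema(json_text, schema):
    """Parse a flat JSON object; fields missing from the input become None."""
    record = json.loads(json_text)
    return {f.name: record.get(f.name) for f in schema}


def violations(row, schema):
    """Names of non-nullable fields that nevertheless ended up as None."""
    return [f.name for f in schema if not f.nullable and row[f.name] is None]


def force_nullable(schema):
    """Treat every field as nullable, ignoring the declared nullability."""
    return [Field(f.name, True) for f in schema]


schema = [Field("a", nullable=False), Field("b", nullable=False)]
row = parse_with_schema('{"a": 1}', schema)      # "b" is missing from the input
print(row)                                       # {'a': 1, 'b': None}
print(violations(row, schema))                   # ['b']
print(violations(row, force_nullable(schema)))   # []
```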


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22037
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22037#discussion_r209130685
  
--- Diff: 
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -475,6 +498,41 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 checkAnswer(df, expected)
   }
 
+  test("Logical type: Decimal") {
+val expected = Seq((1.23, 45.67), (65.37, 81.39))
+  .map { d =>
+Row(new java.math.BigDecimal(d._1.toString), new java.math.BigDecimal(d._2.toString))
+  }
+val df = spark.read.format("avro").load(decimalAvro)
+
+checkAnswer(df, expected)
+
+val avroSchema = s"""
+  {
+"namespace": "logical",
+"type": "record",
+"name": "test",
+"fields": [
+  {"name": "bytes", "type":
 {"type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2}
+  },
+  {"name": "fixed", "type":
+{"type": "fixed", "size": 5, "logicalType": "decimal",
+  "precision": 4, "scale": 2, "name": "foo"}
--- End diff --

One option might be to use json4s and convert it to a JSON string.
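As a rough illustration of that idea: json4s is a Scala library, so the sketch below swaps in Python's standard `json` module purely to show the approach of building the Avro schema as structured data and serializing it, instead of maintaining a hand-written multi-line string. The `decimal_type` helper is hypothetical; the field values are copied from the test schema in the diff.

```python
import json


def decimal_type(base, precision, scale, **extra):
    """Avro decimal logical type on top of a bytes or fixed base type."""
    return {"type": base, "logicalType": "decimal",
            "precision": precision, "scale": scale, **extra}


schema = {
    "namespace": "logical",
    "type": "record",
    "name": "test",
    "fields": [
        {"name": "bytes", "type": decimal_type("bytes", 4, 2)},
        # Avro "fixed" types also need a size and a name.
        {"name": "fixed",
         "type": decimal_type("fixed", 4, 2, size=5, name="foo")},
    ],
}

avro_schema = json.dumps(schema, indent=2)
print(avro_schema)
```

Serializing from a data structure guarantees well-formed JSON and keeps shared pieces (such as the decimal type) in one place.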


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130247
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -297,9 +318,44 @@ class RelationalGroupedDataset protected[sql](
   }
 
   /**
-   * Pivots a column of the current `DataFrame` and performs the specified aggregation.
+   * Compute the logical and of all boolean columns for each group.
+   * The resulting `DataFrame` will also contain the grouping columns.
+   * When specified columns are given, only compute the sum for them.
+   *
+   * @since 2.2.0
--- End diff --

nit: since version
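For reference, the doc comment in this diff describes computing the logical AND of boolean columns per group (EVERY), with ANY/SOME as the logical OR, per the PR title. A minimal Python sketch of those semantics over `(group, flag)` pairs follows. Skipping NULLs (`None`) and returning NULL for an all-NULL group is an assumption based on how SQL aggregates usually behave, not something verified against this PR.

```python
from collections import defaultdict


def every_and_any(rows):
    """Group (key, flag) pairs and return {key: (every, any)}.

    None flags are skipped, as SQL aggregates typically ignore NULLs;
    a group whose flags are all None yields (None, None).
    """
    groups = defaultdict(list)
    for key, flag in rows:
        groups[key].append(flag)
    result = {}
    for key, flags in groups.items():
        values = [f for f in flags if f is not None]
        if not values:
            result[key] = (None, None)
        else:
            result[key] = (all(values), any(values))
    return result


rows = [("a", True), ("a", True), ("b", True), ("b", False), ("c", None)]
print(every_and_any(rows))
# {'a': (True, True), 'b': (False, True), 'c': (None, None)}
```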


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130219
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala ---
@@ -88,7 +88,7 @@ class RelationalGroupedDataset protected[sql](
   }
 
   private[this] def aggregateNumericColumns(colNames: String*)(f: Expression => AggregateFunction)
-: DataFrame = {
--- End diff --

nit: previous indentation was correct.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130145
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala
 ---
@@ -1555,9 +1555,11 @@ case class Left(str: Expression, len: Expression, child: Expression) extends Run
  * number of bytes of the given binary expression.
  */
 @ExpressionDescription(
-  usage = "_FUNC_(expr) - Returns the character length of string data or number of bytes of " +
-"binary data. The length of string data includes the trailing spaces. The length of binary " +
-"data includes binary zeros.",
+  usage = """
--- End diff --

Looks unrelated


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22047: [SPARK-19851] Add support for EVERY and ANY (SOME...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22047#discussion_r209130078
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -202,6 +202,12 @@ def _():
""",
 }
 
+_functions_2_2 = {
--- End diff --

hm, looks unrelated.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21732#discussion_r209130069
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
 ---
@@ -43,20 +43,17 @@ import org.apache.spark.util.Utils
  *to the name `value`.
  */
 object ExpressionEncoder {
-  def apply[T : TypeTag](): ExpressionEncoder[T] = {
+  // Constructs an encoder for top-level row.
+  def apply[T : TypeTag](): ExpressionEncoder[T] = apply(topLevel = true)
+
+  /**
+   * @param topLevel whether the encoders to construct are for top-level row.
+   */
+  def apply[T : TypeTag](topLevel: Boolean): ExpressionEncoder[T] = {
--- End diff --

where do we call this apply with `topLevel = false`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22047
  
> Please give credit to @ptkool for this work.

FWIW, we can now credit multiple people, per https://github.com/apache/spark/commit/51bee7aca13451167fa3e701fcd60f023eae5e61 :-)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
cc @hvanhovell


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
From a cursory look, the last failure looks unrelated.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21320
  
@mallman, can we close this PR, or are you planning to update it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21490: [SPARK-24462][SS] Initialize the offsets correctly when ...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21490
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22037#discussion_r209129230
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala 
---
@@ -138,10 +142,21 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
 bytes
   case b: Array[Byte] => b
   case other => throw new RuntimeException(s"$other is not a valid avro binary.")
-
 }
 updater.set(ordinal, bytes)
 
+  case (FIXED, d: DecimalType) => (updater, ordinal, value) =>
+val bigDecimal = decimalConversions.fromFixed(value.asInstanceOf[GenericFixed], avroType,
+  LogicalTypes.decimal(d.precision, d.scale))
--- End diff --

Parquet can convert binary to an unscaled long directly; shall we do the same here?
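A minimal Python sketch of the direct-decoding idea (hypothetical helper, not Parquet's or Spark's code). Avro's decimal logical type stores the unscaled value as a big-endian two's-complement byte array, so it can be read straight into an integer. The 18-digit threshold is arithmetic, not a claim about this PR: 10^18 - 1 fits in a signed 64-bit long, 10^19 - 1 does not.

```python
from decimal import Decimal

# Largest precision whose unscaled value always fits in a signed 64-bit long:
# 10**18 - 1 < 2**63 - 1, while 10**19 - 1 does not fit.
MAX_LONG_DIGITS = 18


def decode_avro_decimal(data, precision, scale):
    """Decode Avro decimal bytes (big-endian two's complement unscaled value).

    Returns (unscaled, fits_in_long, value). When fits_in_long is True a JVM
    implementation could keep the unscaled value in a Long instead of going
    through a BigDecimal.
    """
    unscaled = int.from_bytes(data, byteorder="big", signed=True)
    fits_in_long = precision <= MAX_LONG_DIGITS
    return unscaled, fits_in_long, Decimal(unscaled).scaleb(-scale)


# 1.23 with scale 2 has unscaled value 123; "fixed" of size 5 as in the test schema.
fixed = (123).to_bytes(5, byteorder="big", signed=True)
print(decode_avro_decimal(fixed, precision=4, scale=2))
# (123, True, Decimal('1.23'))
```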


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21490: [SPARK-24462][SS] Initialize the offsets correctly when ...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21490
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94525/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22063
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22063
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94535/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21490: [SPARK-24462][SS] Initialize the offsets correctly when ...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21490
  
**[Test build #94525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94525/testReport)**
 for PR 21490 at commit 
[`c5ff731`](https://github.com/apache/spark/commit/c5ff731f2172b52fd1b42fa40ba38d564d203434).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22063
  
**[Test build #94535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94535/testReport)**
 for PR 22063 at commit 
[`92598f0`](https://github.com/apache/spark/commit/92598f0bbf55b19fc833745c88e273ffaea2b139).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22037: [SPARK-24774][SQL] Avro: Support logical decimal ...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22037#discussion_r209127634
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -455,6 +455,8 @@ object Decimal {
   def apply(unscaled: Long, precision: Int, scale: Int): Decimal =
 new Decimal().set(unscaled, precision, scale)
 
+  def apply(value: Array[Byte]): Decimal = Decimal(value.map(_.toChar).mkString)
--- End diff --

why do we need it?
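For context on the question, `value.map(_.toChar).mkString` reads each byte as a character of a decimal string. By contrast, Avro's decimal logical type stores the unscaled value as big-endian two's-complement bytes. The small Python sketch below compares the two readings. The function names are hypothetical, and Python's `chr` matches Scala's `toChar` only for ASCII bytes (0 to 127).

```python
from decimal import Decimal, InvalidOperation


def decimal_from_chars(data):
    """Analogue of Decimal(value.map(_.toChar).mkString): bytes read as text."""
    return Decimal("".join(chr(b) for b in data))


def decimal_from_twos_complement(data, scale):
    """Avro-style reading: big-endian two's complement unscaled value."""
    return Decimal(int.from_bytes(data, "big", signed=True)).scaleb(-scale)


text_bytes = b"1.23"
avro_bytes = (123).to_bytes(1, "big", signed=True)  # the single byte 0x7b, i.e. "{"

print(decimal_from_chars(text_bytes))               # 1.23
print(decimal_from_twos_complement(avro_bytes, 2))  # 1.23
try:
    decimal_from_chars(avro_bytes)                  # "{" is not a number
except InvalidOperation:
    print("character reading fails on Avro-encoded bytes")
```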


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22063
  
**[Test build #94535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94535/testReport)**
 for PR 22063 at commit 
[`92598f0`](https://github.com/apache/spark/commit/92598f0bbf55b19fc833745c88e273ffaea2b139).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22063
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22063
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2024/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

2018-08-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22063#discussion_r209127319
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2149,28 +2149,29 @@ class Analyzer(
 
   case p => p transformExpressionsUp {
 
-case udf @ ScalaUDF(func, _, inputs, _, _, _, _) =>
-  val parameterTypes = ScalaReflection.getParameterTypes(func)
-  assert(parameterTypes.length == inputs.length)
-
-  // TODO: skip null handling for not-nullable primitive inputs after we can completely
-  // trust the `nullable` information.
-  // (cls, expr) => cls.isPrimitive && expr.nullable
-  val needsNullCheck = (cls: Class[_], expr: Expression) =>
-cls.isPrimitive && !expr.isInstanceOf[KnownNotNull]
-  val inputsNullCheck = parameterTypes.zip(inputs)
-.filter { case (cls, expr) => needsNullCheck(cls, expr) }
-.map { case (_, expr) => IsNull(expr) }
-.reduceLeftOption[Expression]((e1, e2) => Or(e1, e2))
-  // Once we add an `If` check above the udf, it is safe to mark those checked inputs
-  // as not nullable (i.e., wrap them with `KnownNotNull`), because the null-returning
-  // branch of `If` will be called if any of these checked inputs is null. Thus we can
-  // prevent this rule from being applied repeatedly.
-  val newInputs = parameterTypes.zip(inputs).map{ case (cls, expr) =>
-if (needsNullCheck(cls, expr)) KnownNotNull(expr) else expr }
-  inputsNullCheck
-.map(If(_, Literal.create(null, udf.dataType), udf.copy(children = newInputs)))
-.getOrElse(udf)
+case udf@ScalaUDF(func, _, inputs, _, _, _, _, nullableTypes) =>
+  if (nullableTypes.isEmpty) {
--- End diff --

This is probably the weak point: unless there is nullability info, nothing is done to the UDF plan, and that's probably wrong in some cases.
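To make the trade-off concrete, consider the removed Analyzer code. It wraps a UDF in `If(IsNull(a) Or IsNull(b) ..., null, udf)` for primitive-typed inputs, while the new code skips the wrapping when no nullability info exists. The following is a minimal Python analogy of that guard: `with_null_guard` is hypothetical, and `None` stands in for SQL NULL.

```python
def with_null_guard(func, nullable_flags):
    """Return a wrapper that yields None when a non-nullable input is None.

    nullable_flags[i] is True when argument i can accept None (a reference
    type); False marks a primitive-like argument. With no flags at all, the
    function is returned unchanged, which is the weak point noted above.
    """
    if not nullable_flags:
        return func

    def guarded(*args):
        for arg, nullable in zip(args, nullable_flags):
            if arg is None and not nullable:
                return None  # short-circuit to NULL instead of calling func
        return func(*args)

    return guarded


add = with_null_guard(lambda x, y: x + y, [False, False])
print(add(1, 2))     # 3
print(add(None, 2))  # None

unguarded = with_null_guard(lambda x, y: x + y, [])
try:
    unguarded(None, 2)
except TypeError:
    print("no null check applied")
```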


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...

2018-08-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22063#discussion_r209127367
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -39,6 +39,7 @@ import org.apache.spark.sql.types.DataType
  * @param nullable  True if the UDF can return null value.
  * @param udfDeterministic  True if the UDF is deterministic. Deterministic UDF returns same result
  *  each time it is invoked with a particular input.
+ * @param nullableTypes which of the inputTypes are nullable (i.e. not primitive)
--- End diff --

The approach here is to capture, at registration time, whether the argument types are primitive or nullable. It's not a great way to record this, but it might be the least hacky option for now.
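As a rough analogy of capturing this at registration time: Python has no JVM primitives, so the sketch below treats `int`, `float` and `bool` annotations as stand-ins for primitive types (an assumption for illustration only). It records one nullable flag per parameter using `inspect.signature`; `capture_nullable_types` is hypothetical and not part of this PR.

```python
import inspect

# Stand-ins for JVM primitive types (assumption for illustration only).
PRIMITIVES = {int, float, bool}


def capture_nullable_types(func):
    """At registration time, record which parameters can hold None.

    Unannotated parameters are treated as nullable (reference types).
    """
    params = inspect.signature(func).parameters.values()
    return [p.annotation not in PRIMITIVES for p in params]


def plus_one(x: int) -> int:
    return x + 1


def shout(s: str, times: int) -> str:
    return s.upper() * times


print(capture_nullable_types(plus_one))  # [False]
print(capture_nullable_types(shout))     # [True, False]
```

Recording the flags up front means later planning does not depend on reflecting over the compiled closure, which is the problem the PR describes for Scala 2.12 lambdas.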


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22063: [SPARK-25044][SQL] Address translation of LMF clo...

2018-08-09 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/22063

[SPARK-25044][SQL] Address translation of LMF closure primitive args to Object in Scala 2.12

## What changes were proposed in this pull request?

First attempt to resolve the issue with inferring function types in Scala 2.12, by instead using information captured when the UDF is registered: which input types are nullable (i.e. not primitive).

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-25044

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22063.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22063


commit 0242218cdd512cc6e23a96621852fd3e019b7fc3
Author: Sean Owen 
Date:   2018-08-10T01:56:54Z

First attempt to resolve issue with inferring func types in 2.12 by instead 
using info captured when UDF is registered -- capturing which types are 
nullable (i.e. not primitive)




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r209126618
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java
 ---
@@ -21,33 +21,39 @@
 
 import org.apache.spark.annotation.InterfaceStability;
 import org.apache.spark.sql.SaveMode;
-import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
+import org.apache.spark.sql.sources.v2.writer.BatchWriteSupport;
 import org.apache.spark.sql.types.StructType;
 
 /**
  * A mix-in interface for {@link DataSourceV2}. Data sources can implement this interface to
- * provide data writing ability and save the data to the data source.
+ * provide data writing ability for batch processing.
+ *
+ * This interface is used when end users want to use a data source implementation directly, e.g.
+ * {@code Dataset.write.format(...).option(...).save()}.
  */
 @InterfaceStability.Evolving
-public interface WriteSupport extends DataSourceV2 {
+public interface BatchWriteSupportProvider extends DataSourceV2 {
 
   /**
-   * Creates an optional {@link DataSourceWriter} to save the data to this data source. Data
+   * Creates an optional {@link BatchWriteSupport} to save the data to this data source. Data
* sources can return None if there is no writing needed to be done according to the save mode.
*
* If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
-   * @param writeUUID A unique string for the writing job. It's possible that there are many writing
-   *  jobs running at the same time, and the returned {@link DataSourceWriter} can
-   *  use this job id to distinguish itself from other jobs.
+   * @param queryId A unique string for the writing query. It's possible that there are many
+   *writing queries running at the same time, and the returned
+   *{@link BatchWriteSupport} can use this id to distinguish itself from others.
* @param schema the schema of the data to be written.
* @param mode the save mode which determines what to do when the data are already in this data
* source, please refer to {@link SaveMode} for more details.
* @param options the options for the returned data source writer, which is an immutable
*case-insensitive string-to-string map.
-   * @return a writer to append data to this data source
+   * @return a write support to write data to this data source.
*/
-  Optional<DataSourceWriter> createWriter(
-  String writeUUID, StructType schema, SaveMode mode, DataSourceOptions options);
+  Optional<BatchWriteSupport> createBatchWriteSupport(
+  String queryId,
+  StructType schema,
+  SaveMode mode,
--- End diff --

The problem here is that if we don't take `saveMode`, the only end-user API to write to a data source is `df.write.format(...).mode("append").save()`. That makes data source v2 totally unusable until we introduce the new write APIs.

I hope we can get this in before Spark 2.4, so that some data source projects can start migrating and experimenting.
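The save mode decides what a source does when data already exists, and the Javadoc in this diff says a source may return an empty Optional when nothing needs to be written. The following is a minimal Python model of that contract. The function and error names are hypothetical; only the four mode names mirror Spark's `SaveMode`, and `None` plays the role of the empty Optional.

```python
from enum import Enum


class SaveMode(Enum):
    """Python stand-in mirroring the four modes of Spark's SaveMode."""
    APPEND = "append"
    OVERWRITE = "overwrite"
    ERROR_IF_EXISTS = "errorifexists"
    IGNORE = "ignore"


class AlreadyExistsError(Exception):
    """Hypothetical error for ErrorIfExists when data is already present."""


def plan_batch_write(mode, data_exists):
    """Hypothetical model of the createBatchWriteSupport contract.

    Returns the kind of write to perform, or None when the save mode means
    nothing needs to be written.
    """
    if not data_exists:
        return "write"
    if mode is SaveMode.APPEND:
        return "append"
    if mode is SaveMode.OVERWRITE:
        return "overwrite"
    if mode is SaveMode.IGNORE:
        return None
    raise AlreadyExistsError("data already exists")


for mode in SaveMode:
    try:
        print(mode.name, plan_batch_write(mode, data_exists=True))
    except AlreadyExistsError as err:
        print(mode.name, "error:", err)
```

Without a mode parameter, only the append branch would be reachable, which matches the concern above.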


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22059: [SPARK-25036][SQL] Avoid discarding unmoored doc comment...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22059
  
Yeah, it would be better to group those errors into fewer PRs if possible.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...

2018-08-09 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21847
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22053
  
**[Test build #94534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94534/testReport)**
 for PR 22053 at commit 
[`d95d357`](https://github.com/apache/spark/commit/d95d35794528702a2de5523ca00334d479598c57).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22053
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22053
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2023/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22059: [SPARK-25036][SQL] Avoid discarding unmoored doc comment...

2018-08-09 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22059
  
cc @srowen @ueshin @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   6   7   >