date:20180427

[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-27 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184731765
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1208,6 +1208,13 @@ object SQLConf {
   .stringConf
   .createWithDefault("")
 
+  val REJECT_TIMEZONE_IN_STRING = 
buildConf("spark.sql.function.rejectTimezoneInString")
+.internal()
+.doc("If true, `to_utc_timestamp` and `from_utc_timestamp` return null 
if the input string " +
+  "contains a timezone part, e.g. `2000-10-10 00:00:00+00:00`.")
+.booleanConf
+.createWithDefault(true)
+
--- End diff --

existing workloads may depend on the previous behavior and think that's 
corrected. It's safer to provide an internal conf and tell users about it when 
they complain about behavior change. It's an internal conf and is invisible to 
end users. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #89930 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89930/testReport)**
 for PR 21028 at commit 
[`076fc69`](https://github.com/apache/spark/commit/076fc698d4054b757e5afb14d1d6bfc190c2c6f7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21182
  
**[Test build #89929 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89929/testReport)**
 for PR 21182 at commit 
[`8a8ff3f`](https://github.com/apache/spark/commit/8a8ff3f5bfdfaee7ec73e362cfa34261d199f407).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21182: [SPARK-24068] Propagating DataFrameReader's options to T...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21182
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should contai...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21181
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2721/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should contai...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21181
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully method ...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21175
  
**[Test build #89928 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89928/testReport)**
 for PR 21175 at commit 
[`fb527c8`](https://github.com/apache/spark/commit/fb527c87a1f4ddb05eb601038736aeb4ec3f7223).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully method ...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21175
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89928/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread mn-mikke

Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r184730604
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -19,14 +19,41 @@ package org.apache.spark.sql.catalyst.expressions
 import java.util.Comparator
 
 import org.apache.spark.sql.catalyst.InternalRow
-import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.analysis.{TypeCheckResult, 
TypeCoercion}
 import org.apache.spark.sql.catalyst.expressions.codegen._
 import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData, 
MapData, TypeUtils}
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.Platform
 import org.apache.spark.unsafe.array.ByteArrayMethods
 import org.apache.spark.unsafe.types.{ByteArray, UTF8String}
 
+/**
+ * Base trait for [[BinaryExpression]]s with two arrays of the same 
element type and implicit
+ * casting.
+ */
+trait BinaryArrayExpressionWithImplicitCast extends BinaryExpression
--- End diff --

The `ImplicitCastInputTypes` trait is able to work with any number of 
children. Would it be possible to implement this trait to behave in the same 
way?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21182: [SPARK-24068] Propagating DataFrameReader's optio...

2018-04-27 Thread MaxGekk

GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/21182

[SPARK-24068] Propagating DataFrameReader's options to Text datasource on 
schema inferring

## What changes were proposed in this pull request?

While reading CSV or JSON files, DataFrameReader's options are converted to 
Hadoop's parameters, for example there:

https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L302

but the options are not propagated to Text datasource on schema inferring, 
for instance:

https://github.com/apache/spark/blob/branch-2.3/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala#L184-L188

The PR proposes propagation of user's options to Text datasource on scheme 
inferring in similar way as user's options are converted to Hadoop parameters 
if schema is specified.

## How was this patch tested?
The changes were tested manually by using 
https://github.com/twitter/hadoop-lzo:

```
hadoop-lzo> mvn clean package
hadoop-lzo> ln -s ./target/hadoop-lzo-0.4.21-SNAPSHOT.jar ./hadoop-lzo.jar
```
Create 2 test files in JSON and CSV format and compress them:
```shell
$ cat test.csv
col1|col2
a|1
$ lzop test.csv
$ cat test.json
{"col1":"a","col2":1}
$ lzop test.json
```
Run `spark-shell` with hadoop-lzo:
```
bin/spark-shell --jars ~/hadoop-lzo/hadoop-lzo.jar
```
reading compressing CSV and JSON without schema:
```scala
spark.read.option("io.compression.codecs", 
"com.hadoop.compression.lzo.LzopCodec").option("inferSchema",true).option("header",true).option("sep","|").csv("test.csv.lzo").show()
+++
|col1|col2|
+++
|   a|   1|
+++
```
```scala
spark.read.option("io.compression.codecs", 
"com.hadoop.compression.lzo.LzopCodec").option("multiLine", 
true).json("test.json.lzo").printSchema
root
 |-- col1: string (nullable = true)
 |-- col2: long (nullable = true)
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 text-options

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21182


commit 8a8ff3f5bfdfaee7ec73e362cfa34261d199f407
Author: Maxim Gekk 
Date:   2018-04-27T13:23:40Z

Propagating DataFrameReader's options to the text datasource on schema 
inferring




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should contai...

2018-04-27 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21181
  
cc @mn-mikke @ueshin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should contai...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21181
  
**[Test build #89927 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89927/testReport)**
 for PR 21181 at commit 
[`5660de8`](https://github.com/apache/spark/commit/5660de8d93ff96aa4dd840e74a1b87294e543838).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully method ...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21175
  
**[Test build #89928 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89928/testReport)**
 for PR 21175 at commit 
[`fb527c8`](https://github.com/apache/spark/commit/fb527c87a1f4ddb05eb601038736aeb4ec3f7223).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully method ...

2018-04-27 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21175
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21181: [SPARK-23736][SQL][FOLLOWUP] Error message should...

2018-04-27 Thread mgaido91

GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/21181

[SPARK-23736][SQL][FOLLOWUP] Error message should contains SQL types

## What changes were proposed in this pull request?

In the error messages we should return the SQL types (like `string` rather 
than the internal types like `StringType`).

## How was this patch tested?

added UT

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23736_followup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21181


commit 5660de8d93ff96aa4dd840e74a1b87294e543838
Author: Marco Gaido 
Date:   2018-04-27T15:45:21Z

[SPARK-23736][SQL][FOLLOWUP] Error message should contains SQL types




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-27 Thread manbuyun

Github user manbuyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184728954
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,12 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
 val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
 bytes.limit(bytes.position() + ioSize)
 channel.write(bytes)
+bytes.limit(curChunkLimit)
--- End diff --

I have commit this modified


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-27 Thread manbuyun

Github user manbuyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184727560
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,12 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
 val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
 bytes.limit(bytes.position() + ioSize)
 channel.write(bytes)
+bytes.limit(curChunkLimit)
--- End diff --

Right. When channel write throw IOException


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21141: [SPARK-23853][PYSPARK][TEST] Run Hive-related PySpark te...

2018-04-27 Thread bersprockets

Github user bersprockets commented on the issue:

https://github.com/apache/spark/pull/21141
  
@dongjoon-hyun @HyukjinKwon My PR is no longer addressing the issue 
described its associated Jira 
[(SPARK-23776),](https://issues.apache.org/jira/browse/SPARK-23776) which is 
that developers don't know what to do when they run the pyspark tests and get a 
failure with a UDF registration error (or Hive assembly missing error), as 
Holden experienced earlier. My PR morphed into a "skip the tests for missing 
components" change.

After these "skip tests" PRs go through, I will revisit this. In the 
meantime, feel free to use/ignore whatever is in #20909.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2720/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) qu...

2018-04-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21177#discussion_r184725980
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -78,7 +81,7 @@ object TPCDSQueryBenchmark extends Logging {
   }
   val numRows = queryRelations.map(tableSizes.getOrElse(_, 0L)).sum
   val benchmark = new Benchmark(s"TPCDS Snappy", numRows, 5)
-  benchmark.addCase(name) { i =>
+  benchmark.addCase(s"$name$nameSuffix") { _ =>
--- End diff --

how about 
```
benchmark.addCase(s"$name$nameSuffix")(_ => 
spark.sql(queryString).collect())
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) qu...

2018-04-27 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/21177#discussion_r184724132
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -87,10 +90,20 @@ object TPCDSQueryBenchmark extends Logging {
 }
   }
 
+  def filterQueries(
+  origQueries: Seq[String],
+  args: TPCDSQueryBenchmarkArguments): Seq[String] = {
+if (args.queryFilter.nonEmpty) {
+  origQueries.filter { case queryName => 
args.queryFilter.contains(queryName) }
--- End diff --

```
origQueries.filter(args.queryFilter.contains)
```
maybe better? :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89926/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21028
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #89926 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89926/testReport)**
 for PR 21028 at commit 
[`1dbcd0c`](https://github.com/apache/spark/commit/1dbcd0c68171ee5375e54a320b4314741a135fbd).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21028
  
**[Test build #89926 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89926/testReport)**
 for PR 21028 at commit 
[`1dbcd0c`](https://github.com/apache/spark/commit/1dbcd0c68171ee5375e54a320b4314741a135fbd).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...

2018-04-27 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/21136#discussion_r184719162
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
 ---
@@ -771,6 +778,16 @@ class UnsupportedOperationsSuite extends SparkFunSuite 
{
 }
   }
 
+  /** Assert that the logical plan is supportd for continuous procsssing 
mode */
--- End diff --

nit:`supportd` -> `supported`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21136: [SPARK-24061][SS]Add TypedFilter support for cont...

2018-04-27 Thread zsxwing

Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/21136#discussion_r184718790
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
 ---
@@ -840,4 +857,9 @@ class UnsupportedOperationsSuite extends SparkFunSuite {
 def this(attribute: Attribute) = this(Seq(attribute))
 override def isStreaming: Boolean = true
   }
+
+  case class StreamingRelationV2(output: Seq[Attribute]) extends LeafNode {
--- End diff --

nit: rename this to `TestStreamingRelationV2`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20946: [SPARK-23565] [SS] New error message for structur...

2018-04-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20946


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20946: [SPARK-23565] [SS] New error message for structured stre...

2018-04-27 Thread zsxwing

Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/20946
  
Thanks! Merging to master.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-27 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184713695
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/io/ChunkedByteBuffer.scala ---
@@ -63,10 +63,12 @@ private[spark] class ChunkedByteBuffer(var chunks: 
Array[ByteBuffer]) {
*/
   def writeFully(channel: WritableByteChannel): Unit = {
 for (bytes <- getChunks()) {
-  while (bytes.remaining() > 0) {
+  val curChunkLimit = bytes.limit()
+  while (bytes.hasRemaining) {
 val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
 bytes.limit(bytes.position() + ioSize)
 channel.write(bytes)
+bytes.limit(curChunkLimit)
--- End diff --

I would rewrite this using:
```
try {
val ioSize = Math.min(bytes.remaining(), bufferWriteChunkSize)
bytes.limit(bytes.position() + ioSize)
channel.write(bytes)
} finally {
bytes.limit(curChunkLimit)
}
```
to be safe.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21180
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple for pick...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21180
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21180: [SPARK-22674][PYTHON] Disabled _hack_namedtuple f...

2018-04-27 Thread superbobry

GitHub user superbobry opened a pull request:

https://github.com/apache/spark/pull/21180

[SPARK-22674][PYTHON] Disabled _hack_namedtuple for picklable namedtuples

Prior to this PR ``_hack_namedtuple`` was applied to all namedtuples,
regardless if they were defined on the top level of some module, and
are therefore picklable using the default ``__reduce__`` implementation,
or not.

The PR ensures that only the non-picklable namedtuples are hacked, i.e.
the ones defined in the REPL or locally in a function or method.

Note that the namedtuple might be defined locally but still be picklable
without the hack applied.

def define():
global Foo
Foo = namedtuple("Foo", [])

The current implementation does not cover such cases and will apply
the hack anyway.

Sidenote: the PR also fixes the module name of the hacked namedtuples.
Due to an extra layer of indirection added by ``_hijack_namedtuple``,
all hacked namedtuples had "collections" as ``__module__``. This
behaviour is no longer the case.

SerializationTestCase and RDDTests.

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/criteo-forks/spark 
hijack-non-importable-namedtuple

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21180


commit 37b0f6d14fcd48b9bd05b6f43b5cfb6284200367
Author: Sergei Lebedev 
Date:   2018-04-27T14:34:12Z

[SPARK-22674][PYTHON] Disabled _hack_namedtuple for picklable namedtuples

Prior to this PR ``_hack_namedtuple`` was applied to all namedtuples,
regardless if they were defined on the top level of some module, and
are therefore picklable using the default ``__reduce__`` implementation,
or not.

The PR ensures that only the non-picklable namedtuples are hacked, i.e.
the ones defined in the REPL or locally in a function or method.

Note that the namedtuple might be defined locally but still be picklable
without the hack applied.

def define():
global Foo
Foo = namedtuple("Foo", [])

The current implementation do not cover such cases, and will apply
the hack anyway.

Sidenote: the PR also fixes the module name of the hacked namedtuples.
Due to an extra layer of indirection added by ``_hijack_namedtuple``,
all hacked namedtuples had "collections" as ``__module__``. This
behaviour is no longer the case.

SerializationTestCase and RDDTests.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21157: [SPARK-22674][PYTHON] Removed the namedtuple pickling pa...

2018-04-27 Thread superbobry

Github user superbobry commented on the issue:

https://github.com/apache/spark/pull/21157
  
Closing in favour of #21180.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21157: [SPARK-22674][PYTHON] Removed the namedtuple pick...

2018-04-27 Thread superbobry

Github user superbobry closed the pull request at:

https://github.com/apache/spark/pull/21157


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread mgaido91

Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/21133
  
cc @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r184706765
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -288,6 +288,114 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an 
element present also in a2.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  private lazy val elementType = 
inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def dataType: DataType = BooleanType
+
+  override def inputTypes: Seq[AbstractDataType] = left.dataType match {
--- End diff --

implicit type cast is allowed in Presto. I am pushing here a proposal of 
trait, let me know what you think about it. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-27 Thread manbuyun

Github user manbuyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184706100
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,15 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
 assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("SPARK-24107: writeFully() write buffer which is larger than 
bufferWriteChunkSize") {
+val bufferWriteChunkSize = 
Option(SparkEnv.get).map(_.conf.get(config.BUFFER_WRITE_CHUNK_SIZE))
+
.getOrElse(config.BUFFER_WRITE_CHUNK_SIZE.defaultValue.get).toInt
--- End diff --

Ok. I have added. Please check


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89924/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21133
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #89924 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89924/testReport)**
 for PR 21153 at commit 
[`526fa4a`](https://github.com/apache/spark/commit/526fa4a96b61f4b5adb6606a92ed440879747a28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21133
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89921/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21133
  
**[Test build #89921 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89921/testReport)**
 for PR 21133 at commit 
[`2fa8da7`](https://github.com/apache/spark/commit/2fa8da744b1726284577deca6c70d184cdae3579).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #89925 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89925/testReport)**
 for PR 21021 at commit 
[`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2719/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21021
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread mn-mikke

Github user mn-mikke commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r184700686
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -288,6 +288,114 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an 
element present also in a2.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  private lazy val elementType = 
inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def dataType: DataType = BooleanType
+
+  override def inputTypes: Seq[AbstractDataType] = left.dataType match {
--- End diff --

@mgaido91 Sorry, I should have been more explicit. I've been referring to 
the below case that I added into `FunctionArgumentConversion` due to enabling 
type coercion of array types.
```
case c @ Concat(children) if children.forall(c => 
ArrayType.acceptsType(c.dataType)) &&
  !haveSameType(children) =>
  val types = children.map(_.dataType)
  findWiderCommonType(types) match {
case Some(finalDataType) => Concat(children.map(Cast(_, finalDataType)))
case None => c
  }
``` 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2718/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21153
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21153: [SPARK-24058][ML][PySpark] Default Params in ML should b...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21153
  
**[Test build #89924 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89924/testReport)**
 for PR 21153 at commit 
[`526fa4a`](https://github.com/apache/spark/commit/526fa4a96b61f4b5adb6606a92ed440879747a28).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184693332
  
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, 
extraMetadata=None, paramMap=None):
 """
 uid = instance.uid
 cls = instance.__module__ + '.' + instance.__class__.__name__
-params = instance.extractParamMap()
+
+# User-supplied param values
+params = instance._paramMap
 jsonParams = {}
 if paramMap is not None:
 jsonParams = paramMap
 else:
 for p in params:
 jsonParams[p.name] = params[p]
+
+# Default param values
+jsonDefaultParams = {}
+for p in instance._defaultParamMap:
+jsonDefaultParams[p.name] = instance._defaultParamMap[p]
--- End diff --

ditto.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21153: [SPARK-24058][ML][PySpark] Default Params in ML s...

2018-04-27 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21153#discussion_r184693307
  
--- Diff: python/pyspark/ml/util.py ---
@@ -417,15 +419,24 @@ def _get_metadata_to_save(instance, sc, 
extraMetadata=None, paramMap=None):
 """
 uid = instance.uid
 cls = instance.__module__ + '.' + instance.__class__.__name__
-params = instance.extractParamMap()
+
+# User-supplied param values
+params = instance._paramMap
 jsonParams = {}
 if paramMap is not None:
 jsonParams = paramMap
 else:
 for p in params:
 jsonParams[p.name] = params[p]
--- End diff --

`_paramMap`'s keys are `Param` not string.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-27 Thread bersprockets

Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184687634
  
--- Diff: sql/core/src/test/resources/sql-tests/results/datetime.sql.out ---
@@ -82,9 +82,138 @@ struct
 1  2
 2  3
 
+
 -- !query 9
 select weekday('2007-02-03'), weekday('2009-07-30'), 
weekday('2017-05-27'), weekday(null), weekday('1582-10-15 13:10:15')
--- !query 3 schema
+-- !query 9 schema
 struct
--- !query 3 output
+-- !query 9 output
 5  3   5   NULL4
+
+
+-- !query 10
+select from_utc_timestamp('2015-07-24 00:00:00', 'PST')
+-- !query 10 schema
+struct
+-- !query 10 output
+2015-07-23 17:00:00
+
+
+-- !query 11
+select from_utc_timestamp('2015-01-24 00:00:00', 'PST')
+-- !query 11 schema
+struct
+-- !query 11 output
+2015-01-23 16:00:00
+
+
+-- !query 12
+select from_utc_timestamp(null, 'PST')
+-- !query 12 schema
+struct
+-- !query 12 output
+NULL
+
+
+-- !query 13
+select from_utc_timestamp('2015-07-24 00:00:00', null)
+-- !query 13 schema
+struct
+-- !query 13 output
+NULL
+
+
+-- !query 14
+select from_utc_timestamp(null, null)
+-- !query 14 schema
+struct
+-- !query 14 output
+NULL
+
+
+-- !query 15
+select from_utc_timestamp(cast(0 as timestamp), 'PST')
+-- !query 15 schema
+struct
+-- !query 15 output
+1969-12-31 08:00:00
--- End diff --

Since we're adding new SQLConf settings anyway, we could have a 
"timestamp.hive.compatibility" (or something like that), that is true by 
default and allows select from_utc_timestamp(cast(0 as timestamp), 'PST') to 
continue to produce the above answer. However, when false, it would treat 0 as 
1970-01-01T00:00:00 UTC, so the above would instead produce the answer 
'1969-12-31 16:00:00' (which we both agree in the Jira is probably the more 
correct answer). Just a thought.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21040
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2717/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21040
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20604: [SPARK-23365][CORE] Do not adjust num executors when kil...

2018-04-27 Thread Ngone51

Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/20604
  
ping @squito 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21040: [SPARK-23930][SQL] Add slice function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21040
  
**[Test build #89923 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89923/testReport)**
 for PR 21040 at commit 
[`72ed607`](https://github.com/apache/spark/commit/72ed607c7d59b1f7f821f48f74e19d06b73758a7).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21169: [SPARK-23715][SQL] the input of to/from_utc_times...

2018-04-27 Thread bersprockets

Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/21169#discussion_r184685378
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1208,6 +1208,13 @@ object SQLConf {
   .stringConf
   .createWithDefault("")
 
+  val REJECT_TIMEZONE_IN_STRING = 
buildConf("spark.sql.function.rejectTimezoneInString")
+.internal()
+.doc("If true, `to_utc_timestamp` and `from_utc_timestamp` return null 
if the input string " +
+  "contains a timezone part, e.g. `2000-10-10 00:00:00+00:00`.")
+.booleanConf
+.createWithDefault(true)
+
--- End diff --

Why would we need this? Currently, if a user passes '2000-10-10 
00:00:00+00:00' to _utc_timestamp, they get the wrong answer. If they 
switch off this setting, they will continue to get the wrong answer rather than 
null. Are we accommodating the users who experienced this bug and are manually 
shifting the result?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21028: [SPARK-23922][SQL] Add arrays_overlap function

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21028#discussion_r184683126
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -288,6 +288,114 @@ case class ArrayContains(left: Expression, right: 
Expression)
   override def prettyName: String = "array_contains"
 }
 
+/**
+ * Checks if the two arrays contain at least one common element.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Returns true if a1 contains at least an 
element present also in a2.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3), array(3, 4, 5));
+   true
+  """, since = "2.4.0")
+case class ArraysOverlap(left: Expression, right: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  private lazy val elementType = 
inputTypes.head.asInstanceOf[ArrayType].elementType
+
+  override def dataType: DataType = BooleanType
+
+  override def inputTypes: Seq[AbstractDataType] = left.dataType match {
--- End diff --

@mn-mikke I am not sure, since it is quite a strange case, since it allows 
also string and byte. I am not sure we can do this with implicit type cast.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89922/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21178
  
**[Test build #89922 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89922/testReport)**
 for PR 21178 at commit 
[`684fb26`](https://github.com/apache/spark/commit/684fb26b2de91b0c0e33874e06e9b93fa338b2cf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2716/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21178
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserFromKeyt...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21178
  
**[Test build #89922 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89922/testReport)**
 for PR 21178 at commit 
[`684fb26`](https://github.com/apache/spark/commit/684fb26b2de91b0c0e33874e06e9b93fa338b2cf).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-27 Thread maropu

Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184675304
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,15 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
 assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("SPARK-24107: writeFully() write buffer which is larger than 
bufferWriteChunkSize") {
+val bufferWriteChunkSize = 
Option(SparkEnv.get).map(_.conf.get(config.BUFFER_WRITE_CHUNK_SIZE))
+
.getOrElse(config.BUFFER_WRITE_CHUNK_SIZE.defaultValue.get).toInt
--- End diff --

How about setting this value via `spark.buffer.write.chunkSize`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89920/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #89920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89920/testReport)**
 for PR 21021 at commit 
[`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ArraySortUtil extends ExpectsInputTypes `
  * `case class ArraySort(child: Expression) extends UnaryExpression with 
ArraySortUtil `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21106: [SPARK-23711][SQL][WIP] Add fallback logic for UnsafePro...

2018-04-27 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21106
  
A sql conf sounds good to me. @hvanhovell What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21178: [SPARK-24110][Thrift-Server] Avoid UGI.loginUserF...

2018-04-27 Thread jerryshao

Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/21178#discussion_r184672313
  
--- Diff: 
sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
 ---
@@ -362,4 +371,34 @@ public static void verifyProxyAccess(String realUser, 
String proxyUser, String i
 }
   }
 
+  public static boolean needUgiLogin(UserGroupInformation ugi, String 
principal, String keytab) {
+return null == ugi || !ugi.hasKerberosCredentials() || 
!ugi.getUserName().equals(principal) ||
+  !keytab.equals(getKeytabFromUgi());
+  }
+
+  private static String getKeytabFromUgi() {
+Class clz = UserGroupInformation.class;
+try {
+  synchronized (clz) {
+Field field = clz.getDeclaredField("keytabFile");
+field.setAccessible(true);
+return (String) field.get(null);
+  }
+} catch (NoSuchFieldException e) {
+  try {
+synchronized (clz) {
+  // In Hadoop 3 we don't have "keytabFile" field, instead we 
should use private method
+  // getKeytab().
+  Method method = clz.getDeclaredMethod("getKeytab");
+  method.setAccessible(true);
+  return (String) 
method.invoke(UserGroupInformation.getCurrentUser());
--- End diff --

What is the purpose of moving both field and method out of this method? I'm 
not sure is there any difference.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index

2018-04-27 Thread mn-mikke

Github user mn-mikke commented on the issue:

https://github.com/apache/spark/pull/21121
  
@ueshin What about combining `zip_with_index` with 
[`map_from_entries`](https://issues.apache.org/jira/browse/SPARK-23934)?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21175: [SPARK-24107][CORE] ChunkedByteBuffer.writeFully ...

2018-04-27 Thread manbuyun

Github user manbuyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21175#discussion_r184668664
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/ChunkedByteBufferSuite.scala ---
@@ -56,6 +56,13 @@ class ChunkedByteBufferSuite extends SparkFunSuite {
 assert(chunkedByteBuffer.getChunks().head.position() === 0)
   }
 
+  test("SPARK-24107: writeFully() write buffer which is larger than 
bufferWriteChunkSize") {
+val chunkedByteBuffer = new 
ChunkedByteBuffer(Array(ByteBuffer.allocate(80 * 1024 * 1024)))
--- End diff --

I have modified.Please check


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21121: [SPARK-24042][SQL] Collection function: zip_with_index

2018-04-27 Thread lokm01

Github user lokm01 commented on the issue:

https://github.com/apache/spark/pull/21121
  
@ueshin Currently we use our own implementation of zipWithIndex when we do 
explode and need to preserve the ordering of the array elements (especially if 
there is a shuffle involved in the subsequent transformation).

Sure, once transform becomes available, it will be much better and more 
performant to use that, but since we're dealing with production applications, 
we would like to start rewriting these jobs with those small "drop-in" 
replacements for functions such as zipWithIndex before going for a major 
rewrite with HOFs in spark SQL.

I've seen many threads in the community, which recommend the same approach 
when dealing with these difficult array cases - I'm pretty sure it will benefit 
other users.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21040: [SPARK-23930][SQL] Add slice function

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21040#discussion_r184666357
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -287,3 +287,101 @@ case class ArrayContains(left: Expression, right: 
Expression)
 
   override def prettyName: String = "array_contains"
 }
+
+
+/**
+ * Slices an array according to the requested start index and length
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(a1, a2) - Subsets array x starting from index start (or 
starting from the end if start is negative) with the specified length.",
+  examples = """
+Examples:
+  > SELECT _FUNC_(array(1, 2, 3, 4), 2, 2);
+   [2,3]
+  > SELECT _FUNC_(array(1, 2, 3, 4), -2, 2);
+   [3,4]
+  """, since = "2.4.0")
+// scalastyle:on line.size.limit
+case class Slice(x: Expression, start: Expression, length: Expression)
+  extends TernaryExpression with ImplicitCastInputTypes {
+
+  override def dataType: DataType = x.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(ArrayType, 
IntegerType, IntegerType)
+
+  override def nullable: Boolean = children.exists(_.nullable)
+
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  override def children: Seq[Expression] = Seq(x, start, length)
+
+  override def nullSafeEval(xVal: Any, startVal: Any, lengthVal: Any): Any 
= {
+val startInt = startVal.asInstanceOf[Int]
+val lengthInt = lengthVal.asInstanceOf[Int]
+val arr = xVal.asInstanceOf[ArrayData]
+val startIndex = if (startInt == 0) {
+  throw new RuntimeException(
+s"Unexpected value for start in function $prettyName:  SQL array 
indices start at 1.")
+} else if (startInt < 0) {
+  startInt + arr.numElements()
+} else {
+  startInt - 1
+}
+if (lengthInt < 0) {
+  throw new RuntimeException(s"Unexpected value for length in function 
$prettyName: " +
+s"length must be greater than or equal to 0.")
+}
+// this can happen if start is negative and its absolute value is 
greater than the
+// number of elements in the array
+if (startIndex < 0) {
+  return new GenericArrayData(Array.empty[AnyRef])
+}
+val elementType = x.dataType.asInstanceOf[ArrayType].elementType
+val data = arr.toArray[AnyRef](elementType)
+new GenericArrayData(data.slice(startIndex, startIndex + lengthInt))
+  }
+
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+val elementType = x.dataType.asInstanceOf[ArrayType].elementType
+nullSafeCodeGen(ctx, ev, (x, start, length) => {
+  val arrayClass = classOf[GenericArrayData].getName
+  val values = ctx.freshName("values")
+  val i = ctx.freshName("i")
+  val startIdx = ctx.freshName("startIdx")
+  val resLength = ctx.freshName("resLength")
+  val defaultIntValue = 
CodeGenerator.defaultValue(CodeGenerator.JAVA_INT, false)
+  s"""
+ |${CodeGenerator.JAVA_INT} $startIdx = $defaultIntValue;
+ |${CodeGenerator.JAVA_INT} $resLength = $defaultIntValue;
+ |if ($start == 0) {
+ |  throw new RuntimeException("Unexpected value for start in 
function $prettyName: "
+ |+ "SQL array indices start at 1.");
+ |} else if ($start < 0) {
+ |  $startIdx = $start + $x.numElements();
+ |} else {
+ |  // arrays in SQL are 1-based instead of 0-based
+ |  $startIdx = $start - 1;
+ |}
+ |if ($length < 0) {
+ |  throw new RuntimeException("Unexpected value for length in 
function $prettyName: "
+ |+ "length must be greater than or equal to 0.");
+ |} else if ($length > $x.numElements() - $startIdx) {
+ |  $resLength = $x.numElements() - $startIdx;
+ |} else {
+ |  $resLength = $length;
+ |}
+ |Object[] $values;
+ |if ($startIdx < 0) {
+ |  $values = new Object[0];
+ |} else {
+ |  $values = new Object[$resLength];
+ |  for (int $i = 0; $i < $resLength; $i ++) {
+ |$values[$i] = ${CodeGenerator.getValue(x, elementType, s"$i 
+ $startIdx")};
--- End diff --

You are right, I am not sure why I missed it...maybe I checked outdated 
code. Sorry, I am fixing it, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21174
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89918/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21174
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21174: [SPARK-24085][SQL] Query returns UnsupportedOperationExc...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21174
  
**[Test build #89918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89918/testReport)**
 for PR 21174 at commit 
[`e6e9397`](https://github.com/apache/spark/commit/e6e9397b42c1ad39d58d7b1c11f7cb152f019c82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread juliuszsompolski

Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184662359
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 ---
@@ -279,4 +282,11 @@ class ApproximatePercentileQuerySuite extends 
QueryTest with SharedSQLContext {
   checkAnswer(query, expected)
 }
   }
+
+  test("SPARK-24013: unneeded compress can cause performance issues with 
sorted input") {
+failAfter(30 seconds) {
+  checkAnswer(sql("select approx_percentile(id, array(0.1)) from 
range(1000)"),
+Row(Array(999160)))
--- End diff --

Ok. Yeah, looking at the other tests in this suite it's definitely fine :-).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21143
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21143
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89919/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21143: [SPARK-24072][SQL] clearly define pushed filters

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21143
  
**[Test build #89919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89919/testReport)**
 for PR 21143 at commit 
[`172dca0`](https://github.com/apache/spark/commit/172dca0f86f041ab9c53041f0715b7e5a682a89a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184658151
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 ---
@@ -279,4 +282,11 @@ class ApproximatePercentileQuerySuite extends 
QueryTest with SharedSQLContext {
   checkAnswer(query, expected)
 }
   }
+
+  test("SPARK-24013: unneeded compress can cause performance issues with 
sorted input") {
+failAfter(30 seconds) {
+  checkAnswer(sql("select approx_percentile(id, array(0.1)) from 
range(1000)"),
+Row(Array(999160)))
--- End diff --

it is not the only place where it is checked with an exact answer, so I 
don't think it is an issue, a small change would anyway require to change many 
test cases answers. What do you think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread juliuszsompolski

Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184656896
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 ---
@@ -279,4 +282,11 @@ class ApproximatePercentileQuerySuite extends 
QueryTest with SharedSQLContext {
   checkAnswer(query, expected)
 }
   }
+
+  test("SPARK-24013: unneeded compress can cause performance issues with 
sorted input") {
+failAfter(30 seconds) {
+  checkAnswer(sql("select approx_percentile(id, array(0.1)) from 
range(1000)"),
+Row(Array(999160)))
--- End diff --

nit:
With the approx nature of the algorithm, could the exact answer not get 
flakty through some small changes in code or config? (like e.g. the split of 
range into tasks, and then different merging of partial aggrs producing 
slightly different results)
maybe just asserting on collect().length == 1 would do?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21133
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2715/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21133
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21133: [SPARK-24013][SQL] Remove unneeded compress in Approxima...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21133
  
**[Test build #89921 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89921/testReport)**
 for PR 21133 at commit 
[`2fa8da7`](https://github.com/apache/spark/commit/2fa8da744b1726284577deca6c70d184cdae3579).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184654618
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ---
@@ -238,12 +238,6 @@ object ApproximatePercentile {
   summaries = summaries.insert(value)
   // The result of QuantileSummaries.insert is un-compressed
   isCompressed = false
-
-  // Currently, QuantileSummaries ignores the construction parameter 
compressThresHold,
-  // which may cause QuantileSummaries to occupy unbounded memory. We 
have to hack around here
-  // to make sure QuantileSummaries doesn't occupy infinite memory.
-  // TODO: Figure out why QuantileSummaries ignores construction 
parameter compressThresHold
-  if (summaries.sampled.length >= compressThresHoldBufferLength) 
compress()
--- End diff --

no problem at all, thanks for checking this :) I addressed you comment on 
the test. Any more comments?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread juliuszsompolski

Github user juliuszsompolski commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184654132
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ---
@@ -238,12 +238,6 @@ object ApproximatePercentile {
   summaries = summaries.insert(value)
   // The result of QuantileSummaries.insert is un-compressed
   isCompressed = false
-
-  // Currently, QuantileSummaries ignores the construction parameter 
compressThresHold,
-  // which may cause QuantileSummaries to occupy unbounded memory. We 
have to hack around here
-  // to make sure QuantileSummaries doesn't occupy infinite memory.
-  // TODO: Figure out why QuantileSummaries ignores construction 
parameter compressThresHold
-  if (summaries.sampled.length >= compressThresHoldBufferLength) 
compress()
--- End diff --

Sorry, it's my fault of not reading the description attentively :-).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21021
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2714/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184653021
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/ApproximatePercentileQuerySuite.scala
 ---
@@ -279,4 +282,10 @@ class ApproximatePercentileQuerySuite extends 
QueryTest with SharedSQLContext {
   checkAnswer(query, expected)
 }
   }
+
+  test("SPARK-24013: unneeded compress can cause performance issues with 
sorted input") {
+failAfter(20 seconds) {
+  assert(sql("select approx_percentile(id, array(0.1)) from 
range(1000)").count() == 1)
--- End diff --

nice catch, thanks, I started using collect during my tests than I moved to 
count but it was a mistake, I am fixing it, thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21133: [SPARK-24013][SQL] Remove unneeded compress in Ap...

2018-04-27 Thread mgaido91

Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21133#discussion_r184652876
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala
 ---
@@ -238,12 +238,6 @@ object ApproximatePercentile {
   summaries = summaries.insert(value)
   // The result of QuantileSummaries.insert is un-compressed
   isCompressed = false
-
-  // Currently, QuantileSummaries ignores the construction parameter 
compressThresHold,
-  // which may cause QuantileSummaries to occupy unbounded memory. We 
have to hack around here
-  // to make sure QuantileSummaries doesn't occupy infinite memory.
-  // TODO: Figure out why QuantileSummaries ignores construction 
parameter compressThresHold
-  if (summaries.sampled.length >= compressThresHoldBufferLength) 
compress()
--- End diff --

Yes, the TODO was resolved in SPARK-17439. I thought I clearly stated it in 
the description, but if this is not the case or you have any suggestion about 
how to improve the description, I am happy to improve it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries i...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21177
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89916/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries i...

2018-04-27 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21177
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21177: [SPARK-24111][SQL] Add the TPCDS v2.7 (latest) queries i...

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21177
  
**[Test build #89916 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89916/testReport)**
 for PR 21177 at commit 
[`0ef1b28`](https://github.com/apache/spark/commit/0ef1b28b72f072ff22f750c6263c7fdc46eec831).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21021: [SPARK-23921][SQL] Add array_sort function

2018-04-27 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21021
  
**[Test build #89920 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89920/testReport)**
 for PR 21021 at commit 
[`175d981`](https://github.com/apache/spark/commit/175d98195fc172655584b0dcf4087014e1377d12).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 >

201 - 300 of 367 matches

Mail list logo