[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88159/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20727
  
**[Test build #88159 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88159/testReport)**
 for PR 20727 at commit 
[`d6e9160`](https://github.com/apache/spark/commit/d6e91604585b22a27fbd0b7caa0a8e96d3725400).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20791: [SPARK-23618][K8s][BUILD] Initialize BUILD_ARGS in docke...

2018-03-10 Thread foxish
Github user foxish commented on the issue:

https://github.com/apache/spark/pull/20791
  
LGTM! Thanks @jooseong. 


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20793
  
Ah, the results are different because the number of operations is different. It 
may be an issue like #20630.

I am curious why the tests fail when the seed is changed. Of course, I 
understand that the sequence of random numbers must be reproducible for a given 
seed value within a package or implementation.


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the catalog fo...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the cat...

2018-03-10 Thread kevinyu98
GitHub user kevinyu98 opened a pull request:

https://github.com/apache/spark/pull/20795

[SPARK-23486]cache the function name from the catalog for lookupFunctions

## What changes were proposed in this pull request?

This PR caches the function name from the Spark and external catalogs. The 
cache is used by lookupFunctions in the analyzer and is kept per query plan. 
The original problem is reported in 
[SPARK-19737](https://issues.apache.org/jira/browse/SPARK-19737).
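
As a rough illustration of the idea (not the code in this PR), a per-plan cache can be sketched as a map that is consulted before the catalog. `FakeCatalog`, `functionExists` and `lookupAll` below are hypothetical names invented for this sketch; they are not Spark APIs.

```scala
import scala.collection.mutable

object LookupCacheSketch {
  // Hypothetical stand-in for an expensive catalog call; it counts invocations.
  final class FakeCatalog(known: Set[String]) {
    var calls = 0
    def functionExists(name: String): Boolean = {
      calls += 1
      known.contains(name)
    }
  }

  // Checks every function name used in one "plan", asking the catalog
  // at most once per distinct name. The cache lives only for this call.
  def lookupAll(names: Seq[String], catalog: FakeCatalog): Seq[Boolean] = {
    val seen = mutable.HashMap.empty[String, Boolean]
    names.map(n => seen.getOrElseUpdate(n, catalog.functionExists(n)))
  }

  def main(args: Array[String]): Unit = {
    val catalog = new FakeCatalog(Set("upper", "my_udf"))
    val result = lookupAll(Seq("my_udf", "upper", "my_udf", "my_udf"), catalog)
    println(result)        // List(true, true, true, true)
    println(catalog.calls) // 2
  }
}
```

With four references to two distinct functions, the catalog is asked only twice, which is the effect described for repeated functions in the same query.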

## How was this patch tested?

I ran unit tests on my local machine; they show that the cache is used when 
the same function appears multiple times in the same query. However, I am not 
sure how to add a test case to Spark. Can you advise? Thanks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kevinyu98/spark spark-23486

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20795.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20795


commit 701100c11126d7437dc03ef20b484e84e2f9cb2a
Author: Kevin Yu 
Date:   2018-03-11T06:40:27Z

cache the function name from the catalog for lookupFunction




---




[GitHub] spark issue #20791: [SPARK-23618][K8s][BUILD] Initialize BUILD_ARGS in docke...

2018-03-10 Thread jooseong
Github user jooseong commented on the issue:

https://github.com/apache/spark/pull/20791
  
Added [K8s] into the PR title. Thanks for the review.


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20791
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88157/
Test PASSed.


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20791
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20791
  
**[Test build #88157 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88157/testReport)**
 for PR 20791 at commit 
[`096e992`](https://github.com/apache/spark/commit/096e99287a72b3ea164dbdf6c90edf4b256a2623).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20793
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20793
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88156/
Test FAILed.


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20793
  
**[Test build #88156 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88156/testReport)**
 for PR 20793 at commit 
[`bb40ef2`](https://github.com/apache/spark/commit/bb40ef2e8d337508d60903a6a824b5aa45d87326).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/20793
  
Does the `hashSeed` method produce the same hash value after this change?

```scala
scala> def hashSeed(seed: Long): Long = {
 |   val bytes = ByteBuffer.allocate(java.lang.Long.SIZE).putLong(seed).array()
 |   val lowBits = MurmurHash3.bytesHash(bytes)
 |   val highBits = MurmurHash3.bytesHash(bytes, lowBits)
 |   (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
 | }
hashSeed: (seed: Long)Long

scala> hashSeed(100)
res3: Long = 852394178374189935

scala> def hashSeed2(seed: Long): Long = {
 |   val bytes = ByteBuffer.allocate(java.lang.Long.BYTES).putLong(seed).array()
 |   val lowBits = MurmurHash3.bytesHash(bytes)
 |   val highBits = MurmurHash3.bytesHash(bytes, lowBits)
 |   (highBits.toLong << 32) | (lowBits.toLong & 0xFFFFFFFFL)
 | }
hashSeed2: (seed: Long)Long

scala> hashSeed2(100)
res7: Long = 1088402058313200430
```
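
A likely reason the two values differ is the buffer size: `java.lang.Long.SIZE` is the width of a `long` in bits (64), while `java.lang.Long.BYTES` is its width in bytes (8), so the two variants hash byte arrays of different lengths. The sketch below only inspects the buffers; `HashSeedBufferSizes` and `seedBytes` are names made up for this illustration.

```scala
import java.nio.ByteBuffer
import scala.util.hashing.MurmurHash3

object HashSeedBufferSizes {
  // Writes the seed into a freshly allocated buffer of the given capacity
  // and returns the whole backing array, including any unused zero bytes.
  def seedBytes(capacity: Int, seed: Long): Array[Byte] =
    ByteBuffer.allocate(capacity).putLong(seed).array()

  def main(args: Array[String]): Unit = {
    val wide = seedBytes(java.lang.Long.SIZE, 100L)    // capacity 64
    val narrow = seedBytes(java.lang.Long.BYTES, 100L) // capacity 8

    println(wide.length)   // 64
    println(narrow.length) // 8

    // The first 8 bytes carry the same seed; the wide array adds 56 zero bytes.
    println(wide.take(8).sameElements(narrow)) // true

    // Compare the hashes of the two arrays (no expected output stated here).
    println(MurmurHash3.bytesHash(wide) == MurmurHash3.bytesHash(narrow))
  }
}
```

Because the hash depends on the input bytes and their length, any stored expectation computed with the 64-byte buffer would need to be regenerated if the buffer shrinks to 8 bytes.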


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20727
  
**[Test build #88159 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88159/testReport)**
 for PR 20727 at commit 
[`d6e9160`](https://github.com/apache/spark/commit/d6e91604585b22a27fbd0b7caa0a8e96d3725400).


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1454/
Test PASSed.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88158/
Test FAILed.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20727
  
**[Test build #88158 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88158/testReport)**
 for PR 20727 at commit 
[`97a8422`](https://github.com/apache/spark/commit/97a8422c63931ba1709523bb9bd1f60fffee597b).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1453/
Test PASSed.


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20727
  
**[Test build #88158 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88158/testReport)**
 for PR 20727 at commit 
[`97a8422`](https://github.com/apache/spark/commit/97a8422c63931ba1709523bb9bd1f60fffee597b).


---




[GitHub] spark issue #20727: [SPARK-23577][SQL] Supports custom line separator for te...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20727
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20727#discussion_r173642056
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
 ---
@@ -42,7 +52,12 @@ class HadoopFileLinesReader(
   Array.empty)
 val attemptId = new TaskAttemptID(new TaskID(new JobID(), TaskType.MAP, 0), 0)
 val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
-val reader = new LineRecordReader()
+val reader = if (lineSeparator != "\n") {
+  new LineRecordReader(lineSeparator.getBytes("UTF-8"))
--- End diff --

OK. Let me try to address this one.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88155/
Test PASSed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20785
  
**[Test build #88155 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88155/testReport)**
 for PR 20785 at commit 
[`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2018-03-10 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/17774
  
LGTM
@tdas @zsxwing absent any objections from you in the next couple of days, 
I'll merge this


---




[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...

2018-03-10 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/19431
  
@tdas any concerns?

If @omuravskiy doesn't express any objections (since these tests are 
basically taken directly from his linked PR) in the next couple of days, I'm 
inclined to merge this.



---




[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...

2018-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/19431#discussion_r173641331
  
--- Diff: 
external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -456,6 +455,60 @@ class DirectKafkaStreamSuite
 ssc.stop()
   }
 
+  test("backpressure.initialRate should honor maxRatePerPartition") {
+backpressureTest(maxRatePerPartition = 1000, initialRate = 500, maxMessagesPerPartition = 250)
+  }
+
+  test("use backpressure.initialRate with backpressure") {
--- End diff --

Aren't the descriptions of these tests backwards, i.e. isn't this the one 
testing that maxRatePerPartition is honored?


---




[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-10 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/20767
  
Can you clarify why you want to allow only 1 cached consumer per 
topicpartition, closing any others at task end?

It seems like opening and closing consumers would be less efficient than 
allowing a pool of more than one consumer per topicpartition.


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20791
  
**[Test build #88157 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88157/testReport)**
 for PR 20791 at commit 
[`096e992`](https://github.com/apache/spark/commit/096e99287a72b3ea164dbdf6c90edf4b256a2623).


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20793
  
**[Test build #88156 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88156/testReport)**
 for PR 20793 at commit 
[`bb40ef2`](https://github.com/apache/spark/commit/bb40ef2e8d337508d60903a6a824b5aa45d87326).


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20791
  
@foxish


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20791
  
Could you add [K8s] to the PR title?


---




[GitHub] spark issue #20791: [SPARK-23618][BUILD] Initialize BUILD_ARGS in docker-ima...

2018-03-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20791
  
Jenkins, ok to test


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/20793
  
Jenkins, ok to test


---




[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20727#discussion_r173639932
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
 ---
@@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val parameters: CaseInsensiti
*/
   val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean
 
+  val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n")
+  require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty string.")
 }
 
 private[text] object TextOptions {
   val COMPRESSION = "compression"
   val WHOLETEXT = "wholetext"
+  val LINE_SEPARATOR = "lineSep"
--- End diff --

This one example might sound counterintuitive to you, but the name looks less 
consistent with the other places, at least the ones I usually refer to.


---




[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20727#discussion_r173639748
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
 ---
@@ -30,9 +31,19 @@ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 /**
  * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], which are all of the lines
  * in that file.
+ *
+ * @param file A part (i.e. "block") of a single file that should be read line by line.
+ * @param lineSeparator A line separator that should be used for each line. If the value is `None`,
+ *  it covers `\r`, `\r\n` and `\n`.
+ * @param conf Hadoop configuration
  */
 class HadoopFileLinesReader(
-file: PartitionedFile, conf: Configuration) extends Iterator[Text] with Closeable {
+file: PartitionedFile,
+lineSeparator: Option[String],
+conf: Configuration) extends Iterator[Text] with Closeable {
--- End diff --

Yup, I am sorry if I wasn't clear. I mean [the doc describes](https://hadoop.apache.org/docs/r2.7.1/api/index.html?org/apache/hadoop/io/Text.html):

> This class stores text using standard UTF8 encoding.

I was wondering if that's an official way to use `Text`, because it sounds 
rather like an informal workaround.
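
To make the encoding concern concrete without depending on Hadoop, here is a small sketch using only the JDK. It shows that the byte pattern for a line separator depends on the charset, so a reader that matches separator bytes has to agree with the file's encoding. `SeparatorBytesSketch` and `encode` are names invented for this example, and the sketch says nothing about how `Text` itself behaves.

```scala
import java.nio.charset.{Charset, StandardCharsets}

object SeparatorBytesSketch {
  // Encodes a separator string into the bytes a byte-level reader would search for.
  def encode(sep: String, charset: Charset): Array[Byte] = sep.getBytes(charset)

  def main(args: Array[String]): Unit = {
    val sep = "\n"
    val utf8 = encode(sep, StandardCharsets.UTF_8)
    val utf16le = encode(sep, StandardCharsets.UTF_16LE)

    println(utf8.toList)    // List(10)
    println(utf16le.toList) // List(10, 0)
  }
}
```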


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20785
  
**[Test build #88155 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88155/testReport)**
 for PR 20785 at commit 
[`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1452/
Test PASSed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20785
  
retest this please


---




[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-10 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20767#discussion_r173636109
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ---
@@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends Logging {
 }
   }
 
-  def releaseKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
+  private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = {
 synchronized {
-  val consumer = cache.get(key)
-  if (consumer != null) {
-consumer.inuse = false
-  } else {
-logWarning(s"Attempting to release consumer that does not exist")
-  }
-}
-  }
 
-  /**
-   * Removes (and closes) the Kafka Consumer for the given topic, partition and group id.
-   */
-  def removeKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-synchronized {
-  val removedConsumer = cache.remove(key)
-  if (removedConsumer != null) {
-removedConsumer.close()
+  // If it has been marked for close, then do it any way
+  if (intConsumer.inuse && intConsumer.markedForClose) intConsumer.close()
+  intConsumer.inuse = false
+
+  // Clear the consumer from the cache if this is indeed the consumer present in the cache
+  val key = new CacheKey(intConsumer.topicPartition, intConsumer.kafkaParams)
+  val cachedIntConsumer = cache.get(key)
+  if (cachedIntConsumer != null) {
+if (cachedIntConsumer.eq(intConsumer)) {
+  // The released consumer is indeed the cached one.
+  cache.remove(key)
+} else {
+  // The released consumer is not the cached one. Don't do anything.
+  // This should not happen as long as we maintain the invariant mentioned above.
+  logWarning(
+s"Cached consumer not the same one as the one being release" +
+  s"\ncached = $cachedIntConsumer [${System.identityHashCode(cachedIntConsumer)}]" +
+  s"\nreleased = $intConsumer [${System.identityHashCode(intConsumer)}]")
+}
+  } else {
+// The released consumer is not in the cache. Don't do anything.
+// This should not happen as long as we maintain the invariant mentioned above.
+logWarning(s"Attempting to release consumer that is not in the cache")
   }
 }
   }
 
   /**
* Get a cached consumer for groupId, assigned to topic and partition.
* If matching consumer doesn't already exist, will be created using kafkaParams.
+   * The returned consumer must be released explicitly using [[KafkaDataConsumer.release()]].
+   *
+   * Note: This method guarantees that the consumer returned is not currently in use by any one
+   * else. Within this guarantee, this will make a best effort attempt to re-use consumers by
+   * caching them and tracking when they are in use.
*/
-  def getOrCreate(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): CachedKafkaConsumer = synchronized {
-val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-// If this is reattempt at running the task, then invalidate cache and start with
-// a new consumer
+  def acquire(
+  topicPartition: TopicPartition,
+  kafkaParams: ju.Map[String, Object],
+  useCache: Boolean): KafkaDataConsumer = synchronized {
+val key = new CacheKey(topicPartition, kafkaParams)
+val existingInternalConsumer = cache.get(key)
+
+lazy val newInternalConsumer = new InternalKafkaConsumer(topicPartition, kafkaParams)
+
 if (TaskContext.get != null && TaskContext.get.attemptNumber >= 1) {
-  removeKafkaConsumer(topic, partition, kafkaParams)
-  val consumer = new CachedKafkaConsumer(topicPartition, kafkaParams)
-  consumer.inuse = true
-  cache.put(key, consumer)
-  consumer
-} else {
-  if 

[GitHub] spark pull request #20767: [SPARK-23623] [SS] Avoid concurrent use of cached...

2018-03-10 Thread tedyu
Github user tedyu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20767#discussion_r173636002
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
 ---
@@ -342,80 +415,103 @@ private[kafka010] object CachedKafkaConsumer extends Logging {
 }
   }
 
-  def releaseKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
+  private def releaseConsumer(intConsumer: InternalKafkaConsumer): Unit = {
 synchronized {
-  val consumer = cache.get(key)
-  if (consumer != null) {
-consumer.inuse = false
-  } else {
-logWarning(s"Attempting to release consumer that does not exist")
-  }
-}
-  }
 
-  /**
-   * Removes (and closes) the Kafka Consumer for the given topic, partition and group id.
-   */
-  def removeKafkaConsumer(
-  topic: String,
-  partition: Int,
-  kafkaParams: ju.Map[String, Object]): Unit = {
-val groupId = 
kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
-val topicPartition = new TopicPartition(topic, partition)
-val key = CacheKey(groupId, topicPartition)
-
-synchronized {
-  val removedConsumer = cache.remove(key)
-  if (removedConsumer != null) {
-removedConsumer.close()
+  // If it has been marked for close, then do it any way
+  if (intConsumer.inuse && intConsumer.markedForClose) 
intConsumer.close()
--- End diff --

Is the following state possible? If so, should intConsumer.close() be called 
for it as well?

!intConsumer.inuse && intConsumer.markedForClose
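
One way to reason about the question is to enumerate the flag combinations 
against the guard shown in the diff. The sketch below is illustrative only: 
`ReleaseConditionSketch` and its members are hypothetical names, and it models 
nothing beyond the two Boolean flags `inuse` and `markedForClose`.

```scala
object ReleaseConditionSketch {
  // The guard exactly as it appears in the diff under review.
  def closesAsWritten(inuse: Boolean, markedForClose: Boolean): Boolean =
    inuse && markedForClose

  // All (inuse, markedForClose) pairs that are marked for close
  // but would NOT be closed by the guard above.
  val markedButNotClosed: Seq[(Boolean, Boolean)] =
    for {
      inuse <- Seq(true, false)
      marked <- Seq(true, false)
      if marked && !closesAsWritten(inuse, marked)
    } yield (inuse, marked)
}
```

Under this guard, the only marked state left unclosed is `inuse = false` with 
`markedForClose = true`, which is the case being asked about.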


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

2018-03-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20727#discussion_r173633651
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
 ---
@@ -30,9 +31,19 @@ import 
org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 /**
  * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], 
which are all of the lines
  * in that file.
+ *
+ * @param file A part (i.e. "block") of a single file that should be read 
line by line.
+ * @param lineSeparator A line separator that should be used for each 
line. If the value is `None`,
+ *  it covers `\r`, `\r\n` and `\n`.
+ * @param conf Hadoop configuration
  */
 class HadoopFileLinesReader(
-file: PartitionedFile, conf: Configuration) extends Iterator[Text] 
with Closeable {
+file: PartitionedFile,
+lineSeparator: Option[String],
+conf: Configuration) extends Iterator[Text] with Closeable {
--- End diff --

Some methods of Hadoop's Text assume UTF-8 encoding. In general, a datasource 
can avoid that restriction by using the Text class as a container of raw bytes 
and calling only methods such as **getBytes()** and **getLength()**.


---




[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20794
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88153/
Test PASSed.


---




[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20794
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20794
  
**[Test build #88153 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88153/testReport)**
 for PR 20794 at commit 
[`17ea399`](https://github.com/apache/spark/commit/17ea399162167092e0362f90b49a03397ae82afe).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

2018-03-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20727#discussion_r173633462
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileLinesReader.scala
 ---
@@ -42,7 +52,12 @@ class HadoopFileLinesReader(
   Array.empty)
 val attemptId = new TaskAttemptID(new TaskID(new JobID(), 
TaskType.MAP, 0), 0)
 val hadoopAttemptContext = new TaskAttemptContextImpl(conf, attemptId)
-val reader = new LineRecordReader()
+val reader = if (lineSeparator != "\n") {
+  new LineRecordReader(lineSeparator.getBytes("UTF-8"))
--- End diff --

Why should this class be responsible for converting a string separator to an 
array of bytes? In particular, restricting it to a single charset is not 
justified. The purpose of the class is to provide an Iterator of records/lines 
to datasources, and it doesn't need to know the datasource's charset. I would 
not tie it to a particular charset here. Instead, I would expose the separator 
parameter as `Option[Array[Byte]]`, just as the LineReader provides a 
constructor with `byte[] recordDelimiter`.
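
A minimal sketch of that split of responsibilities, assuming the caller knows 
its charset and the reader only sees bytes. `SeparatorSketch`, 
`encodeSeparator`, and `splitRecords` are hypothetical names for illustration, 
and the byte-level splitter is a simplified stand-in, not Hadoop's LineReader.

```scala
import java.nio.charset.Charset

object SeparatorSketch {
  // Caller side: the datasource knows its charset and encodes the separator itself.
  def encodeSeparator(sep: Option[String], charset: Charset): Option[Array[Byte]] =
    sep.map(_.getBytes(charset))

  // Reader side: splits raw bytes on a byte delimiter without knowing the charset.
  // Note: this naive scan ignores character boundaries, so in multi-byte charsets
  // a delimiter could in principle match across two adjacent characters.
  def splitRecords(data: Array[Byte], delimiter: Array[Byte]): Vector[Array[Byte]] = {
    require(delimiter.nonEmpty, "delimiter must not be empty")
    val out = Vector.newBuilder[Array[Byte]]
    var start = 0
    var i = 0
    while (i <= data.length - delimiter.length) {
      if (java.util.Arrays.equals(data.slice(i, i + delimiter.length), delimiter)) {
        out += data.slice(start, i) // record ends just before the delimiter
        i += delimiter.length
        start = i
      } else {
        i += 1
      }
    }
    // Trailing bytes after the last delimiter form the final record.
    if (start < data.length) out += data.slice(start, data.length)
    out.result()
  }
}
```

With this shape, the choice of charset stays with the datasource, and the 
reader needs only the already-encoded delimiter.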


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88154/
Test PASSed.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20779
  
**[Test build #88154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88154/testReport)**
 for PR 20779 at commit 
[`603ce0f`](https://github.com/apache/spark/commit/603ce0fb29bfa5b5c0cfea69fb72e2a3128e772a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20727: [SPARK-23577][SQL] Supports custom line separator...

2018-03-10 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20727#discussion_r173632775
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/TextOptions.scala
 ---
@@ -39,9 +39,12 @@ private[text] class TextOptions(@transient private val 
parameters: CaseInsensiti
*/
   val wholeText = parameters.getOrElse(WHOLETEXT, "false").toBoolean
 
+  val lineSeparator: String = parameters.getOrElse(LINE_SEPARATOR, "\n")
+  require(lineSeparator.nonEmpty, s"'$LINE_SEPARATOR' cannot be an empty 
string.")
 }
 
 private[text] object TextOptions {
   val COMPRESSION = "compression"
   val WHOLETEXT = "wholetext"
+  val LINE_SEPARATOR = "lineSep"
--- End diff --

In the example above, the term "line" is counterintuitive to me. I think of a 
line in a text file as a sequence of one or more characters displayed in a 
single horizontal row. I would prefer *recSep* as the short name or 
*recordSeparator* as the long name. I guess that when the option is used, it 
will usually separate text by something other than newline characters such as 
`'\n'` or `'\r\n'`.


---




[GitHub] spark issue #18994: [SPARK-21784][SQL] Adds support for defining information...

2018-03-10 Thread ioana-delaney
Github user ioana-delaney commented on the issue:

https://github.com/apache/spark/pull/18994
  
@sureshthalamati Hi Suresh, We are planning to proceed with the performance 
improvements. Will you be able to continue working on this PR? Thanks.


---




[GitHub] spark issue #19775: [SPARK-22343][core] Add support for publishing Spark met...

2018-03-10 Thread matyix
Github user matyix commented on the issue:

https://github.com/apache/spark/pull/19775
  
For those who are still interested in using Prometheus, you can get the 
standalone package and source code here: 
https://github.com/banzaicloud/spark-metrics . Happy monitoring; try to catch 
the issues beforehand and avoid those PagerDuty notifications :).


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88152/
Test FAILed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20785
  
**[Test build #88152 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88152/testReport)**
 for PR 20785 at commit 
[`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20763: [SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect res...

2018-03-10 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20763
  
retest this please



---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20793
  
Good catch, LGTM


---




[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...

2018-03-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20043
  
LGTM


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1451/
Test PASSed.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20779
  
**[Test build #88154 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88154/testReport)**
 for PR 20779 at commit 
[`603ce0f`](https://github.com/apache/spark/commit/603ce0fb29bfa5b5c0cfea69fb72e2a3128e772a).


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20779
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20779
  
Let me reduce the number of loops. Another option is to revert this change 
and use the non-loop version, which worked without an exception.


---




[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20794
  
**[Test build #88153 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88153/testReport)**
 for PR 20794 at commit 
[`17ea399`](https://github.com/apache/spark/commit/17ea399162167092e0362f90b49a03397ae82afe).


---




[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20794
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1450/
Test PASSed.


---




[GitHub] spark issue #20794: [SPARK-23644][CORE][UI] Use absolute path for REST call ...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20794
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20794: [SPARK-23644][CORE][UI] Use absolute path for RES...

2018-03-10 Thread mgaido91
GitHub user mgaido91 opened a pull request:

https://github.com/apache/spark/pull/20794

[SPARK-23644][CORE][UI] Use absolute path for REST call in SHS

## What changes were proposed in this pull request?

SHS uses a relative path for the REST API call that fetches the list of 
applications. When the SHS is accessed through a proxy, this can be an issue 
if the path doesn't end with a "/".

Therefore, we should use an absolute path for the REST call as it is done 
for all the other resources.
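
The trailing-slash sensitivity can be reproduced with plain URI resolution. 
This is a minimal sketch, not the SHS code: the host `proxy.example.com`, the 
`/shs` prefix, and the name `PathResolutionSketch` are assumptions made for 
illustration, and `java.net.URI.resolve` stands in for what a browser does 
with a relative reference.

```scala
import java.net.URI

object PathResolutionSketch {
  // Resolves a reference against the URL of the page that issues the request.
  def resolve(pageUrl: String, ref: String): String =
    new URI(pageUrl).resolve(ref).toString

  def main(args: Array[String]): Unit = {
    // Relative reference: the result depends on whether the page URL ends with "/".
    println(resolve("http://proxy.example.com/shs", "api/v1/applications"))
    println(resolve("http://proxy.example.com/shs/", "api/v1/applications"))
    // Reference built from a known root prefix: the result is the same either way.
    println(resolve("http://proxy.example.com/shs", "/shs/api/v1/applications"))
    println(resolve("http://proxy.example.com/shs/", "/shs/api/v1/applications"))
  }
}
```

Without the trailing slash, the relative reference replaces the last path 
segment and loses the `/shs` prefix, whereas the reference that starts from 
the root resolves identically in both cases.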

## How was this patch tested?

manual tests
Before the change:
![screen shot 2018-03-10 at 4 22 02 
pm](https://user-images.githubusercontent.com/8821783/37244190-8ccf9d40-2485-11e8-8fa9-345bc81472fc.png)

After the change:
![screen shot 2018-03-10 at 4 36 34 pm 
1](https://user-images.githubusercontent.com/8821783/37244201-a1922810-2485-11e8-8856-eeab2bf5e180.png)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mgaido91/spark SPARK-23644

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20794.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20794


commit 17ea399162167092e0362f90b49a03397ae82afe
Author: Marco Gaido 
Date:   2018-03-10T15:49:52Z

[SPARK-23644][CORE][UI] Use absolute path for REST call in SHS




---




[GitHub] spark pull request #20579: [SPARK-23372][SQL] Writing empty struct in parque...

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20579#discussion_r173625828
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -72,6 +72,29 @@ class FileBasedDataSourceSuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  // Text and Parquet format does not allow writing data frame with empty 
schema.
+  Seq("parquet", "text").foreach { format =>
+test(s"SPARK-23372 writing empty dataframe should produce 
AnalysisException - $format") {
+  withTempPath { outputPath =>
+intercept[AnalysisException] {
+  
spark.emptyDataFrame.write.format(format).save(outputPath.toString)
+}
+  }
+}
+  }
+
+  // Formats excluding text and parquet allow writing empty data frames to 
files.
+  allFileBasedDataSources.filterNot(p => p == "text" || p == 
"parquet").foreach { format =>
+test(s"SPARK-23372 writing empty dataframe and reading from it - 
$format") {
+  withTempPath { outputPath =>
+  
spark.emptyDataFrame.write.format(format).save(outputPath.toString)
+  intercept[AnalysisException] {
+val df = spark.read.format(format).load(outputPath.toString)
--- End diff --

Sorry if I misunderstood. The link is 
https://github.com/apache/spark/pull/20579#issuecomment-364994881. Is that the 
right link?


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20793
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20793: [SPARK-23643] Shrinking the buffer in hashSeed up to siz...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20793
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20793: [SPARK-23643] Shrinking the buffer in hashSeed up...

2018-03-10 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/20793

[SPARK-23643] Shrinking the buffer in hashSeed up to size of the seed 
parameter

## What changes were proposed in this pull request?

The hashSeed method allocates 64 bytes instead of 8. The remaining 56 bytes 
are always zero, so they can be excluded from the hash calculation because 
they don't differentiate inputs.
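
The size difference can be illustrated with `java.nio.ByteBuffer`. This is a 
sketch, not the patch itself: it assumes the over-allocation comes from using 
`java.lang.Long.SIZE` (the width in bits) as a byte count, which the 
description above does not state, and `SeedBufferSketch` is a hypothetical 
name.

```scala
import java.nio.ByteBuffer

object SeedBufferSketch {
  // Long.SIZE is the width of a long in bits (64); used as a byte count,
  // it yields a 64-byte buffer of which putLong fills only the first 8 bytes.
  def wide(seed: Long): Array[Byte] =
    ByteBuffer.allocate(java.lang.Long.SIZE).putLong(seed).array()

  // Long.BYTES is the width of a long in bytes (8), exactly what putLong writes.
  def tight(seed: Long): Array[Byte] =
    ByteBuffer.allocate(java.lang.Long.BYTES).putLong(seed).array()
}
```

Both buffers start with the same 8 bytes. The wide one only appends zero 
padding, which is identical for every seed.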

## How was this patch tested?

By running the existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 hash-buff-size

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20793.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20793


commit bb40ef2e8d337508d60903a6a824b5aa45d87326
Author: Maxim Gekk 
Date:   2018-03-10T13:14:33Z

Shrinking the buffer up to size of the long type




---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20701
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88150/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20701: [SPARK-23528][ML] Add numIter to ClusteringSummary

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20701
  
**[Test build #88150 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88150/testReport)**
 for PR 20701 at commit 
[`b3d0523`](https://github.com/apache/spark/commit/b3d0523e5eed89dc800d0678adde59eb4ac4343e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20525: [SPARK-23271][SQL] Parquet output contains only _SUCCESS ...

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20525
  
late LGTM too.


---




[GitHub] spark pull request #20788: [WIP][SPARK-21030][PYTHON][SQL] Adds more types f...

2018-03-10 Thread DylanGuedes
Github user DylanGuedes commented on a diff in the pull request:

https://github.com/apache/spark/pull/20788#discussion_r173623998
  
--- Diff: python/pyspark/sql/dataframe.py ---
@@ -437,10 +437,11 @@ def hint(self, name, *parameters):
 if not isinstance(name, str):
 raise TypeError("name should be provided as str, got 
{0}".format(type(name)))
 
+allowed = [str, list, float, int]
 for p in parameters:
-if not isinstance(p, str):
+if not type(p) in allowed:
--- End diff --

Didn't know that it was possible, nice!


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1449/
Test PASSed.


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20785
  
**[Test build #88152 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88152/testReport)**
 for PR 20785 at commit 
[`0034a58`](https://github.com/apache/spark/commit/0034a58437684fdcfde8511ef47278ff8bfb1fe2).


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20785
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88149/
Test PASSed.


---




[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20717
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20717: [SPARK-23564][SQL] Add isNotNull check for left anti and...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20717
  
**[Test build #88149 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88149/testReport)**
 for PR 20717 at commit 
[`9e2d993`](https://github.com/apache/spark/commit/9e2d993d691ad37b230c9e14d16148b9dc9727e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/20779
  
I don't think so. There is an option to change the heap size for test 
execution, but I am not sure whether we are allowed to do that or whether it 
is a good idea. Let's hear others' opinions.


---




[GitHub] spark issue #20779: [SPARK-23598][SQL] Make methods in BufferedRowIterator p...

2018-03-10 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20779
  
Ah, I increased the heap size (4GB) in my environment with IntelliJ.
Should we create a class like https://github.com/apache/spark/pull/20636?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20719
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88151/
Test PASSed.


---




[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20719
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20719
  
**[Test build #88151 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88151/testReport)**
 for PR 20719 at commit 
[`2d64a90`](https://github.com/apache/spark/commit/2d64a9028ea138aa8b538da25637771543109076).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20790: AccumulatorV2 subclass isZero scaladoc fix

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20790
  
Wait, I just found that you opened a JIRA: SPARK-23642. Please link it in the 
title as `[SPARK-23642][DOCS] ...`. See https://spark.apache.org/contributing.html


---




[GitHub] spark issue #20790: AccumulatorV2 subclass isZero scaladoc fix

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20790
  
Shall we fix the title to `[MINOR][DOCS] AccumulatorV2  ...` to be 
consistent with other PRs?


---




[GitHub] spark pull request #20790: AccumulatorV2 subclass isZero scaladoc fix

2018-03-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20790#discussion_r173621260
  
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -290,7 +290,8 @@ class LongAccumulator extends AccumulatorV2[jl.Long, 
jl.Long] {
   private var _count = 0L
 
   /**
-   * Adds v to the accumulator, i.e. increment sum by v and count by 1.
+   * Returns false if this accumulator has had any values added to it or 
the sum is non-zero.
+   *
--- End diff --

I think this duplicates the doc from `AccumulatorV2.isZero`. Can we simply 
remove this wrong doc and revert the other changes so that the inherited doc 
from `AccumulatorV2.isZero` is reused everywhere?


---




[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20771
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88146/
Test PASSed.


---




[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20771
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20771: [SPARK-23587][SQL] Add interpreted execution for MapObje...

2018-03-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20771
  
**[Test build #88146 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88146/testReport)**
 for PR 20771 at commit 
[`e725608`](https://github.com/apache/spark/commit/e725608d1b38a7a2b1a0677afca947cec6a12801).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20719
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1448/
Test PASSed.


---




[GitHub] spark issue #20719: [SPARK-23568][ML] Use metadata numAttributes if availabl...

2018-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20719
  
Merged build finished. Test PASSed.


---



