[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99565 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99565/testReport)**
 for PR 22683 at commit 
[`5188c54`](https://github.com/apache/spark/commit/5188c54fcf33c24dac341c044f7ffa75c272bf52).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99564/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99564 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99564/testReport)**
 for PR 23088 at commit 
[`7956b27`](https://github.com/apache/spark/commit/7956b27bd6b19065d367d96cd5e2b448507c7dc4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

2018-12-01 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23072#discussion_r238087240
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala ---
@@ -64,4 +64,3 @@ object FPGrowthExample {
 spark.stop()
   }
 }
-// scalastyle:on println
--- End diff --

yes, println is not used


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23164: [SPARK-26198][SQL] Fix Metadata serialize null values th...

2018-12-01 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/23164
  
I used it here: 
https://github.com/apache/spark/compare/master...wangyum:default-value?expand=1#diff-9847f5cef7cf7fbc5830fbc6b779ee10R1827


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23120
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99563/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23120
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23120
  
**[Test build #99563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99563/testReport)**
 for PR 23120 at commit 
[`8f2d69d`](https://github.com/apache/spark/commit/8f2d69d848b8242c529118436249019016069ca2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99561/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99561 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99561/testReport)**
 for PR 23088 at commit 
[`dc95355`](https://github.com/apache/spark/commit/dc9535547c353509930ea340780611f3129da962).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99562/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99562 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99562/testReport)**
 for PR 23088 at commit 
[`64fbe5d`](https://github.com/apache/spark/commit/64fbe5d7b845b6351e2dae2af231d2be37ca13b8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99566 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99566/testReport)**
 for PR 23088 at commit 
[`2235967`](https://github.com/apache/spark/commit/2235967190ef80d363b81274f400f4a42b3556f2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5624/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23198: Branch 2.4

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23198
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23198: Branch 2.4

2018-12-01 Thread edurekagithub
Github user edurekagithub closed the pull request at:

https://github.com/apache/spark/pull/23198


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23198: Branch 2.4

2018-12-01 Thread edurekagithub
GitHub user edurekagithub opened a pull request:

https://github.com/apache/spark/pull/23198

Branch 2.4

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/edurekagithub/spark branch-2.4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23198


commit b7efca7ece484ee85091b1b50bbc84ad779f9bfe
Author: Mario Molina 
Date:   2018-09-11T12:47:14Z

[SPARK-17916][SPARK-25241][SQL][FOLLOW-UP] Fix empty string being parsed as 
null when nullValue is set.

## What changes were proposed in this pull request?

In the PR, I propose new CSV option `emptyValue` and an update in the SQL 
Migration Guide which describes how to revert previous behavior when empty 
strings were not written at all. Since Spark 2.4, empty strings are saved as 
`""` to distinguish them from saved `null`s.

Closes #22234
Closes #22367

## How was this patch tested?

It was tested by `CSVSuite` and new tests added in the PR #22234

Closes #22389 from MaxGekk/csv-empty-value-master.

Lead-authored-by: Mario Molina 
Co-authored-by: Maxim Gekk 
Signed-off-by: hyukjinkwon 
(cherry picked from commit c9cb393dc414ae98093c1541d09fa3c8663ce276)
Signed-off-by: hyukjinkwon 

commit 0b8bfbe12b8a368836d7ddc8445de18b7ee42cde
Author: Dongjoon Hyun 
Date:   2018-09-11T15:57:42Z

[SPARK-25389][SQL] INSERT OVERWRITE DIRECTORY STORED AS should prevent 
duplicate fields

## What changes were proposed in this pull request?

Like `INSERT OVERWRITE DIRECTORY USING` syntax, `INSERT OVERWRITE DIRECTORY 
STORED AS` should not generate files with duplicate fields because Spark cannot 
read those files back.

**INSERT OVERWRITE DIRECTORY USING**
```scala
scala> sql("INSERT OVERWRITE DIRECTORY 'file:///tmp/parquet' USING parquet 
SELECT 'id', 'id2' id")
... ERROR InsertIntoDataSourceDirCommand: Failed to write to directory ...
org.apache.spark.sql.AnalysisException: Found duplicate column(s) when 
inserting into file:/tmp/parquet: `id`;
```

**INSERT OVERWRITE DIRECTORY STORED AS**
```scala
scala> sql("INSERT OVERWRITE DIRECTORY 'file:///tmp/parquet' STORED AS 
parquet SELECT 'id', 'id2' id")
// It generates corrupted files
scala> spark.read.parquet("/tmp/parquet").show
18/09/09 22:09:57 WARN DataSource: Found duplicate column(s) in the data 
schema and the partition schema: `id`;
```

## How was this patch tested?

Pass the Jenkins with newly added test cases.

Closes #22378 from dongjoon-hyun/SPARK-25389.

Authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
(cherry picked from commit 77579aa8c35b0d98bbeac3c828bf68a1d190d13e)
Signed-off-by: Dongjoon Hyun 

commit 4414e026097c74aadd252b541c9d3009cd7e9d09
Author: Gera Shegalov 
Date:   2018-09-11T16:28:32Z

[SPARK-25221][DEPLOY] Consistent trailing whitespace treatment of conf 
values

## What changes were proposed in this pull request?

Stop trimming values of properties loaded from a file

## How was this patch tested?

Added unit test demonstrating the issue hit in production.

Closes #22213 from gerashegalov/gera/SPARK-25221.

Authored-by: Gera Shegalov 
Signed-off-by: Marcelo Vanzin 
(cherry picked from commit bcb9a8c83f4e6835af5dc51f1be7f964b8fa49a3)
Signed-off-by: Marcelo Vanzin 

commit 16127e844f8334e1152b2e3ed3d878ec8de13dfa
Author: Liang-Chi Hsieh 
Date:   2018-09-11T17:31:06Z

[SPARK-24889][CORE] Update block info when unpersist rdds

## What changes were proposed in this pull request?

We will update block info coming from executors, at the timing like caching 
a RDD. However, when removing RDDs with unpersisting, we don't ask to update 
block info. So the block info is not updated.

We can fix this with few options:

1. Ask to update block info when unpersisting

This is simplest but changes driver-executor communication a bit.

2. Update block info when processing the event of unpersisting RDD

We send a `SparkListenerUnpersistRDD` event when unpersisting RDD. When 
processing this event, we can update block info of the RDD. This only changes 
event processing code 

[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

2018-12-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22952
  
@zsxwing 
Yeah, it would be ideal we can enforce `archivePath` to which don't have 
any possibility to match against source path (glob), so my approach was to find 
directory which is the base directory without having glob in ancestor, and 
`archive path + base directory of source path` doesn't belong to sub-directory 
of found directory.

For example, suppose source path is `/a/b/c/*/ef?/*/g/h/*/i`, then base 
directory of source path would be `/a/b/c`, and `archive path + base directory 
of source path` should not belong to sub-directory of `/a/b/c`.
(My code has a bug for finding the directory so need to fix it.)

This is not an elegant approach and the approach has false-positive, ending 
up restricting the archive path which actually doesn't make overlap (too 
restrict), but it would guarantee two paths never overlap. (So no need to 
re-check when renaming file.)

I guess the approach might be reasonable because in practice end users 
would avoid themselves have to think about complicated case on overlaps, and 
just isolate two paths.

What do you think about this approach?

cc. @gaborgsomogyi Could you also help validating my approach? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Hi @pgandhi999 , It seems after your checkins, when there is no summary 
metrics, it is displaying empty table rather than a message which shown in the 
PR title. could you please help me to fix that. Thanks.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22683
  
**[Test build #99565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99565/testReport)**
 for PR 22683 at commit 
[`5188c54`](https://github.com/apache/spark/commit/5188c54fcf33c24dac341c044f7ffa75c272bf52).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-01 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22683
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5623/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23120
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5622/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23120
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99564/testReport)**
 for PR 23088 at commit 
[`7956b27`](https://github.com/apache/spark/commit/7956b27bd6b19065d367d96cd5e2b448507c7dc4).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23120
  
**[Test build #99563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99563/testReport)**
 for PR 23120 at commit 
[`8f2d69d`](https://github.com/apache/spark/commit/8f2d69d848b8242c529118436249019016069ca2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23120: [SPARK-26151][SQL] Return partial results for bad CSV re...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23120
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23120: [SPARK-26151][SQL] Return partial results for bad...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23120#discussion_r238083349
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ---
@@ -243,21 +243,27 @@ class UnivocityParser(
 () => getPartialResult(),
 new RuntimeException("Malformed CSV record"))
 } else {
-  try {
-// When the length of the returned tokens is identical to the 
length of the parsed schema,
-// we just need to convert the tokens that correspond to the 
required columns.
-var i = 0
-while (i < requiredSchema.length) {
+  // When the length of the returned tokens is identical to the length 
of the parsed schema,
+  // we just need to convert the tokens that correspond to the 
required columns.
+  var badRecordException: Option[Throwable] = None
+  var i = 0
+  while (i < requiredSchema.length) {
+try {
   row(i) = valueConverters(i).apply(getToken(tokens, i))
-  i += 1
+} catch {
+  case NonFatal(e) =>
+badRecordException = badRecordException.orElse(Some(e))
 }
+i += 1
+  }
+
+  if (badRecordException.isEmpty) {
 row
-  } catch {
-case NonFatal(e) =>
-  // For corrupted records with the number of tokens same as the 
schema,
-  // CSV reader doesn't support partial results. All fields other 
than the field
-  // configured by `columnNameOfCorruptRecord` are set to `null`.
-  throw BadRecordException(() => getCurrentInput, () => None, e)
+  } else {
+// For corrupted records with the number of tokens same as the 
schema,
+// CSV reader doesn't support partial results. All fields other 
than the field
+// configured by `columnNameOfCorruptRecord` are set to `null`.
--- End diff --

what do you mean here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23154: [SPARK-26195][SQL] Correct exception messages in ...

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23154


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23154: [SPARK-26195][SQL] Correct exception messages in some cl...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23154
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/23088
  
Hi @srowen . Yes. Currently for disk store case, we need to have a more 
optimized code.

> While it makes some sense I have two concerns: different answers based on 
disk vs memory store which shouldn't really affect things. But would a user 
ever have both and see both side by side and be confused?

This configurable store is only for history server. So user can configure 
either one at a time for history server. But, the live UI (which open from Yarn 
UI), also goes through the same code flow. Where it has 'ElementTrackingStore', 
which is also inMemory. So, if a user configure disk store for History server 
and open both live and inProgress History UI, the summary metrics will be 
different.

> changing the way the indexing works, so that you can index by specific 
metrics for successful and failed tasks differently, would be tricky, and also 
would require changing the disk store version (to invalidate old stores).

I think @vanzin suggestion seems work, but need time to give it a try and 
to test it. May be we can add as "TODO" for diskStore case or open a seperate 
JIRA for that.

> Second is, that seems like it should still entail pushing down all the 
quantile logic into the KVStore, to be clean, right? and that's a bigger change.

Thanks @srowen for the suggestion. Probably @vanzin can answer this well.

I have modified the code, for InMemory case. Disk store still uses the old 
code.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23178


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23178: [SPARK-26216][SQL] Do not use case class as publi...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23178#discussion_r238083055
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -38,114 +38,106 @@ import org.apache.spark.sql.types.DataType
  * @since 1.3.0
  */
 @Stable
--- End diff --

It's not a new API anyway, it will be weird to change since to 3.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23178: [SPARK-26216][SQL] Do not use case class as public API (...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23178
  
thanks for the review, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5621/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99562 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99562/testReport)**
 for PR 23088 at commit 
[`64fbe5d`](https://github.com/apache/spark/commit/64fbe5d7b845b6351e2dae2af231d2be37ca13b8).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23088
  
**[Test build #99561 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99561/testReport)**
 for PR 23088 at commit 
[`dc95355`](https://github.com/apache/spark/commit/dc9535547c353509930ea340780611f3129da962).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5620/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23088: [SPARK-26119][CORE][WEBUI]Task summary table should cont...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23088
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23130


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5619/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23130: [SPARK-26161][SQL] Ignore empty files in load

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23130
  
We don't need to block it, but @MaxGekk if you have time, it would great to 
answer https://github.com/apache/spark/pull/23130#issuecomment-442491582

thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99560/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23187: [SPARK-26211][SQL][TEST][FOLLOW-UP] Combine test ...

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23187


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23190
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23190
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23187: [SPARK-26211][SQL][TEST][FOLLOW-UP] Combine test cases f...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23187
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23153: [SPARK-26147][SQL] only pull out unevaluable pyth...

2018-12-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23153#discussion_r238082589
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala 
---
@@ -155,19 +155,20 @@ object EliminateOuterJoin extends Rule[LogicalPlan] 
with PredicateHelper {
 }
 
 /**
- * PythonUDF in join condition can not be evaluated, this rule will detect 
the PythonUDF
- * and pull them out from join condition. For python udf accessing 
attributes from only one side,
- * they are pushed down by operation push down rules. If not (e.g. user 
disables filter push
- * down rules), we need to pull them out in this rule too.
+ * PythonUDF in join condition can't be evaluated if it refers to 
attributes from both join sides.
+ * See `ExtractPythonUDFs` for details. This rule will detect un-evaluable 
PythonUDF and pull them
+ * out from join condition.
  */
 object PullOutPythonUDFInJoinCondition extends Rule[LogicalPlan] with 
PredicateHelper {
-  def hasPythonUDF(expression: Expression): Boolean = {
-expression.collectFirst { case udf: PythonUDF => udf }.isDefined
+
+  private def hasUnevaluablePythonUDF(expr: Expression, j: Join): Boolean 
= {
+expr.find { e =>
+  PythonUDF.isScalarPythonUDF(e) && !canEvaluate(e, j.left) && 
!canEvaluate(e, j.right)
--- End diff --

It's only possible to have scalar UDF in join condition, so changing it to 
`e.isInstanceOf[PythonUDF]` is same.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99559/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23173
  
**[Test build #99559 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99559/testReport)**
 for PR 23173 at commit 
[`9d5cb7b`](https://github.com/apache/spark/commit/9d5cb7be9a8cb97bd54dd1e938ba819ed3066351).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23192: [SPARK-26241][SQL] Add queryId to IncrementalExec...

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23192


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23193: [SPARK-26226][SQL] Track optimization phase for s...

2018-12-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23193


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23150
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99556/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23150
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23150
  
**[Test build #99556 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99556/testReport)**
 for PR 23150 at commit 
[`00509d3`](https://github.com/apache/spark/commit/00509d3a94e0679505cd9fde78e38a3a15d11bde).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99558/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23196
  
**[Test build #99558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99558/testReport)**
 for PR 23196 at commit 
[`f326042`](https://github.com/apache/spark/commit/f326042aa1aff540d06c79fd73395204d846f3ea).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99557/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23190
  
**[Test build #99557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99557/testReport)**
 for PR 23190 at commit 
[`87c567d`](https://github.com/apache/spark/commit/87c567d6629df9042f7189f9083e26405f5fb387).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5618/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23173
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23173: [SPARK-26208][SQL] add headers to empty csv files when h...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23173
  
**[Test build #99559 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99559/testReport)**
 for PR 23173 at commit 
[`9d5cb7b`](https://github.com/apache/spark/commit/9d5cb7be9a8cb97bd54dd1e938ba819ed3066351).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5617/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23196
  
**[Test build #99558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99558/testReport)**
 for PR 23196 at commit 
[`f326042`](https://github.com/apache/spark/commit/f326042aa1aff540d06c79fd73395204d846f3ea).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23173: [SPARK-26208][SQL] add headers to empty csv files...

2018-12-01 Thread koertkuipers
Github user koertkuipers commented on a diff in the pull request:

https://github.com/apache/spark/pull/23173#discussion_r238077135
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
 ---
@@ -171,15 +171,21 @@ private[csv] class CsvOutputWriter(
 
   private var univocityGenerator: Option[UnivocityGenerator] = None
 
-  override def write(row: InternalRow): Unit = {
-val gen = univocityGenerator.getOrElse {
-  val charset = Charset.forName(params.charset)
-  val os = CodecStreams.createOutputStreamWriter(context, new 
Path(path), charset)
-  val newGen = new UnivocityGenerator(dataSchema, os, params)
-  univocityGenerator = Some(newGen)
-  newGen
-}
+  if (params.headerFlag) {
+val gen = getGen()
+gen.writeHeaders()
+  }
 
+  private def getGen(): UnivocityGenerator = univocityGenerator.getOrElse {
+val charset = Charset.forName(params.charset)
+val os = CodecStreams.createOutputStreamWriter(context, new 
Path(path), charset)
+val newGen = new UnivocityGenerator(dataSchema, os, params)
+univocityGenerator = Some(newGen)
+newGen
+  }
+
+  override def write(row: InternalRow): Unit = {
+val gen = getGen()
--- End diff --

i will revert this change to lazy val for now since it doesnt have anything 
to do wit this pullreq or jira: the Option approach was created in another 
pullreq.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5616/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23190
  
**[Test build #99557 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99557/testReport)**
 for PR 23190 at commit 
[`87c567d`](https://github.com/apache/spark/commit/87c567d6629df9042f7189f9083e26405f5fb387).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...

2018-12-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23150#discussion_r238075711
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/csv/UnivocityParserSuite.scala
 ---
@@ -86,62 +85,74 @@ class UnivocityParserSuite extends SparkFunSuite with 
SQLHelper {
 // null.
 Seq(true, false).foreach { b =>
   val options = new CSVOptions(Map("nullValue" -> "null"), false, 
"GMT")
-  val converter =
-parser.makeConverter("_1", StringType, nullable = b, options = 
options)
+  val parser = new UnivocityParser(StructType(Seq.empty), options)
+  val converter = parser.makeConverter("_1", StringType, nullable = b)
   assert(converter.apply("") == UTF8String.fromString(""))
 }
   }
 
   test("Throws exception for empty string with non null type") {
-  val options = new CSVOptions(Map.empty[String, String], false, "GMT")
+val options = new CSVOptions(Map.empty[String, String], false, "GMT")
+val parser = new UnivocityParser(StructType(Seq.empty), options)
 val exception = intercept[RuntimeException]{
-  parser.makeConverter("_1", IntegerType, nullable = false, options = 
options).apply("")
+  parser.makeConverter("_1", IntegerType, nullable = false).apply("")
 }
 assert(exception.getMessage.contains("null value found but field _1 is 
not nullable."))
   }
 
   test("Types are cast correctly") {
 val options = new CSVOptions(Map.empty[String, String], false, "GMT")
-assert(parser.makeConverter("_1", ByteType, options = 
options).apply("10") == 10)
-assert(parser.makeConverter("_1", ShortType, options = 
options).apply("10") == 10)
-assert(parser.makeConverter("_1", IntegerType, options = 
options).apply("10") == 10)
-assert(parser.makeConverter("_1", LongType, options = 
options).apply("10") == 10)
-assert(parser.makeConverter("_1", FloatType, options = 
options).apply("1.00") == 1.0)
-assert(parser.makeConverter("_1", DoubleType, options = 
options).apply("1.00") == 1.0)
-assert(parser.makeConverter("_1", BooleanType, options = 
options).apply("true") == true)
-
-val timestampsOptions =
+var parser = new UnivocityParser(StructType(Seq.empty), options)
+assert(parser.makeConverter("_1", ByteType).apply("10") == 10)
+assert(parser.makeConverter("_1", ShortType).apply("10") == 10)
+assert(parser.makeConverter("_1", IntegerType).apply("10") == 10)
+assert(parser.makeConverter("_1", LongType).apply("10") == 10)
+assert(parser.makeConverter("_1", FloatType).apply("1.00") == 1.0)
+assert(parser.makeConverter("_1", DoubleType).apply("1.00") == 1.0)
+assert(parser.makeConverter("_1", BooleanType).apply("true") == true)
+
+var timestampsOptions =
   new CSVOptions(Map("timestampFormat" -> "dd/MM/ hh:mm"), false, 
"GMT")
+parser = new UnivocityParser(StructType(Seq.empty), timestampsOptions)
 val customTimestamp = "31/01/2015 00:00"
-val expectedTime = 
timestampsOptions.timestampFormat.parse(customTimestamp).getTime
-val castedTimestamp =
-  parser.makeConverter("_1", TimestampType, nullable = true, options = 
timestampsOptions)
+var format = FastDateFormat.getInstance(
+  timestampsOptions.timestampFormat, timestampsOptions.timeZone, 
timestampsOptions.locale)
+val expectedTime = format.parse(customTimestamp).getTime
+val castedTimestamp = parser.makeConverter("_1", TimestampType, 
nullable = true)
 .apply(customTimestamp)
 assert(castedTimestamp == expectedTime * 1000L)
 
 val customDate = "31/01/2015"
 val dateOptions = new CSVOptions(Map("dateFormat" -> "dd/MM/"), 
false, "GMT")
-val expectedDate = dateOptions.dateFormat.parse(customDate).getTime
-val castedDate =
-  parser.makeConverter("_1", DateType, nullable = true, options = 
dateOptions)
-.apply(customTimestamp)
-assert(castedDate == DateTimeUtils.millisToDays(expectedDate))
+parser = new UnivocityParser(StructType(Seq.empty), dateOptions)
+format = FastDateFormat.getInstance(
+  dateOptions.dateFormat, dateOptions.timeZone, dateOptions.locale)
+val expectedDate = format.parse(customDate).getTime
+val castedDate = parser.makeConverter("_1", DateType, nullable = true)
+.apply(customDate)
+assert(castedDate == DateTimeUtils.millisToDays(expectedDate, 
TimeZone.getTimeZone("GMT")))
 
 val timestamp = "2015-01-01 00:00:00"
-assert(parser.makeConverter("_1", TimestampType, options = 
options).apply(timestamp) ==
-  DateTimeUtils.stringToTime(timestamp).getTime  * 1000L)
-assert(parser.makeConverter("_1", DateType, options = 

[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...

2018-12-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23150#discussion_r238075664
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -622,10 +623,11 @@ class CSVSuite extends QueryTest with 
SharedSQLContext with SQLTestUtils with Te
 val options = Map(
   "header" -> "true",
   "inferSchema" -> "false",
-  "dateFormat" -> "dd/MM/ hh:mm")
+  "dateFormat" -> "dd/MM/ HH:mm")
--- End diff --

According to iso 8601:
```
h   clock-hour-of-am-pm (1-12)  number12
H   hour-of-day (0-23)  number0
```
but real data is not in the allowed range:
```
date
26/08/2015 18:00
27/10/2014 18:30
28/01/2016 20:00
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99555/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23196
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23196
  
**[Test build #99555 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99555/testReport)**
 for PR 23196 at commit 
[`4646ded`](https://github.com/apache/spark/commit/4646dededae832185a35a85244baab6507d28f0d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23150: [SPARK-26178][SQL] Use java.time API for parsing ...

2018-12-01 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23150#discussion_r238075585
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -1107,7 +,7 @@ class CSVSuite extends QueryTest with 
SharedSQLContext with SQLTestUtils with Te
 
   test("SPARK-18699 put malformed records in a `columnNameOfCorruptRecord` 
field") {
 Seq(false, true).foreach { multiLine =>
-  val schema = new StructType().add("a", IntegerType).add("b", 
TimestampType)
+  val schema = new StructType().add("a", IntegerType).add("b", 
DateType)
--- End diff --

I changed the type because supposed to valid date `"1983-08-04"` cannot be 
parsed with default timestamp pattern.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23150
  
> they pass right? is there another test you were unable to add?

For now everything has been passed. I run all test localy on different 
timezones (set via jvm parameter `-Duser.timezone`). 

I updated the migration guide since I enabled new parser by default.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23150
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23150
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5615/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23150
  
**[Test build #99556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99556/testReport)**
 for PR 23150 at commit 
[`00509d3`](https://github.com/apache/spark/commit/00509d3a94e0679505cd9fde78e38a3a15d11bde).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99554/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23190
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23190: [MINOR][SQL]throw SparkOutOfMemoryError intead of SparkE...

2018-12-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23190
  
**[Test build #99554 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99554/testReport)**
 for PR 23190 at commit 
[`5b39ed3`](https://github.com/apache/spark/commit/5b39ed317d0da16162484cc8f8d94e0df4c2f30e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23150: [SPARK-26178][SQL] Use java.time API for parsing timesta...

2018-12-01 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/23150
  
It makes sense that parsing depends on a timezone, though that's set as an 
option in the parser typically. The tests should generally test "GMT" for this 
reason. If there's a default code path for when no timezone is specified, then 
I'd use the test harness mechanisms for temporarily changing the system 
timezone to GMT (which then automatically changes back).

Your changes look OK here and they pass right? is there another test you 
were unable to add?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/22758
  
cc@wzhfy 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22758: [SPARK-25332][SQL] Instead of broadcast hash join ,Sort ...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/22758
  
do we need to handle this scenario? do we have any PR for handling this 
issue?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23173: [SPARK-26208][SQL] add headers to empty csv files...

2018-12-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23173#discussion_r238073218
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala
 ---
@@ -171,15 +171,21 @@ private[csv] class CsvOutputWriter(
 
   private var univocityGenerator: Option[UnivocityGenerator] = None
 
-  override def write(row: InternalRow): Unit = {
-val gen = univocityGenerator.getOrElse {
-  val charset = Charset.forName(params.charset)
-  val os = CodecStreams.createOutputStreamWriter(context, new 
Path(path), charset)
-  val newGen = new UnivocityGenerator(dataSchema, os, params)
-  univocityGenerator = Some(newGen)
-  newGen
-}
+  if (params.headerFlag) {
+val gen = getGen()
+gen.writeHeaders()
+  }
 
+  private def getGen(): UnivocityGenerator = univocityGenerator.getOrElse {
+val charset = Charset.forName(params.charset)
+val os = CodecStreams.createOutputStreamWriter(context, new 
Path(path), charset)
+val newGen = new UnivocityGenerator(dataSchema, os, params)
+univocityGenerator = Some(newGen)
+newGen
+  }
+
+  override def write(row: InternalRow): Unit = {
+val gen = getGen()
--- End diff --

Yeah we have two different approaches, both of which are fine IMHO. I think 
it's reasonable to clean that up in a follow-up if desired. WDYT @HyukjinKwon ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...

2018-12-01 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/23197
  
cc marmbrus yhuai srowen vinodkc  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Filter Query Date and Timestamp...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23197
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23197: [SPARK-26165][Optimizer] Flter Query Date and Timestamp ...

2018-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23197
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   >