[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20690 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20690 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1135/ Test PASSed.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20675 > it just means that for very long-running streams task restarts will eventually run out. Ah, I see what you mean. Yeah, if we support task-level retry we should also make the task retry count unlimited. > But if you're worried that the current implementation of task restart will become incorrect as more complex scenarios are supported, I'd definitely lean towards deferring it until continuous processing is more feature-complete. Yep, the "complex scenarios" I mentioned mainly include the shuffle and aggregation scenarios discussed in the comments at https://issues.apache.org/jira/browse/SPARK-20928?focusedCommentId=16245556=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16245556; in those scenarios task-level retry may need to consider epoch alignment, but I think the current implementation of task restart is complete for map-only continuous processing. I agree with you about deferring it, so I'll just leave a comment in SPARK-23033 and close this, or do you think it should be reviewed by others? > Do you want to spin that off into a separate PR? (I can handle it otherwise.) Of course, #20689 added a new interface `ContinuousDataReaderFactory` as per our comments.
[GitHub] spark issue #20690: [SPARK-23532][Standalone]Improve data locality when laun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20690 **[Test build #87760 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87760/testReport)** for PR 20690 at commit [`f7efb22`](https://github.com/apache/spark/commit/f7efb22ddea3dc8eeccc833086d5a82cbce7e530).
[GitHub] spark pull request #20690: [SPARK-23532][Standalone]Improve data locality wh...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/20690 [SPARK-23532][Standalone]Improve data locality when launching new executors for dynamic allocation ## What changes were proposed in this pull request? Currently Spark on YARN supports better data locality by considering the preferred locations of the pending tasks when dynamic allocation is enabled; refer to _https://issues.apache.org/jira/browse/SPARK-4352_. Mesos also supports data locality; refer to _https://issues.apache.org/jira/browse/SPARK-16944_. It would be better if Standalone also supported this feature. ## How was this patch tested? Added a unit test, and manual testing on HDFS. You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark executorlocality Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20690.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20690 commit f7efb22ddea3dc8eeccc833086d5a82cbce7e530 Author: liuxian Date: 2018-02-28T07:33:44Z fix
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20689 Merged build finished. Test PASSed.
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20689 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1134/ Test PASSed.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20667 Merged build finished. Test PASSed.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87748/ Test PASSed.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20667 **[Test build #87748 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87748/testReport)** for PR 20667 at commit [`bf79f4d`](https://github.com/apache/spark/commit/bf79f4d5c83c364c7f1fc05f158753d282409330). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20649: [SPARK-23462][SQL] improve missing field error message i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20649 **[Test build #87759 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87759/testReport)** for PR 20649 at commit [`8cdb1d5`](https://github.com/apache/spark/commit/8cdb1d52117325fcbdd1cefc9e9f0616afdb2baa).
[GitHub] spark issue #20689: [SPARK-23533][SS] Add support for changing ContinuousDat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20689 **[Test build #87758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87758/testReport)** for PR 20689 at commit [`59cef98`](https://github.com/apache/spark/commit/59cef98868586a4f039b04e74c32c94eaff965c0).
[GitHub] spark pull request #20675: [SPARK-23033][SS][Follow Up] Task level retry for...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/20675#discussion_r171161352 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousDataReader.java --- @@ -33,4 +33,16 @@ * as a restart checkpoint. */ PartitionOffset getOffset(); + +/** + * Set the start offset for the current record, only used in task retry. If setOffset keep + * default implementation, it means current ContinuousDataReader can't support task level retry. + * + * @param offset last offset before task retry. + */ +default void setOffset(PartitionOffset offset) { --- End diff -- Cool, that's clearer.
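The pattern in the diff above (a default interface method whose un-overridden implementation signals that the reader cannot support task-level retry) can be sketched outside Spark. The following is an illustrative Python analogue of that design, not Spark's actual API; the class and method names are stand-ins:

```python
class ContinuousDataReader:
    """Illustrative stand-in for the interface under review (not Spark's real class)."""

    def get_offset(self):
        raise NotImplementedError

    def set_offset(self, offset):
        # Default implementation: a reader that keeps this default
        # cannot support task-level retry.
        raise NotImplementedError("this reader does not support task-level retry")


class RetryableReader(ContinuousDataReader):
    """A reader that opts in to task-level retry by overriding set_offset."""

    def __init__(self):
        self.start_offset = None

    def set_offset(self, offset):
        # Remember the last committed offset so a retried task can resume from it.
        self.start_offset = offset
```

The caller can then treat "still raises on set_offset" as "retry unsupported" without needing a separate capability flag.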
[GitHub] spark pull request #20689: [SPARK-23533][SS] Add support for changing Contin...
GitHub user xuanyuanking opened a pull request: https://github.com/apache/spark/pull/20689 [SPARK-23533][SS] Add support for changing ContinuousDataReader's startOffset ## What changes were proposed in this pull request? As discussed in #20675, we need to add a new interface `ContinuousDataReaderFactory` to support setting the start offset in Continuous Processing. ## How was this patch tested? Existing UTs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/xuanyuanking/spark SPARK-23533 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20689.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20689 commit 59cef98868586a4f039b04e74c32c94eaff965c0 Author: Yuanjian Li Date: 2018-02-28T07:29:57Z [SPARK-23533][SS] Add support for changing ContinousDataReader's startOffset
[GitHub] spark pull request #20472: [SPARK-22751][ML]Improve ML RandomForest shuffle ...
Github user lucio-yz commented on a diff in the pull request: https://github.com/apache/spark/pull/20472#discussion_r171160692 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala --- @@ -1001,11 +996,18 @@ private[spark] object RandomForest extends Logging { } else { val numSplits = metadata.numSplits(featureIndex) - // get count for each distinct value - val (valueCountMap, numSamples) = featureSamples.foldLeft((Map.empty[Double, Int], 0)) { + // get count for each distinct value except zero value + val (partValueCountMap, partNumSamples) = featureSamples.foldLeft((Map.empty[Double, Int], 0)) { case ((m, cnt), x) => (m + ((x, m.getOrElse(x, 0) + 1)), cnt + 1) } + + // Calculate the number of samples for finding splits + val numSamples: Int = (samplesFractionForFindSplits(metadata) * metadata.numExamples).toInt --- End diff -- I have seen the note on the _sample_ function: _sample_ does not guarantee to return exactly the requested fraction of the RDD's count. It seems that requiring _numSamples - partNumSamples_ to be non-negative is a more efficient choice than triggering a _count_. The degree of approximation here depends on the approximation made by _sample_, and the splits will certainly be somewhat inaccurate.
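The arithmetic being debated above (count only the non-zero values, then infer the zero count from the expected sample size, clamping at zero because a fraction-based sample is only approximate) can be illustrated with a small sketch. This is a plain-Python illustration of the idea with a made-up helper name, not the actual RandomForest code:

```python
from collections import Counter

def approx_value_counts(non_zero_samples, fraction, num_examples):
    """Count distinct non-zero feature values, then infer how many zeros
    the sample 'should' contain from the expected sample size.

    Because sample(fraction) only approximates the requested fraction, the
    inferred zero count is clamped at zero instead of triggering an exact
    count() over the data.
    """
    counts = Counter(non_zero_samples)          # count for each distinct non-zero value
    part_num_samples = len(non_zero_samples)    # non-zero values actually seen
    num_samples = int(fraction * num_examples)  # expected total sample size
    zero_count = max(0, num_samples - part_num_samples)
    if zero_count > 0:
        counts[0.0] = zero_count
    return counts, num_samples
```

As the reviewer notes, the cost of this shortcut is that the zero count (and therefore the split thresholds) inherits whatever error `sample` introduced.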
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20667 Merged build finished. Test PASSed.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20667 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87746/ Test PASSed.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20667 **[Test build #87746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87746/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20681 **[Test build #87757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87757/testReport)** for PR 20681 at commit [`5c922ca`](https://github.com/apache/spark/commit/5c922cacc498018bb22bfe7dde7a137776e6fe3f).
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1133/ Test PASSed.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Merged build finished. Test PASSed.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Merged build finished. Test FAILed.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20681 **[Test build #87747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87747/testReport)** for PR 20681 at commit [`999f86f`](https://github.com/apache/spark/commit/999f86f89ae05147136de8ace51efeb972bf1538). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87747/ Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87750/ Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Merged build finished. Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20647 **[Test build #87750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87750/testReport)** for PR 20647 at commit [`c5af52e`](https://github.com/apache/spark/commit/c5af52ea185e6f94f64096a4937f462db47a4fc5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r171155800 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1518,7 +1525,9 @@ class SQLConf extends Serializable with Logging { def rangeExchangeSampleSizePerPartition: Int = getConf(RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION) - def arrowEnable: Boolean = getConf(ARROW_EXECUTION_ENABLE) + def arrowEnable: Boolean = getConf(ARROW_EXECUTION_ENABLED) --- End diff -- Actually it seems we don't use `arrowEnable` either.
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r171155732 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1518,7 +1525,9 @@ class SQLConf extends Serializable with Logging { def rangeExchangeSampleSizePerPartition: Int = getConf(RANGE_EXCHANGE_SAMPLE_SIZE_PER_PARTITION) - def arrowEnable: Boolean = getConf(ARROW_EXECUTION_ENABLE) + def arrowEnable: Boolean = getConf(ARROW_EXECUTION_ENABLED) + + def arrowFallbackEnable: Boolean = getConf(ARROW_FALLBACK_ENABLED) --- End diff -- nit: Have we used this `arrowFallbackEnable` definition?
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20449 **[Test build #87756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87756/testReport)** for PR 20449 at commit [`8c15c56`](https://github.com/apache/spark/commit/8c15c564c7d2d0adc0cfd725e34dbd359c6a0ab6).
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r171153715 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft f2.get() } + test("Interruptible iterator of shuffle reader") { +// In this test case, we create a Spark job of two stages. The second stage is cancelled during +// execution and a counter is used to make sure that the corresponding tasks are indeed +// cancelled. +import JobCancellationSuite._ +val numSlice = 1 --- End diff -- I'm not sure, let's just try it :)
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20449 retest this please
[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20683 > a extra boolean expression was added to test if a regex was present. Can you please explain what "if a regex was present" means? The fix seems unnecessary. If you want to filter out some temp files, you can write your own `filter` instead of using Spark Streaming's default one.
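The suggestion above (supply your own filter rather than changing Spark Streaming's default) boils down to a simple filename predicate. Below is a minimal sketch of such a predicate in plain Python; the exclusion pattern is hypothetical, and a real Spark job would pass an equivalent function to the file-stream source rather than call it directly:

```python
import re

# Hypothetical exclusion pattern: skip dot-/underscore-prefixed files
# (e.g. _SUCCESS markers) and in-flight temp files.
EXCLUDE = re.compile(r"(^[._])|(\.tmp$)|(_COPYING_$)")

def keep_file(filename):
    """Return True if the stream should pick up this file."""
    return not EXCLUDE.search(filename)
```

The point of the reviewer's suggestion is that this policy lives in user code, so each job can exclude whatever its upstream writer produces without touching Spark's default filter.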
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r171152501 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft f2.get() } + test("Interruptible iterator of shuffle reader") { +// In this test case, we create a Spark job of two stages. The second stage is cancelled during +// execution and a counter is used to make sure that the corresponding tasks are indeed +// cancelled. +import JobCancellationSuite._ +val numSlice = 1 --- End diff -- Will update it later. But it looks like Jenkins has been having trouble these days; is it back to normal?
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20685 **[Test build #87755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87755/testReport)** for PR 20685 at commit [`110c851`](https://github.com/apache/spark/commit/110c8510dcc6c2abaf4ca416b95854daf129b0a5).
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1132/ Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Merged build finished. Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/20685 Jenkins, retest this please
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @gatorsmile Is there any issue with this PR? Can you please take another look? Thanks.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20043 **[Test build #87754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87754/testReport)** for PR 20043 at commit [`f59bb19`](https://github.com/apache/spark/commit/f59bb19a3fd04b24ea3077a12283777be0af437d).
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1131/ Test PASSed.
[GitHub] spark issue #20043: [SPARK-22856][SQL] Add wrappers for codegen output and n...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20043 Merged build finished. Test PASSed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87743/ Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Merged build finished. Test FAILed.
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18906 **[Test build #87743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87743/testReport)** for PR 18906 at commit [`e6e6dbf`](https://github.com/apache/spark/commit/e6e6dbf5cd8d8c8e15977fe89f741483eb6138a6). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20679 **[Test build #87753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87753/testReport)** for PR 20679 at commit [`b37f24f`](https://github.com/apache/spark/commit/b37f24f372bb45ff9b8380222e0eb7e6d8819e58).
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20679 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1130/ Test PASSed.
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20679 Merged build finished. Test PASSed.
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20449 LGTM
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Finally, Spark 2.3 passes the vote. Could you review this, @gatorsmile ?
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20679 retest this please
[GitHub] spark issue #20684: [SPARK-23523] [SQL] Fix the incorrect result caused by t...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20684 Hi, @gatorsmile and @cloud-fan . Since the 2.3 vote passed, can we have this in `branch-2.3` for Apache Spark 2.3.1? The conflict on `LocalRelation.scala` is simply due to indentation changes.
[GitHub] spark pull request #20449: [SPARK-23040][CORE]: Returns interruptible iterat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20449#discussion_r171148057 --- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala --- @@ -320,6 +321,63 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft f2.get() } + test("Interruptible iterator of shuffle reader") { +// In this test case, we create a Spark job of two stages. The second stage is cancelled during +// execution and a counter is used to make sure that the corresponding tasks are indeed +// cancelled. +import JobCancellationSuite._ +val numSlice = 1 --- End diff -- Can we hardcode it? Using a variable makes people think they can change its value and the test will still pass; however, that's not true, as `assert(executionOfInterruptibleCounter.get() <= 10)` needs to be updated too.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #87752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87752/testReport)** for PR 20208 at commit [`6ae471c`](https://github.com/apache/spark/commit/6ae471c8ecaae3eb3888eecaac1c4e7552bedcc6).
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1129/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Rebased onto master.
[GitHub] spark issue #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20688 **[Test build #87751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87751/testReport)** for PR 20688 at commit [`8bfadc3`](https://github.com/apache/spark/commit/8bfadc387393c2a42d09ef11707b1f0d3d27a53a).
[GitHub] spark issue #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20688 Merged build finished. Test PASSed.
[GitHub] spark issue #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20688 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1128/ Test PASSed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87741/ Test FAILed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20685 Merged build finished. Test FAILed.
[GitHub] spark issue #20685: [SPARK-23524] Big local shuffle blocks should not be che...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20685 **[Test build #87741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87741/testReport)** for PR 20685 at commit [`110c851`](https://github.com/apache/spark/commit/110c8510dcc6c2abaf4ca416b95854daf129b0a5). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20667 Hi @jiangxb1987, thanks for your kind explanation.
[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20647#discussion_r171143986

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2StringFormat.scala ---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.commons.lang3.StringUtils
+
+import org.apache.spark.sql.catalyst.expressions.Attribute
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2.DataSourceV2
+import org.apache.spark.sql.sources.v2.reader._
+import org.apache.spark.util.Utils
+
+/**
+ * A trait that can be used by data source v2 related query plans (both logical and physical) to
+ * provide a string format of the data source information for explain.
+ */
+trait DataSourceV2StringFormat {
+
+  /**
+   * The instance of this data source implementation. Note that we only consider its class in
+   * equals/hashCode, not the instance itself.
+   */
+  def source: DataSourceV2
+
+  /**
+   * The output of the data source reader, w.r.t. column pruning.
+   */
+  def output: Seq[Attribute]
+
+  /**
+   * The options for this data source reader.
+   */
+  def options: Map[String, String]
+
+  /**
+   * The created data source reader. Here we use it to get the filters that have been pushed down
+   * so far; the reader itself doesn't take part in the equals/hashCode.
+   */
+  def reader: DataSourceReader
+
+  private lazy val filters = reader match {
+    case s: SupportsPushDownCatalystFilters => s.pushedCatalystFilters().toSet
+    case s: SupportsPushDownFilters => s.pushedFilters().toSet
+    case _ => Set.empty
+  }
+
+  private def sourceName: String = source match {
+    case registered: DataSourceRegister => registered.shortName()
+    case _ => source.getClass.getSimpleName.stripSuffix("$")
+  }
+
+  def metadataString: String = {
+    val entries = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
+
+    if (filters.nonEmpty) {
+      entries += "Pushed Filters" -> filters.mkString("[", ", ", "]")
+    }
+
+    // TODO: we should only display some standard options like path, table, etc.
+    entries ++= options
+
+    val outputStr = Utils.truncatedString(output, "[", ", ", "]")
+
+    val entriesStr = if (entries.nonEmpty) {
+      Utils.truncatedString(entries.map {
+        case (key, value) => StringUtils.abbreviate(redact(key + ":" + value), 100)
--- End diff --

Now users can redact passwords by matching `password:.+`.
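The `metadataString` logic in the diff above builds a display string from the pushed filters and the reader options, redacting and abbreviating each entry. A small Python sketch of that shape follows; the 100-character limit matches the diff, while the exact replacement string and abbreviation style are assumptions for illustration:

```python
import re

def abbreviate(s: str, max_len: int) -> str:
    # Commons-lang-style abbreviation: truncate long strings with "..." suffix.
    return s if len(s) <= max_len else s[: max_len - 3] + "..."

def metadata_string(pushed_filters, options,
                    redaction_pattern=r"password:.+", max_len=100):
    entries = []
    if pushed_filters:
        entries.append(("Pushed Filters", "[" + ", ".join(pushed_filters) + "]"))
    # Every reader option is displayed as-is (the diff notes a TODO to limit
    # this to standard options like path/table).
    entries.extend(options.items())

    def redact(s):
        # Entries matching the configured pattern are masked, which is how a
        # user-supplied `password:.+` pattern hides credentials in explain output.
        return re.sub(redaction_pattern, "*********(redacted)", s)

    return ", ".join(abbreviate(redact(k + ":" + v), max_len)
                     for k, v in entries)

s = metadata_string(["a > 1"], {"path": "/tmp/t", "password": "secret"})
```

With the inputs above, the password entry is masked while the filter and path entries pass through untouched.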
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20667 When the same `BlockManagerId` is created multiple times, this cache ensures we always use the first instance that was created, which makes it possible for the remaining `BlockManagerId` instances to be recycled shortly. The downside is that we have to keep all the distinct `BlockManagerId`s ever created. Since the code was added a long time ago, and it's actually hard to measure the performance with/without the cache, we'd like to keep it for now.
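The caching behavior described above is essentially interning: the first instance created for a given identity wins, and later duplicates become immediately collectable garbage. A minimal sketch of that idea (hypothetical names, not Spark's Scala code):

```python
class BlockManagerIdLike:
    """Stand-in for BlockManagerId: identity is (executor_id, host, port)."""

    def __init__(self, executor_id, host, port):
        self.key = (executor_id, host, port)

_cache = {}

def get_cached(bm_id):
    # setdefault stores bm_id only if the key is absent; otherwise it
    # returns the instance that was stored first, so the freshly created
    # duplicate can be recycled right away.
    return _cache.setdefault(bm_id.key, bm_id)

a = get_cached(BlockManagerIdLike("1", "host-a", 7337))
b = get_cached(BlockManagerIdLike("1", "host-a", 7337))
assert a is b  # the second instance is discarded in favor of the first
```

The downside noted in the comment shows up here too: `_cache` grows with every distinct key and never shrinks.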
[GitHub] spark pull request #20647: [SPARK-23303][SQL] improve the explain result for...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20647#discussion_r171143887

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala ---
@@ -107,19 +104,36 @@ case class DataSourceV2Relation(
 }

 /**
- * A specialization of DataSourceV2Relation with the streaming bit set to true. Otherwise identical
- * to the non-streaming relation.
+ * A specialization of [[DataSourceV2Relation]] with the streaming bit set to true.
+ *
+ * Note that this plan has a mutable reader, so Spark won't apply operator push-down for this plan,
+ * to avoid making the plan mutable. We should consolidate this plan and [[DataSourceV2Relation]]
+ * after we figure out how to apply operator push-down for streaming data sources.
  */
 case class StreamingDataSourceV2Relation(
     output: Seq[AttributeReference],
+    source: DataSourceV2,
+    options: Map[String, String],
     reader: DataSourceReader)
-  extends LeafNode with DataSourceReaderHolder with MultiInstanceRelation {
+  extends LeafNode with MultiInstanceRelation with DataSourceV2StringFormat {
+
   override def isStreaming: Boolean = true

-  override def canEqual(other: Any): Boolean = other.isInstanceOf[StreamingDataSourceV2Relation]
+  override def simpleString: String = "Streaming RelationV2 " + metadataString

   override def newInstance(): LogicalPlan = copy(output = output.map(_.newInstance()))

+  // TODO: unify the equals/hashCode implementation for all data source v2 query plans.
+  override def equals(other: Any): Boolean = other match {
+    case other: StreamingDataSourceV2Relation =>
+      output == other.output && reader.getClass == other.reader.getClass && options == other.options
--- End diff --

Now it's exactly the same as before. We should clean it up after we figure out how to push down operators to a streaming relation.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #87749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87749/testReport)** for PR 13599 at commit [`86484d6`](https://github.com/apache/spark/commit/86484d67c3f85e2372cd1de69cafb3a4b7bbb691). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)` * ` class DriverEndpoint(override val rpcEnv: RpcEnv)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87749/ Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Merged build finished. Test PASSed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20647 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1127/ Test PASSed.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20647 Hi @rdblue, I've opened https://issues.apache.org/jira/browse/SPARK-23531 to include the type info. I'd like to do it later, as it's a general problem in Spark SQL and many plans need to be updated, e.g. leaf nodes other than data source scans.
[GitHub] spark issue #20647: [SPARK-23303][SQL] improve the explain result for data s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20647 **[Test build #87750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87750/testReport)** for PR 20647 at commit [`c5af52e`](https://github.com/apache/spark/commit/c5af52ea185e6f94f64096a4937f462db47a4fc5).
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/20667 Hi @caneGuy, sorry for my previous comment: I mixed up ```BlockId``` with ```BlockManagerId``` and left some wrong comments. And thanks for your reply. Back to the topic, I have the same question as @cloud-fan:

> Why do we need this cache?

though we have a better caching option (Guava cache) now. My confusions:
- It is weird that we need to create a ```BlockManagerId``` before we can get the same one from the cache.
- On the executor side, when a ```BlockManagerId``` is registered to the master and an updated ```BlockManagerId``` is returned, the new ```BlockManagerId``` is not put into ```blockManagerIdCache```. So it seems the executor side's ```BlockManagerId``` has little relevance to ```blockManagerIdCache```.
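The "better caching option (Guava cache)" mentioned above refers to a size-bounded cache with eviction, as opposed to an unbounded map. A minimal LRU-evicting sketch in Python, assuming Guava-style `maximumSize` semantics (the class and method names here are illustrative):

```python
from collections import OrderedDict

class BoundedInternCache:
    """Intern cache with a maximum size: the least recently used entry is
    evicted once the cap is exceeded, so memory stays bounded."""

    def __init__(self, max_size):
        self._max_size = max_size
        self._data = OrderedDict()

    def intern(self, key, value):
        if key in self._data:
            # Cache hit: refresh recency and return the stored instance.
            self._data.move_to_end(key)
            return self._data[key]
        self._data[key] = value
        if len(self._data) > self._max_size:
            self._data.popitem(last=False)  # evict least recently used
        return value

cache = BoundedInternCache(2)
cache.intern("a", 1)
cache.intern("b", 2)
cache.intern("c", 3)  # exceeds the cap, so "a" is evicted
```

Unlike the unbounded `ConcurrentHashMap` approach, old `BlockManagerId`-style entries would eventually be dropped rather than retained forever.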
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #87749 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87749/testReport)** for PR 13599 at commit [`86484d6`](https://github.com/apache/spark/commit/86484d67c3f85e2372cd1de69cafb3a4b7bbb691).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1126/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20688 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87740/ Test FAILed.
[GitHub] spark issue #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20688 Merged build finished. Test FAILed.
[GitHub] spark issue #20688: [SPARK-23096][SS] Migrate rate source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20688 **[Test build #87740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87740/testReport)** for PR 20688 at commit [`538223e`](https://github.com/apache/spark/commit/538223e52e1d12d82339a22390a9812beaccf8a6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20678 Will try to clean up soon.
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r171139748

--- Diff: docs/sql-programming-guide.md ---
@@ -1689,6 +1689,10 @@ using the call `toPandas()` and when creating a Spark DataFrame from a Pandas Da
 `createDataFrame(pandas_df)`. To use Arrow when executing these calls, users need to first set
 the Spark configuration 'spark.sql.execution.arrow.enabled' to 'true'. This is disabled by default.

+In addition, optimizations enabled by 'spark.sql.execution.arrow.enabled' will fallback automatically
+to non-optimized implementations if an error occurs. This can be controlled by
--- End diff --

Let me try to rephrase this doc a bit. The point I was trying to make with this fallback (for now) was to only fall back before the actual distributed computation starts within Spark.
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r171138898

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1986,55 +1986,89 @@ def toPandas(self):
         timezone = None

         if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true":
+            should_fallback = False
             try:
-                from pyspark.sql.types import _check_dataframe_convert_date, \
-                    _check_dataframe_localize_timestamps, to_arrow_schema
+                from pyspark.sql.types import to_arrow_schema
                 from pyspark.sql.utils import require_minimum_pyarrow_version
+
                 require_minimum_pyarrow_version()
-                import pyarrow
                 to_arrow_schema(self.schema)
-                tables = self._collectAsArrow()
-                if tables:
-                    table = pyarrow.concat_tables(tables)
-                    pdf = table.to_pandas()
-                    pdf = _check_dataframe_convert_date(pdf, self.schema)
-                    return _check_dataframe_localize_timestamps(pdf, timezone)
-                else:
-                    return pd.DataFrame.from_records([], columns=self.columns)
             except Exception as e:
-                msg = (
-                    "Note: toPandas attempted Arrow optimization because "
-                    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
-                    "to disable this.")
-                raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
-        else:
-            pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
-            dtype = {}
+                if self.sql_ctx.getConf("spark.sql.execution.arrow.fallback.enabled", "true") \
+                        .lower() == "true":
+                    msg = (
+                        "toPandas attempted Arrow optimization because "
+                        "'spark.sql.execution.arrow.enabled' is set to true; however, "
+                        "failed by the reason below:\n  %s\n"
+                        "Attempts non-optimization as "
+                        "'spark.sql.execution.arrow.fallback.enabled' is set to "
+                        "true." % _exception_message(e))
+                    warnings.warn(msg)
+                    should_fallback = True
+                else:
+                    msg = (
+                        "toPandas attempted Arrow optimization because "
+                        "'spark.sql.execution.arrow.enabled' is set to true; however, "
+                        "failed by the reason below:\n  %s\n"
+                        "For fallback to non-optimization automatically, please set true to "
+                        "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
+                    raise RuntimeError(msg)
+
+            if not should_fallback:
--- End diff --

Correct, but there's one more case: we fall back if PyArrow is not installed. Will add some comments to make this easier to read.
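The control flow of the diff above, try the optimized Arrow path and then either warn-and-fall-back or raise depending on a config flag, can be reduced to a small sketch. The function and message names here are hypothetical, not PySpark's API:

```python
import warnings

def to_pandas_like(optimized, fallback, fallback_enabled=True):
    """Try the optimized path first; on failure, either warn and run the
    fallback, or re-raise with a hint, depending on the flag."""
    try:
        return optimized()
    except Exception as e:
        if fallback_enabled:
            # Mirror of the fallback branch: emit a warning and continue on
            # the non-optimized implementation.
            warnings.warn(
                "optimized path failed (%s); attempting the non-optimized "
                "implementation instead" % e)
            return fallback()
        # Mirror of the strict branch: surface the failure with a hint
        # about the flag that would have enabled the fallback.
        raise RuntimeError(
            "optimized path failed (%s); enable the fallback flag to fall "
            "back automatically" % e)

def broken_optimized():
    raise ValueError("pyarrow not available")

result = to_pandas_like(broken_optimized, lambda: "plain result")
# result is "plain result", after a warning was emitted
```

The point made in the review thread applies here too: the fallback decision happens before any distributed work starts, so nothing expensive runs twice.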
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20667 **[Test build #87748 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87748/testReport)** for PR 20667 at commit [`bf79f4d`](https://github.com/apache/spark/commit/bf79f4d5c83c364c7f1fc05f158753d282409330).
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20681 **[Test build #87747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87747/testReport)** for PR 20681 at commit [`999f86f`](https://github.com/apache/spark/commit/999f86f89ae05147136de8ace51efeb972bf1538).
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Merged build finished. Test PASSed.
[GitHub] spark issue #20681: [SPARK-23518][SQL] Avoid metastore access when the users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20681 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1125/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #87745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87745/testReport)** for PR 13599 at commit [`3da68c7`](https://github.com/apache/spark/commit/3da68c75552798d841e7adefae1c2ae7cefff0b7). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class VirtualEnvFactory(pythonExec: String, conf: SparkConf, isDriver: Boolean)` * ` class DriverEndpoint(override val rpcEnv: RpcEnv)`
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test FAILed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87745/ Test FAILed.
[GitHub] spark pull request #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case bl...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r171136135

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala ---
@@ -132,10 +133,17 @@ private[spark] object BlockManagerId {
     getCachedBlockManagerId(obj)
   }

-  val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]()
+  /**
+   * Here we set max cache size as 1. Since the size of a BlockManagerId object
--- End diff --

nit:
```
The max cache size is hardcoded to 1, since the size of a BlockManagerId object
is about 48B, the total memory cost should be below 1MB which is feasible.
```
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20667 **[Test build #87746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87746/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13599 **[Test build #87745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87745/testReport)** for PR 13599 at commit [`3da68c7`](https://github.com/apache/spark/commit/3da68c75552798d841e7adefae1c2ae7cefff0b7).
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1124/ Test PASSed.
[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13599 Merged build finished. Test PASSed.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20667 **[Test build #87744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87744/testReport)** for PR 20667 at commit [`3379899`](https://github.com/apache/spark/commit/337989945b0757dfc6a069315c4e7828afe77d00).
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20667 add to whitelist
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20667 ok to test
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20667 LGTM