[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2365/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2418/
Test PASSed.


---




[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21093
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21093
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2417/
Test PASSed.


---




[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2018-04-17 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/19381
  
@WeichenXu123 we were discussing this when we moved the common math code 
out into `mllib-local`, but there is no umbrella ticket for it. I have talked 
to many companies, and this is one of the pain points of using mllib. Feel 
free to create a ticket. 

Thanks. 


---




[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2416/
Test PASSed.


---




[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21092
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-spark-integration/2365/



---




[GitHub] spark issue #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21093
  
**[Test build #89484 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89484/testReport)**
 for PR 21093 at commit 
[`fc5d976`](https://github.com/apache/spark/commit/fc5d976ffb33ebec996415ac1296196f8458a01f).


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20636
  
**[Test build #89485 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89485/testReport)**
 for PR 20636 at commit 
[`21b3708`](https://github.com/apache/spark/commit/21b3708f3acc4a170469e854d1be45d597a4a7d1).


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
retest this please


---




[GitHub] spark pull request #21093: [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC...

2018-04-17 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/21093

[SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3

## What changes were proposed in this pull request?

This PR updates the Apache ORC dependencies to 1.4.3, released on February 9th. 
The Apache ORC 1.4.2 release removed unnecessary dependencies, and 1.4.3 carries 
5 more patches (https://s.apache.org/Fll8).

In particular, ORC-285 is fixed in 1.4.3:

```scala
scala> val df = Seq(Array.empty[Float]).toDF()

scala> df.write.format("orc").save("/tmp/floatarray")

scala> spark.read.orc("/tmp/floatarray")
res1: org.apache.spark.sql.DataFrame = [value: array]

scala> spark.read.orc("/tmp/floatarray").show()
18/02/12 22:09:10 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.io.IOException: Error reading file: 
file:/tmp/floatarray/part-0-9c0b461b-4df1-4c23-aac1-3e4f349ac7d6-c000.snappy.orc
at 
org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at 
org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:78)
...
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
for column 2 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
```

## How was this patch tested?

Pass the Jenkins test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-23340-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21093.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21093


commit fc5d976ffb33ebec996415ac1296196f8458a01f
Author: Dongjoon Hyun 
Date:   2018-02-17T08:25:36Z

[SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3

This PR updates the Apache ORC dependencies to 1.4.3, released on February 9th. 
The Apache ORC 1.4.2 release removed unnecessary dependencies, and 1.4.3 carries 
5 more patches (https://s.apache.org/Fll8).

In particular, ORC-285 is fixed in 1.4.3:

```scala
scala> val df = Seq(Array.empty[Float]).toDF()

scala> df.write.format("orc").save("/tmp/floatarray")

scala> spark.read.orc("/tmp/floatarray")
res1: org.apache.spark.sql.DataFrame = [value: array]

scala> spark.read.orc("/tmp/floatarray").show()
18/02/12 22:09:10 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.io.IOException: Error reading file: 
file:/tmp/floatarray/part-0-9c0b461b-4df1-4c23-aac1-3e4f349ac7d6-c000.snappy.orc
at 
org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1191)
at 
org.apache.orc.mapreduce.OrcMapreduceRecordReader.ensureBatch(OrcMapreduceRecordReader.java:78)
...
Caused by: java.io.EOFException: Read past EOF for compressed stream Stream 
for column 2 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
```

Pass the Jenkins test.

Author: Dongjoon Hyun 

Closes #20511 from dongjoon-hyun/SPARK-23340.




---




[GitHub] spark issue #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings for PySp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21092
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21092: [SPARK-23984][K8S][WIP] Initial Python Bindings f...

2018-04-17 Thread ifilonenko
GitHub user ifilonenko opened a pull request:

https://github.com/apache/spark/pull/21092

[SPARK-23984][K8S][WIP] Initial Python Bindings for PySpark on K8s

## What changes were proposed in this pull request?

Introducing Python Bindings for PySpark.

- [ ] Running PySpark Jobs
- [ ] Increased Default Memory Overhead value
- [ ] Dependency Management for virtualenv/conda

## How was this patch tested?

This patch was tested with 

- [ ] Unit Tests
- [ ] Integration tests with [this 
addition](https://github.com/apache-spark-on-k8s/spark-integration/pull/46)
```
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run SparkPi with a test secret mounted into the driver and executor pods
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run PySpark on simple pi.py example
Run completed in 4 minutes, 3 seconds.
Total number of tests run: 9
Suites: completed 2, aborted 0
Tests: succeeded 9, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
```

## Problematic Comments from [ifilonenko]

- [ ] Currently the Docker image is built with Python 2; it needs to be generic 
for Python 2/3
- [ ] `--py-files` is distributing files properly, but it seems that example 
commands like
```
exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
/opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
/opt/spark/examples/src/main/python/pi.py 
/opt/spark/examples/src/main/python/sort.py
```
cause errors because `/opt/spark/examples/src/main/python/pi.py` treats 
`/opt/spark/examples/src/main/python/sort.py` as an argument





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ifilonenko/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21092


commit fb5b9ed83d4e5ed73bc44b9d719ac0e52702655e
Author: Ilan Filonenko 
Date:   2018-04-16T03:23:43Z

initial architecture for PySpark w/o dockerfile work

commit b7b3db0abfbf425120fa21cc61e603c5d766f8af
Author: Ilan Filonenko 
Date:   2018-04-17T19:13:45Z

included entrypoint logic

commit 98cef8ceb0f04cfcefbc482c2a0fe39c75f620c4
Author: Ilan Filonenko 
Date:   2018-04-18T02:22:55Z

satisfying integration tests

commit dc670dcd07944ae30b9b425c26250a21986b2699
Author: Ilan Filonenko 
Date:   2018-04-18T05:20:12Z

end-to-end working pyspark

commit eabe4b9b784f37cca3dd9bcff17110944b50f5c8
Author: Ilan Filonenko 
Date:   2018-04-18T05:20:42Z

Merge pull request #1 from ifilonenko/py-spark

Initial architecture for PySpark w/o dependency management




---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread Ngone51
Github user Ngone51 commented on the issue:

https://github.com/apache/spark/pull/20930
  
Hi @xuanyuanking, I'm still confused (smile & cry). 
> Stage 2 retry 4 times triggered by Stage 3's fetch failed event. Actually 
in this scenario, stage 3 will always failed by fetch fail.

Stage 2 has no missing tasks, right? So there are no missing partitions for 
Stage 2 (which means Stage 3 can always get Stage 2's map outputs from 
`MapOutputTrackerMaster`), right? So why will Stage 3 always fail with a 
fetch failure?
 
Hope you can explain more. Thank you very much!



---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182309137
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1266,6 +1266,9 @@ class DAGScheduler(
 }
 if (failedEpoch.contains(execId) && smt.epoch <= 
failedEpoch(execId)) {
   logInfo(s"Ignoring possibly bogus $smt completion from 
executor $execId")
+} else if (failedStages.contains(shuffleStage)) {
--- End diff --

This also confused me before. As far as I can tell, a result task in such a 
scenario (the speculative task fails but the original task succeeds) is fine 
because it has no child stage; we can use the successful task's result and 
`markStageAsFinished`. But for a shuffle map task, it causes an inconsistency 
between the `mapOutputTracker` and the stage's `pendingPartitions`, which must 
be fixed.
I'm not sure about `ResultTask`'s behavior; can you give some advice?
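
The inconsistency described above can be sketched with a toy model (hypothetical `Stage` and `MapOutputTracker` classes, not Spark code): the scheduler tracks pending partitions per stage while a separate tracker records map outputs, and a late success whose output is discarded leaves the two views disagreeing.

```scala
// Toy model of the DAGScheduler inconsistency: pendingPartitions says the
// stage is done, but the map-output tracker is still missing an output,
// so any child stage would keep fetch-failing.
case class Stage(numPartitions: Int) {
  val pendingPartitions = scala.collection.mutable.Set(0 until numPartitions: _*)
}

class MapOutputTracker {
  val outputs = scala.collection.mutable.Map[Int, String]()
  def register(partition: Int, location: String): Unit = outputs(partition) = location
  def missing(stage: Stage): Seq[Int] =
    (0 until stage.numPartitions).filterNot(outputs.contains)
}

val stage = Stage(2)
val tracker = new MapOutputTracker

// Partition 0 succeeds normally: pending cleared AND output registered.
stage.pendingPartitions -= 0
tracker.register(0, "hostA")

// Partition 1's late Success comes from an executor in a failed epoch:
// the scheduler clears the pending partition but the output is dropped.
stage.pendingPartitions -= 1

assert(stage.pendingPartitions.isEmpty)  // the stage looks complete...
assert(tracker.missing(stage) == Seq(1)) // ...yet output 1 is still missing
```

This is only an illustration of the failure mode; the actual fix in the PR adds a `failedStages` check in `DAGScheduler` so such a completion is not used to finish the stage.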


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89475/
Test FAILed.


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20636
  
**[Test build #89475 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89475/testReport)**
 for PR 20636 at commit 
[`21b3708`](https://github.com/apache/spark/commit/21b3708f3acc4a170469e854d1be45d597a4a7d1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread Ngone51
Github user Ngone51 commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182308871
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,50 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 }
   }
 
+  /**
+   * This tests the case where origin task success after speculative task 
got FetchFailed
+   * before.
+   */
+  test("[SPARK-23811] FetchFailed comes before Success of same task will 
cause child stage" +
+" never succeed") {
+// Create 3 RDDs with shuffle dependencies on each other: rddA <--- 
rddB <--- rddC
+val rddA = new MyRDD(sc, 2, Nil)
+val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+val shuffleIdA = shuffleDepA.shuffleId
+
+val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = 
mapOutputTracker)
+val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))
+
+val rddC = new MyRDD(sc, 2, List(shuffleDepB), tracker = 
mapOutputTracker)
+
+submit(rddC, Array(0, 1))
+
+// Complete both tasks in rddA.
+assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+complete(taskSets(0), Seq(
+  (Success, makeMapStatus("hostA", 2)),
+  (Success, makeMapStatus("hostB", 2
+
+// The first task success
+runEvent(makeCompletionEvent(
+  taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+// The second task's speculative attempt fails first, but task self 
still running.
+// This may caused by ExecutorLost.
+runEvent(makeCompletionEvent(
+  taskSets(1).tasks(1),
--- End diff --

Maybe you can `runEvent(SpeculativeTaskSubmitted)` first, to simulate a 
speculative task being submitted, before you `runEvent(makeCompletionEvent())`.


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182307786
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,50 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 }
   }
 
+  /**
+   * This tests the case where origin task success after speculative task 
got FetchFailed
+   * before.
+   */
+  test("[SPARK-23811] FetchFailed comes before Success of same task will 
cause child stage" +
+" never succeed") {
--- End diff --

Thanks, I'll change it.


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182307728
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,50 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 }
   }
 
+  /**
+   * This tests the case where origin task success after speculative task 
got FetchFailed
+   * before.
+   */
+  test("[SPARK-23811] FetchFailed comes before Success of same task will 
cause child stage" +
+" never succeed") {
+// Create 3 RDDs with shuffle dependencies on each other: rddA <--- 
rddB <--- rddC
+val rddA = new MyRDD(sc, 2, Nil)
+val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+val shuffleIdA = shuffleDepA.shuffleId
+
+val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = 
mapOutputTracker)
+val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))
+
+val rddC = new MyRDD(sc, 2, List(shuffleDepB), tracker = 
mapOutputTracker)
+
+submit(rddC, Array(0, 1))
+
+// Complete both tasks in rddA.
+assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+complete(taskSets(0), Seq(
+  (Success, makeMapStatus("hostA", 2)),
+  (Success, makeMapStatus("hostB", 2
+
+// The first task success
+runEvent(makeCompletionEvent(
+  taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+// The second task's speculative attempt fails first, but task self 
still running.
+// This may caused by ExecutorLost.
+runEvent(makeCompletionEvent(
+  taskSets(1).tasks(1),
--- End diff --

Here we only need to mock the speculative task's failed event arriving before 
the success event, and `makeCompletionEvent` with the same task set's task 
achieves that. The same approach is used in `task events always posted in 
speculation / when stage is killed`.


---




[GitHub] spark issue #16476: [SPARK-19084][SQL] Implement expression field

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16476
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17086#discussion_r182303815
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala 
---
@@ -27,10 +27,11 @@ import org.apache.spark.sql.DataFrame
 /**
  * Evaluator for multiclass classification.
  *
- * @param predictionAndLabels an RDD of (prediction, label) pairs.
+ * @param predAndLabelsWithOptWeight an RDD of (prediction, label, weight) 
or
+ *   (prediction, label) pairs.
  */
 @Since("1.1.0")
-class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, 
Double)]) {
+class MulticlassMetrics @Since("1.1.0") (predAndLabelsWithOptWeight: 
RDD[_]) {
--- End diff --

Hmm, the build fails here with an error indicating that the two constructors 
have the same type after erasure; perhaps I should revert this code:

[error] 
/home/jenkins/workspace/SparkPullRequestBuilder@2/mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala:35:
 double definition:
[error] constructor MulticlassMetrics: (predLabelsWeight: 
org.apache.spark.rdd.RDD[(Double, Double, 
Double)])org.apache.spark.mllib.evaluation.MulticlassMetrics at line 33 and
[error] constructor MulticlassMetrics: (predAndLabels: 
org.apache.spark.rdd.RDD[(Double, 
Double)])org.apache.spark.mllib.evaluation.MulticlassMetrics at line 35
[error] have same type after erasure: (predLabelsWeight: 
org.apache.spark.rdd.RDD)org.apache.spark.mllib.evaluation.MulticlassMetrics
[error]   def this(predAndLabels: RDD[(Double, Double)]) =
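
A minimal sketch of the erasure collision behind that error, using a hypothetical `Metrics` class and a plain `Seq` in place of `RDD` (names are illustrative, not Spark code):

```scala
// JVM generics are erased: Seq[(Double, Double)] and
// Seq[(Double, Double, Double)] are both just Seq in bytecode, so two
// constructors taking them are a "double definition". Uncommenting the
// auxiliary constructor reproduces the kind of error quoted above:
//
//   class Metrics(pairs: Seq[(Double, Double)]) {
//     def this(triples: Seq[(Double, Double, Double)]) = ...
//   }   // error: ... have same type after erasure
//
// One workaround, in the spirit of the PR's RDD[_] parameter, is a single
// untyped parameter inspected at runtime:
class Metrics(rows: Seq[Product]) {
  // Default the weight to 1.0 when only (prediction, label) is given.
  val rowsWithWeight: Seq[(Double, Double, Double)] = rows.map {
    case (pred: Double, label: Double) => (pred, label, 1.0)
    case (pred: Double, label: Double, weight: Double) => (pred, label, weight)
    case other => throw new IllegalArgumentException(s"Unexpected row: $other")
  }
}

val m = new Metrics(Seq((1.0, 0.0), (0.5, 1.0, 2.0)))
// m.rowsWithWeight holds (1.0, 0.0, 1.0) and (0.5, 1.0, 2.0)
```

The runtime pattern match trades compile-time type safety for a single constructor, which is why the alternative of reverting to separate typed constructors was being weighed here.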




---




[GitHub] spark pull request #21086: [SPARK-24002] [SQL] Task not serializable caused ...

2018-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21086


---




[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....

2018-04-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21086
  
If the community hits this issue, we can backport it to Spark 2.3. 


---




[GitHub] spark issue #21086: [SPARK-24002] [SQL] Task not serializable caused by org....

2018-04-17 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21086
  
Thanks! Merged to master/2.3


---




[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17086
  
**[Test build #89483 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89483/testReport)**
 for PR 17086 at commit 
[`cf941af`](https://github.com/apache/spark/commit/cf941af63e14282004e978c7195ebfa09666115c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class MulticlassMetrics @Since(\"2.4.0\") (predLabelsWeight: 
RDD[(Double, Double, Double)]) `


---




[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89483/
Test FAILed.


---




[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2415/
Test PASSed.


---




[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17086
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21029
  
Merged build finished. Test FAILed.


---




[GitHub] spark pull request #20535: [SPARK-23341][SQL] define some standard options f...

2018-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20535


---




[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21029
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2414/
Test FAILed.


---




[GitHub] spark issue #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use weight...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17086
  
**[Test build #89483 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89483/testReport)**
 for PR 17086 at commit 
[`cf941af`](https://github.com/apache/spark/commit/cf941af63e14282004e978c7195ebfa09666115c).


---




[GitHub] spark issue #21029: [SPARK-23952] remove type parameter in DataReaderFactory

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21029
  
**[Test build #89482 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89482/testReport)**
 for PR 21029 at commit 
[`18e391a`](https://github.com/apache/spark/commit/18e391a008a27f38dfdd2d527991d137e79b7a94).


---




[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17086#discussion_r182302080
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/evaluation/MulticlassMetrics.scala 
---
@@ -27,10 +27,11 @@ import org.apache.spark.sql.DataFrame
 /**
  * Evaluator for multiclass classification.
  *
- * @param predictionAndLabels an RDD of (prediction, label) pairs.
+ * @param predAndLabelsWithOptWeight an RDD of (prediction, label, weight) 
or
+ *   (prediction, label) pairs.
  */
 @Since("1.1.0")
-class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, 
Double)]) {
+class MulticlassMetrics @Since("1.1.0") (predAndLabelsWithOptWeight: 
RDD[_]) {
--- End diff --

Good idea; this also simplifies the calculation of `confusions`, 
`fpByClass`, `tpByClass`, and `labelCountByClass`.


---




[GitHub] spark issue #20535: [SPARK-23341][SQL] define some standard options for data...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20535
  
thanks, merging to master!


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89474/
Test FAILed.


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21061
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21061: [SPARK-23914][SQL] Add array_union function

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21061
  
**[Test build #89474 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89474/testReport)**
 for PR 21061 at commit 
[`cf65616`](https://github.com/apache/spark/commit/cf65616d019ad21c6f498e2c856c3ee396e9dbd2).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class ArraySetUtils extends BinaryExpression with 
ExpectsInputTypes `
  * `case class ArrayUnion(left: Expression, right: Expression) extends 
ArraySetUtils `


---




[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21074#discussion_r182301592
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -176,10 +176,18 @@ object TypeCoercion {
   }
 
   private def findWiderCommonType(types: Seq[DataType]): Option[DataType] 
= {
-types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
-  case Some(d) => findWiderTypeForTwo(d, c)
-  case None => None
-})
+// findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op 
b) op c may not equal
+// to a op (b op c). This is only a problem for StringType. Excluding 
StringType,
+// findWiderTypeForTwo satisfies the associative law. For instance, 
(TimestampType,
+// IntegerType, StringType) should have StringType as the wider common 
type.
+val (stringTypes, nonStringTypes) = types.partition { t =>
+  t == StringType || t == ArrayType(StringType)
--- End diff --

we need something like
```
def hasStringType(dt: DataType): Boolean = dt match {
  case StringType => true
  case ArrayType(et, _) => hasStringType(et)
  case _ => false // Add StructType if we support string promotion for 
struct fields in the future.
}
```
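
A self-contained sketch of the recursive check suggested above (the `DataType` stand-ins here are hypothetical simplifications; Spark's real hierarchy has many more cases):

```scala
// Minimal stand-ins for Spark's DataType hierarchy, for illustration only.
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType
case class ArrayType(elementType: DataType) extends DataType

// Recurse into array element types so nested arrays of strings are detected.
def hasStringType(dt: DataType): Boolean = dt match {
  case StringType => true
  case ArrayType(et) => hasStringType(et)
  case _ => false // Add StructType if string promotion for struct fields is supported later.
}
```

With this, `hasStringType(ArrayType(ArrayType(StringType)))` is true while `hasStringType(ArrayType(IntegerType))` is false; a flat `t == StringType || t == ArrayType(StringType)` check would miss the nested-array case.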


---




[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21091
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2413/
Test PASSed.


---




[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21091
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21091
  
**[Test build #89481 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89481/testReport)**
 for PR 21091 at commit 
[`fd099bf`](https://github.com/apache/spark/commit/fd099bf1e1f254c5fe8616a45b2076c41043a474).


---




[GitHub] spark pull request #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread jinxing64
GitHub user jinxing64 opened a pull request:

https://github.com/apache/spark/pull/21091

[SPARK-22676][FOLLOW-UP] fix code style for test.

## What changes were proposed in this pull request?

This PR addresses comments in https://github.com/apache/spark/pull/19868;
it fixes the code style for `org.apache.spark.sql.hive.QueryPartitionSuite` by using
`withTempView`, `withTempDir`, `withTable`, etc.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jinxing64/spark SPARK-22676-FOLLOW-UP

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21091


commit fd099bf1e1f254c5fe8616a45b2076c41043a474
Author: jinxing 
Date:   2018-04-18T03:35:10Z

[SPARK-22676][FOLLOW-UP] fix code style for test.




---




[GitHub] spark issue #21091: [SPARK-22676][FOLLOW-UP] fix code style for test.

2018-04-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/21091
  
Jenkins, test this please.


---




[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17086#discussion_r182300738
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
 ---
@@ -75,11 +80,16 @@ class MulticlassClassificationEvaluator @Since("1.5.0") 
(@Since("1.5.0") overrid
 SchemaUtils.checkColumnType(schema, $(predictionCol), DoubleType)
 SchemaUtils.checkNumericType(schema, $(labelCol))
 
-val predictionAndLabels =
-  dataset.select(col($(predictionCol)), 
col($(labelCol)).cast(DoubleType)).rdd.map {
-case Row(prediction: Double, label: Double) => (prediction, label)
+val predictionAndLabelsWithWeights =
+  dataset.select(col($(predictionCol)), 
col($(labelCol)).cast(DoubleType),
+if (!isDefined(weightCol) || $(weightCol).isEmpty) lit(1.0) else 
col($(weightCol)))
+.rdd.map {
+case Row(prediction: Double, label: Double, weight: Double) => 
(prediction, label, weight)
   }
-val metrics = new MulticlassMetrics(predictionAndLabels)
+dataset.select(col($(predictionCol)), 
col($(labelCol)).cast(DoubleType)).rdd.map {
+  case Row(prediction: Double, label: Double) => (prediction, label)
+}.values.countByValue()
--- End diff --

good catch -- hmm that shouldn't be there, not sure why I added it, removed


---




[GitHub] spark pull request #17086: [SPARK-18693][ML][MLLIB] ML Evaluators should use...

2018-04-17 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request:

https://github.com/apache/spark/pull/17086#discussion_r182300185
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/MulticlassClassificationEvaluator.scala
 ---
@@ -67,6 +68,10 @@ class MulticlassClassificationEvaluator @Since("1.5.0") 
(@Since("1.5.0") overrid
   @Since("1.5.0")
   def setLabelCol(value: String): this.type = set(labelCol, value)
 
+  /** @group setParam */
+  @Since("2.2.0")
--- End diff --

done


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182299915
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1266,6 +1266,9 @@ class DAGScheduler(
 }
 if (failedEpoch.contains(execId) && smt.epoch <= 
failedEpoch(execId)) {
   logInfo(s"Ignoring possibly bogus $smt completion from 
executor $execId")
+} else if (failedStages.contains(shuffleStage)) {
--- End diff --

Why do we only have a problem with shuffle map tasks, not result tasks?


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182299803
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,50 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 }
   }
 
+  /**
+   * This tests the case where the original task succeeds after its speculative
+   * task got FetchFailed earlier.
+   */
+  test("[SPARK-23811] FetchFailed comes before Success of same task will 
cause child stage" +
+" never succeed") {
+// Create 3 RDDs with shuffle dependencies on each other: rddA <--- 
rddB <--- rddC
+val rddA = new MyRDD(sc, 2, Nil)
+val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
+val shuffleIdA = shuffleDepA.shuffleId
+
+val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = 
mapOutputTracker)
+val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))
+
+val rddC = new MyRDD(sc, 2, List(shuffleDepB), tracker = 
mapOutputTracker)
+
+submit(rddC, Array(0, 1))
+
+// Complete both tasks in rddA.
+assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
+complete(taskSets(0), Seq(
+  (Success, makeMapStatus("hostA", 2)),
+  (Success, makeMapStatus("hostB", 2))))
+
+// The first task succeeds.
+runEvent(makeCompletionEvent(
+  taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))
+
+// The second task's speculative attempt fails first, but the task itself is still running.
+// This may be caused by ExecutorLost.
+runEvent(makeCompletionEvent(
+  taskSets(1).tasks(1),
--- End diff --

Sorry I'm not very familiar with this test suite, how can you tell it's a 
speculative task?


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182299503
  
--- Diff: 
core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -2399,6 +2399,50 @@ class DAGSchedulerSuite extends SparkFunSuite with 
LocalSparkContext with TimeLi
 }
   }
 
+  /**
+   * This tests the case where the original task succeeds after its speculative
+   * task got FetchFailed earlier.
+   */
+  test("[SPARK-23811] FetchFailed comes before Success of same task will 
cause child stage" +
+" never succeed") {
--- End diff --

nit: the test name should describe the expected behavior not the wrong one.
`SPARK-23811: stage failed by FetchFailed should ignore following successful tasks`


---




[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21083
  
> That said, the InnerLike joins should already be covered by 1 and might 
not be worth being considered again in this optimization rule.

Previously the `InferFiltersFromConstraints` adds the additional filters to 
join condition, so inner joins are covered by 1. Here I changed this rule to 
directly add additional filters to join children, so inner joins also need to 
be considered here.


---




[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21083
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21083
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2412/
Test PASSed.


---




[GitHub] spark issue #21083: [SPARK-21479][SPARK-23564][SQL] infer additional filters...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21083
  
**[Test build #89480 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89480/testReport)**
 for PR 21083 at commit 
[`787cddf`](https://github.com/apache/spark/commit/787cddffeba0f21cd40312bcbf84d1bb75126044).


---




[GitHub] spark pull request #21083: [SPARK-21479][SPARK-23564][SQL] infer additional ...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21083#discussion_r182297617
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -661,21 +661,51 @@ object InferFiltersFromConstraints extends 
Rule[LogicalPlan] with PredicateHelpe
   }
 
 case join @ Join(left, right, joinType, conditionOpt) =>
-  // Only consider constraints that can be pushed down completely to 
either the left or the
-  // right child
-  val constraints = join.allConstraints.filter { c =>
-c.references.subsetOf(left.outputSet) || 
c.references.subsetOf(right.outputSet)
-  }
-  // Remove those constraints that are already enforced by either the 
left or the right child
-  val additionalConstraints = constraints -- (left.constraints ++ 
right.constraints)
-  val newConditionOpt = conditionOpt match {
-case Some(condition) =>
-  val newFilters = additionalConstraints -- 
splitConjunctivePredicates(condition)
-  if (newFilters.nonEmpty) Option(And(newFilters.reduce(And), 
condition)) else None
-case None =>
-  additionalConstraints.reduceOption(And)
+  joinType match {
+// For inner join, we can infer additional filters for both sides. 
LeftSemi is kind of an
+// inner join, it just drops the right side in the final output.
+case _: InnerLike | LeftSemi =>
+  val allConstraints = getAllConstraints(left, right, conditionOpt)
+  val newLeft = inferNewFilter(left, allConstraints)
+  val newRight = inferNewFilter(right, allConstraints)
+  join.copy(left = newLeft, right = newRight)
+
+// For right outer join, we can only infer additional filters for 
left side.
+case RightOuter =>
+  val allConstraints = getAllConstraints(left, right, conditionOpt)
+  val newLeft = inferNewFilter(left, allConstraints)
+  join.copy(left = newLeft)
+
+// For left join, we can only infer additional filters for right 
side.
+case LeftOuter | LeftAnti =>
+  val allConstraints = getAllConstraints(left, right, conditionOpt)
+  val newRight = inferNewFilter(right, allConstraints)
+  join.copy(right = newRight)
+
+case _ => join
   }
-  if (newConditionOpt.isDefined) Join(left, right, joinType, 
newConditionOpt) else join
+  }
+
+  private def getAllConstraints(
+  left: LogicalPlan,
+  right: LogicalPlan,
+  conditionOpt: Option[Expression]): Set[Expression] = {
+val baseConstraints = left.constraints.union(right.constraints)
+  
.union(conditionOpt.map(splitConjunctivePredicates).getOrElse(Nil).toSet)
+
baseConstraints.union(ConstraintsUtils.inferAdditionalConstraints(baseConstraints))
+  }
+
+  private def inferNewFilter(plan: LogicalPlan, constraints: 
Set[Expression]): LogicalPlan = {
--- End diff --

copying the plan is very cheap.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
@Ngone51 Thanks for your review.
> Does stage 2 correspond to the never-success stage in the PR description?

Stage 3 is the never-success stage; stage 2 is its parent stage.

> So, why does stage 2 retry 4 times when there are no more missing tasks?

Stage 2's 4 retries are triggered by Stage 3's fetch-failed events. Actually, in 
this scenario, stage 3 will always fail with a fetch failure.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2411/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2410/
Test FAILed.


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182296297
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -750,6 +752,10 @@ private[spark] class TaskSetManager(
   if (tasksSuccessful == numTasks) {
 isZombie = true
   }
+} else if (fetchFailedTaskIndexSet.contains(index)) {
+  logInfo("Ignoring task-finished event for " + info.id + " in stage " 
+ taskSet.id +
+" because task " + index + " has already failed by FetchFailed")
+  return
--- End diff --

Yep, per @cloud-fan's suggestion, handling this in `DAGScheduler` is a better 
choice.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] FetchFailed comes before Success of ...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #89479 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89479/testReport)**
 for PR 20930 at commit 
[`ba6f71a`](https://github.com/apache/spark/commit/ba6f71a0fc49ce2a07addec3496177c4b2b43fef).


---




[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21074
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21074
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2409/
Test PASSed.


---




[GitHub] spark issue #21074: [SPARK-21811][SQL] Fix the inconsistency behavior when f...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21074
  
**[Test build #89478 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89478/testReport)**
 for PR 21074 at commit 
[`4ce5081`](https://github.com/apache/spark/commit/4ce5081fb2da8dabb216413fdda4da0f0b061f71).


---




[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

2018-04-17 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21074#discussion_r182295312
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -176,10 +176,16 @@ object TypeCoercion {
   }
 
   private def findWiderCommonType(types: Seq[DataType]): Option[DataType] 
= {
-types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
-  case Some(d) => findWiderTypeForTwo(d, c)
-  case None => None
-})
+// findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op 
b) op c may not equal
+// to a op (b op c). This is only a problem for StringType. Excluding 
StringType,
+// findWiderTypeForTwo satisfies the associative law. For instance, 
(TimestampType,
+// IntegerType, StringType) should have StringType as the wider common 
type.
+val (stringTypes, nonStringTypes) = types.partition(_ == StringType)
--- End diff --

It's expected to; let me also fix it for array types. Thanks!
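
The non-associativity problem described in the diff can be sketched with a toy widening function (hypothetical types and rules for illustration, not Spark's actual coercion table):

```scala
sealed trait DT
case object IntT extends DT
case object TimestampT extends DT
case object StringT extends DT

// Toy widening rule: equal types widen to themselves, anything widens to
// StringT via string promotion, and there is no other common type.
def widerForTwo(a: DT, b: DT): Option[DT] = (a, b) match {
  case (x, y) if x == y => Some(x)
  case (StringT, _) | (_, StringT) => Some(StringT)
  case _ => None // e.g. IntT vs TimestampT: no wider type
}

// A naive left fold is order-dependent: (TimestampT op IntT) fails before
// StringT is ever considered. Partitioning the string types to the front
// makes the result independent of the input order.
def widerCommonType(types: Seq[DT]): Option[DT] = {
  val (strs, rest) = types.partition(_ == StringT)
  (strs.distinct ++ rest) match {
    case Seq() => None
    case head +: tail => tail.foldLeft(Option(head))((r, c) => r.flatMap(widerForTwo(_, c)))
  }
}
```

Here `widerCommonType(Seq(TimestampT, IntT, StringT))` yields `Some(StringT)`, while a naive fold over the same ordering would hit `widerForTwo(TimestampT, IntT)` first and return `None`.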


---




[GitHub] spark pull request #20816: [SPARK-21479][SQL] Outer join filter pushdown in ...

2018-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20816


---




[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20816
  
thanks, merging to master!


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] FetchFailed comes before Succ...

2018-04-17 Thread xuanyuanking
Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/20930#discussion_r182294068
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -750,6 +752,10 @@ private[spark] class TaskSetManager(
   if (tasksSuccessful == numTasks) {
 isZombie = true
   }
+} else if (fetchFailedTaskIndexSet.contains(index)) {
--- End diff --

Many thanks to you both for the guidance; that's clearer, and the UT added for 
reproducing this problem can also be used for checking it!


---




[GitHub] spark issue #20816: [SPARK-21479][SQL] Outer join filter pushdown in null su...

2018-04-17 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20816
  
since this does fix the problem in a reasonable way, I'm merging it and 
will clean it up in https://github.com/apache/spark/pull/21083


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2408/
Test PASSed.


---




[GitHub] spark issue #21038: [SPARK-22968][DStream] Throw an exception on partition r...

2018-04-17 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21038
  
Thanks @koeninger for the review.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21053
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2407/
Test PASSed.


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20636
  
**[Test build #89477 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89477/testReport)**
 for PR 20636 at commit 
[`21b3708`](https://github.com/apache/spark/commit/21b3708f3acc4a170469e854d1be45d597a4a7d1).


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21053
  
**[Test build #89476 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89476/testReport)**
 for PR 21053 at commit 
[`98465b1`](https://github.com/apache/spark/commit/98465b1e5584acfd15b97b2fa239481b238a9237).


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
retest this please


---




[GitHub] spark issue #21053: [SPARK-23924][SQL] Add element_at function

2018-04-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21053
  
@ueshin would it be possible to review this again?


---




[GitHub] spark issue #19868: [SPARK-22676] Avoid iterating all partition paths when s...

2018-04-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/19868
  
@cloud-fan
Thanks a lot for merging. 
I will address the remaining comments today.


---




[GitHub] spark issue #21019: [SPARK-23948] Trigger mapstage's job listener in submitM...

2018-04-17 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/21019
  
@squito @jiangxb1987
Thanks for merging.


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21037: [SPARK-23919][SQL] Add array_position function

2018-04-17 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21037
  
@ueshin could you please review again?


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20636
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2406/
Test FAILed.


---




[GitHub] spark pull request #21038: [SPARK-22968][DStream] Throw an exception on part...

2018-04-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21038


---




[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20636
  
**[Test build #89475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89475/testReport)** for PR 20636 at commit [`21b3708`](https://github.com/apache/spark/commit/21b3708f3acc4a170469e854d1be45d597a4a7d1).


---




[GitHub] spark issue #21038: [SPARK-22968][DStream] Throw an exception on partition r...

2018-04-17 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/21038
  
Seems like that should help address the confusion.  Merging to master.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21073
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89473/
Test PASSed.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21073
  
Build finished. Test PASSed.


---




[GitHub] spark issue #21073: [SPARK-23936][SQL] Implement map_concat

2018-04-17 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21073
  
**[Test build #89473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89473/testReport)** for PR 21073 at commit [`44137cc`](https://github.com/apache/spark/commit/44137cc9a9949b4218d973dc46d905d3ce301bcd).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---




[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21057
  
(I regret I happened to come over from Korea to Singapore too soon, before your flight :-))


---




[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21057
  
It's not urgent at all. I would appreciate it if you could take it.


---




[GitHub] spark pull request #21074: [SPARK-21811][SQL] Fix the inconsistency behavior...

2018-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21074#discussion_r182284460
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -176,10 +176,16 @@ object TypeCoercion {
   }

   private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = {
-    types.foldLeft[Option[DataType]](Some(NullType))((r, c) => r match {
-      case Some(d) => findWiderTypeForTwo(d, c)
-      case None => None
-    })
+    // findWiderTypeForTwo doesn't satisfy the associative law, i.e. (a op b) op c may not
+    // equal a op (b op c). This is only a problem for StringType. Excluding StringType,
+    // findWiderTypeForTwo satisfies the associative law. For instance, (TimestampType,
+    // IntegerType, StringType) should have StringType as the wider common type.
+    val (stringTypes, nonStringTypes) = types.partition(_ == StringType)
--- End diff --

Out of curiosity, does this work with array types too (array of string vs array of non-string types)?
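The non-associativity can be sketched with a toy model. This is a minimal, self-contained sketch (runnable as a Scala 3 / scala-cli script), not Spark's actual implementation: the case objects and `widerForTwo` below are simplified stand-ins for Spark's `DataType` hierarchy and `findWiderTypeForTwo`, kept only to show why folding the string types first makes the result order-insensitive.

```scala
sealed trait DataType
case object NullType extends DataType
case object IntegerType extends DataType
case object TimestampType extends DataType
case object StringType extends DataType

// Stand-in for findWiderTypeForTwo: any type promotes to String, but
// Timestamp and Integer have no common wider type between themselves.
def widerForTwo(a: DataType, b: DataType): Option[DataType] = (a, b) match {
  case (NullType, t)                     => Some(t)
  case (t, NullType)                     => Some(t)
  case (x, y) if x == y                  => Some(x)
  case (StringType, _) | (_, StringType) => Some(StringType)
  case _                                 => None // e.g. Timestamp vs Integer
}

def widerCommonType(types: Seq[DataType]): Option[DataType] = {
  // Fold the string types first, so the order-sensitive String promotion is
  // applied before any pair of non-string types can fail to reconcile.
  val (stringTypes, nonStringTypes) = types.partition(_ == StringType)
  (stringTypes.distinct ++ nonStringTypes)
    .foldLeft[Option[DataType]](Some(NullType)) {
      case (Some(d), c) => widerForTwo(d, c)
      case (None, _)    => None
    }
}

// A plain left fold over (Timestamp, Integer, String) would hit
// Timestamp op Integer = None before String is ever seen; with the
// partition, String is folded first and everything promotes to it.
println(widerCommonType(Seq(TimestampType, IntegerType, StringType))) // Some(StringType)
```

The reviewer's array question is the interesting edge: a type like `ArrayType(StringType)` would not match `_ == StringType` here, so a sketch this naive would not cover nested string types.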


---




[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21057
  
I will fly to Korea for a company workshop today. I can maybe do this
only tonight. If this isn't urgent, then it is okay.

On Wed, Apr 18, 2018, 10:03 AM Hyukjin Kwon wrote:

> Actually @viirya , would you be interested in
> this if you are available? I will do this by myself but I am currently not
> quite available. If you are busy too, let me try it anyway.



---




[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21057
  
Actually @viirya, would you be interested in this if you are available? I 
will do this by myself but I am currently not quite available. If you are busy 
too, let me try it anyway.


---




[GitHub] spark issue #21038: [SPARK-22968][DStream] Throw an exception on partition r...

2018-04-17 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/21038
  
Ping @koeninger, would you please help to review again? Thanks!


---



