[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21941
  
retest this please


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21941
  
No idea, but `HiveClientSuites` seems flaky: https://issues.apache.org/jira/browse/SPARK-23622 (the error message is different, though...)


---




[GitHub] spark pull request #21563: [SPARK-24557][ML] ClusteringEvaluator support arr...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21563


---




[GitHub] spark pull request #21754: [SPARK-24705][SQL] ExchangeCoordinator broken whe...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207116871
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala
 ---
@@ -278,6 +278,25 @@ class ExchangeCoordinatorSuite extends SparkFunSuite 
with BeforeAndAfterAll {
 try f(spark) finally spark.stop()
   }
 
+  def withSparkSession(pairs: (String, String)*)(f: SparkSession => Unit): 
Unit = {
--- End diff --

ok


---




[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21941
  
@maropu Is this a transient failure? It does not seem related to my change.


---




[GitHub] spark issue #21563: [SPARK-24557][ML] ClusteringEvaluator support array inpu...

2018-08-01 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/21563
  
LGTM. Merged into master. Thanks!


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21754
  
Oh, my bad. I just wanted to say: `EnsureRequirements` sets `2` in ExchangeCoordinator, then the number changes from `2` to `1`.


---




[GitHub] spark issue #21963: [SPARK-21274][FOLLOW-UP][SQL] Enable support of MINUS AL...

2018-08-01 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21963
  
@maropu OK, I will create a new JIRA. Actually, I missed it in the original PR. @gatorsmile had left a code review comment asking to add "MINUS" to the documentation, which got me to test this scenario :-)


---




[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21892
  
Also, can you update the description?


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21919
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93960/
Test FAILed.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21919
  
**[Test build #93960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93960/testReport)**
 for PR 21919 at commit 
[`399562e`](https://github.com/apache/spark/commit/399562ec54deec657f24c4a2a95a2d3c6698a35f).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21919
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21963: [SPARK-21274][FOLLOW-UP][SQL] Enable support of MINUS AL...

2018-08-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21963
  
IMO we probably need a new JIRA for this.


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21953
  
**[Test build #93964 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93964/testReport)**
 for PR 21953 at commit 
[`3986e75`](https://github.com/apache/spark/commit/3986e75c3c000e7a7e7674be6837d663499f35f1).


---




[GitHub] spark issue #21964: [SPARK-24788][SQL] RelationalGroupedDataset.toString wit...

2018-08-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21964
  
@gatorsmile Does `KeyValueGroupedDataset` have the same issue? It seems there is no chance for `KeyValueGroupedDataset` to hold unresolved expressions.
https://github.com/apache/spark/pull/21752#discussion_r204883667


---




[GitHub] spark issue #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when submitte...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21927
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1625/
Test PASSed.


---




[GitHub] spark issue #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when submitte...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21927
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93934/
Test FAILed.


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21954
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21953: [SPARK-24992][Core] spark should randomize yarn local di...

2018-08-01 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21953
  
Jenkins, test this please


---




[GitHub] spark issue #21954: [SPARK-23908][SQL] Add transform function.

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21954
  
**[Test build #93934 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93934/testReport)**
 for PR 21954 at commit 
[`ee450c5`](https://github.com/apache/spark/commit/ee450c5ef3f99d3bbf8dbbd05273bc63005bbccb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when submitte...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21927
  
**[Test build #93963 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93963/testReport)**
 for PR 21927 at commit 
[`bb819f3`](https://github.com/apache/spark/commit/bb819f383b4024945d84622c90605872052213be).


---




[GitHub] spark issue #21945: [SPARK-24989][Core] Add retrying support for OutOfDirect...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21945
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21945: [SPARK-24989][Core] Add retrying support for OutOfDirect...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21945
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93924/
Test PASSed.


---




[GitHub] spark issue #21964: [SPARK-24788][SQL] RelationalGroupedDataset.toString wit...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21964
  
**[Test build #93962 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93962/testReport)**
 for PR 21964 at commit 
[`c4e7490`](https://github.com/apache/spark/commit/c4e7490f1762aff5ae5b7126adb9ddd8d987a77d).


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21962
  
LGTM


---




[GitHub] spark issue #21964: [SPARK-24788][SQL] RelationalGroupedDataset.toString wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21964
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21964: [SPARK-24788][SQL] RelationalGroupedDataset.toString wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21964
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1624/
Test PASSed.


---




[GitHub] spark issue #21945: [SPARK-24989][Core] Add retrying support for OutOfDirect...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21945
  
**[Test build #93924 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93924/testReport)**
 for PR 21945 at commit 
[`bb6841b`](https://github.com/apache/spark/commit/bb6841b3a7a160e252fe35dab82f4ddeb0032591).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21962
  
cc @rxin @cloud-fan 


---




[GitHub] spark issue #21892: [SPARK-24945][SQL] Switching to uniVocity 2.7.2

2018-08-01 Thread jbax
Github user jbax commented on the issue:

https://github.com/apache/spark/pull/21892
  
univocity-parsers-2.7.3 released. Thanks!


---




[GitHub] spark pull request #21964: [SPARK-24788][SQL] RelationalGroupedDataset.toStr...

2018-08-01 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/spark/pull/21964

[SPARK-24788][SQL] RelationalGroupedDataset.toString with unresolved exprs 
should not fail

## What changes were proposed in this pull request?
In the current master, `toString` throws an exception when `RelationalGroupedDataset` has unresolved expressions:
```
scala> spark.range(0, 10).groupBy("id")
res4: org.apache.spark.sql.RelationalGroupedDataset = 
RelationalGroupedDataset: [grouping expressions: [id: bigint], value: [id: 
bigint], type: GroupBy]

scala> spark.range(0, 10).groupBy('id)
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
dataType on unresolved object, tree: 'id
  at 
org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
  at 
org.apache.spark.sql.RelationalGroupedDataset$$anonfun$12.apply(RelationalGroupedDataset.scala:474)
  at 
org.apache.spark.sql.RelationalGroupedDataset$$anonfun$12.apply(RelationalGroupedDataset.scala:473)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
  at 
org.apache.spark.sql.RelationalGroupedDataset.toString(RelationalGroupedDataset.scala:473)
  at 
scala.runtime.ScalaRunTime$.scala$runtime$ScalaRunTime$$inner$1(ScalaRunTime.scala:332)
  at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:337)
  at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:345)
```
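
One way to avoid the exception (a minimal sketch of the guard idea, assuming a Spark build on the classpath; not necessarily the exact change made in this PR) is to call `dataType` only on expressions that are already resolved:

```scala
import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}

// Hypothetical helper for toString: describe a grouping expression without triggering
// UnresolvedException on unresolved attributes such as 'id.
def describe(expr: Expression): String = expr match {
  case ne: NamedExpression if ne.resolved => s"${ne.name}: ${ne.dataType.simpleString}"
  case other => other.toString  // unresolved exprs fall back to their raw representation
}
```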

Closes #21752

## How was this patch tested?
Added tests in `DataFrameAggregateSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/spark SPARK-24788

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21964


commit 465e7624073016d01ae6d3c5df501bf9b2c6410b
Author: Chris Horn 
Date:   2018-07-11T21:25:26Z

SPARK-24788 failing test case

commit e995b0bf2824593532056ff0048e65e8a33e5aad
Author: Chris Horn 
Date:   2018-07-11T21:25:54Z

SPARK-24788 fixed UnresolvedException when toString an unresolved grouping 
expression

commit 5213635d595f76261a8387e5a5135ebd9bcfa8d9
Author: Chris Horn 
Date:   2018-07-13T19:09:35Z

simplify test description; remove whitespace

commit 2e48604ff9aadebc4f7f3f8edeee252722967da9
Author: Chris Horn 
Date:   2018-07-13T19:22:07Z

do not use Matchers

commit c4e7490f1762aff5ae5b7126adb9ddd8d987a77d
Author: Takeshi Yamamuro 
Date:   2018-08-02T06:20:34Z

Fix




---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21962
  
**[Test build #93961 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93961/testReport)**
 for PR 21962 at commit 
[`0135ba4`](https://github.com/apache/spark/commit/0135ba4987238a29da1693a101c286cc0bace9b4).


---




[GitHub] spark issue #21963: [SPARK-21274][FOLLOW-UP][SQL] Enable support of MINUS AL...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21963
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21962
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1623/
Test PASSed.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21962
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21963: [SPARK-21274][FOLLOW-UP][SQL] Enable support of MINUS AL...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21963
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1622/
Test PASSed.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21919
  
**[Test build #93960 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93960/testReport)**
 for PR 21919 at commit 
[`399562e`](https://github.com/apache/spark/commit/399562ec54deec657f24c4a2a95a2d3c6698a35f).


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21935
  
**[Test build #93959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93959/testReport)**
 for PR 21935 at commit 
[`921e6cb`](https://github.com/apache/spark/commit/921e6cb371d2c3499008bf46c43c256f702ddcde).


---




[GitHub] spark issue #21963: [SPARK-21274][FOLLOWUP][SQL] Enable support of MINUS ALL

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21963
  
**[Test build #93958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93958/testReport)**
 for PR 21963 at commit 
[`1dd61a2`](https://github.com/apache/spark/commit/1dd61a2da5cca08b584026a8a363bbcee39f164f).


---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21963: [SPARK-21274][FOLLOWUP] Enable support of MINUS A...

2018-08-01 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/21963

[SPARK-21274][FOLLOWUP] Enable support of MINUS ALL

## What changes were proposed in this pull request?
Enable support for MINUS ALL, which was previously gated in AstBuilder.
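
For illustration only (not taken from this PR's tests, and assuming a Spark build that includes this change), MINUS ALL keeps duplicates the same way EXCEPT ALL does:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: MINUS ALL removes one left-side row per matching right-side row
// instead of de-duplicating the way plain MINUS/EXCEPT does.
val spark = SparkSession.builder().master("local[2]").appName("minus-all-demo").getOrCreate()
spark.sql(
  """SELECT c FROM VALUES (1), (1), (2), (2), (2) AS t1(c)
    |MINUS ALL
    |SELECT c FROM VALUES (1), (2) AS t2(c)""".stripMargin).show()
// Expected rows: 1, 2, 2
```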

## How was this patch tested?
Added tests in SQLQueryTestSuite and modified PlanParserSuite.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark minus-all

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21963.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21963


commit 1dd61a2da5cca08b584026a8a363bbcee39f164f
Author: Dilip Biswal 
Date:   2018-08-01T20:59:51Z

[SPARK-21274][FOLLOWUP] Enable support of MINUS ALL




---




[GitHub] spark issue #21935: [SPARK-24773] Avro: support logical timestamp type with ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21935
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1621/
Test PASSed.


---




[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21919
  
ok to test


---




[GitHub] spark issue #21915: [SPARK-24954][Core] Fail fast on job submit if run a bar...

2018-08-01 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/21915
  
LGTM pending Jenkins. Was a JIRA filed for the KafkaContinuousSourceSuite failure? cc @tedyu


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93889/testReport/org.apache.spark.sql.kafka010/KafkaMicroBatchV1SourceSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21962
  
**[Test build #93957 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93957/testReport)**
 for PR 21962 at commit 
[`7f70aaa`](https://github.com/apache/spark/commit/7f70aaadbad680f41b4b3f42798c2f64e94e1e6a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21962
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93957/
Test FAILed.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21962
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21006: [SPARK-22256][MESOS] - Introduce spark.mesos.driver.memo...

2018-08-01 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21006
  
@pmackles ?


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21608
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93944/
Test FAILed.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21962
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21608
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21962
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1620/
Test PASSed.


---




[GitHub] spark pull request #21935: [SPARK-24773] Avro: support logical timestamp typ...

2018-08-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21935#discussion_r207110511
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala 
---
@@ -103,31 +108,49 @@ object SchemaConverters {
   catalystType: DataType,
   nullable: Boolean = false,
   recordName: String = "topLevelRecord",
-  prevNameSpace: String = ""): Schema = {
+  prevNameSpace: String = "",
+  outputTimestampType: AvroOutputTimestampType.Value = 
AvroOutputTimestampType.TIMESTAMP_MICROS
--- End diff --

It is also used in `CatalystDataToAvro`


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21608
  
**[Test build #93944 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93944/testReport)**
 for PR 21608 at commit 
[`2e582f6`](https://github.com/apache/spark/commit/2e582f6a3d44887ca5b8ab57d2ddcb73609d1608).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21951
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93956/
Test FAILed.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21951
  
**[Test build #93956 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93956/testReport)**
 for PR 21951 at commit 
[`a5762d7`](https://github.com/apache/spark/commit/a5762d76cbb12d3da0fd4721cea90456bea2a3ef).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21951
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21951
  
https://issues.apache.org/jira/browse/SPARK-24996 has been created.


---




[GitHub] spark issue #21962: [SPARK-24865][FOLLOW-UP] Remove AnalysisBarrier LogicalP...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21962
  
**[Test build #93957 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93957/testReport)**
 for PR 21962 at commit 
[`7f70aaa`](https://github.com/apache/spark/commit/7f70aaadbad680f41b4b3f42798c2f64e94e1e6a).


---




[GitHub] spark pull request #21962: [SPARK-24865] Remove AnalysisBarrier LogicalPlan ...

2018-08-01 Thread gatorsmile
GitHub user gatorsmile opened a pull request:

https://github.com/apache/spark/pull/21962

[SPARK-24865] Remove AnalysisBarrier LogicalPlan Node

## What changes were proposed in this pull request?
Remove the AnalysisBarrier LogicalPlan node, which is useless now.

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gatorsmile/spark refactor2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21962.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21962


commit 7f70aaadbad680f41b4b3f42798c2f64e94e1e6a
Author: Xiao Li 
Date:   2018-08-02T06:03:41Z

clean




---




[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21927#discussion_r207108174
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -340,6 +340,22 @@ class DAGScheduler(
 }
   }
 
+  /**
+   * Check to make sure we don't launch a barrier stage with unsupported 
RDD chain pattern. The
+   * following patterns are not supported:
+   * 1. Ancestor RDDs that have different number of partitions from the 
resulting RDD (eg.
+   * union()/coalesce()/first()/PartitionPruningRDD);
+   * 2. An RDD that depends on multiple barrier RDDs (eg. 
barrierRdd1.zip(barrierRdd2)).
+   */
+  private def checkBarrierStageWithRDDChainPattern(rdd: RDD[_], 
numPartitions: Int): Unit = {
--- End diff --

It would be nice to rename `numPartitions` to `numTasksInStage` (or a 
better name).


---




[GitHub] spark pull request #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for ...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21951


---




[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21927#discussion_r207108898
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -1946,4 +1990,11 @@ private[spark] object DAGScheduler {
 
   // Number of consecutive stage attempts allowed before a stage is aborted
   val DEFAULT_MAX_CONSECUTIVE_STAGE_ATTEMPTS = 4
+
+  // Error message when running a barrier stage that have unsupported RDD 
chain pattern.
+  val ERROR_MESSAGE_RUN_BARRIER_WITH_UNSUPPORTED_RDD_CHAIN_PATTERN =
+"[SPARK-24820][SPARK-24821]: Barrier execution mode does not allow the 
following pattern of " +
+  "RDD chain within a barrier stage:\n1. Ancestor RDDs that have 
different number of " +
+  "partitions from the resulting RDD (eg. 
union()/coalesce()/first()/PartitionPruningRDD);\n" +
--- End diff --

Please also list `take()`. It would be nice to provide a workaround for `first()` and `take()`: `barrierRdd.collect().head` (Scala), `barrierRdd.collect()[0]` (Python).
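
A minimal sketch of that workaround, assuming a local SparkSession (the RDD below is only an example, not code from this PR):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: first()/take() launch a partial stage, which is not allowed for a
// barrier stage, so collect the whole result to the driver and take the first element.
val spark = SparkSession.builder().master("local[4]").appName("barrier-workaround").getOrCreate()
val barrierRdd = spark.sparkContext.parallelize(1 to 8, numSlices = 4)
  .barrier()
  .mapPartitions(iter => iter)
val firstElement = barrierRdd.collect().head  // instead of barrierRdd.first()
```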


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21951
  
**[Test build #93956 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93956/testReport)**
 for PR 21951 at commit 
[`a5762d7`](https://github.com/apache/spark/commit/a5762d76cbb12d3da0fd4721cea90456bea2a3ef).


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21951
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21951
  
Thanks! Merged to master. 

Please ignore the last commit.


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21951
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1619/
Test PASSed.


---




[GitHub] spark issue #21754: [SPARK-24705][SQL] ExchangeCoordinator broken when dupli...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21754
  
> For example, in the test of this pr, it sets 3 in ExchangeCoordinator;

How can this happen? Join has 2 children so `ExchangeCoordinator` can at 
most have 2 exchanges.


---




[GitHub] spark pull request #21942: [SPARK-24283][ML] Make ml.StandardScaler skip con...

2018-08-01 Thread hhbyyh
Github user hhbyyh commented on a diff in the pull request:

https://github.com/apache/spark/pull/21942#discussion_r207103066
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala ---
@@ -160,15 +160,89 @@ class StandardScalerModel private[ml] (
   @Since("2.0.0")
   override def transform(dataset: Dataset[_]): DataFrame = {
 transformSchema(dataset.schema, logging = true)
-val scaler = new feature.StandardScalerModel(std, mean, $(withStd), 
$(withMean))
-
-// TODO: Make the transformer natively in ml framework to avoid extra 
conversion.
-val transformer: Vector => Vector = v => 
scaler.transform(OldVectors.fromML(v)).asML
+val transformer: Vector => Vector = v => transform(v)
 
 val scale = udf(transformer)
 dataset.withColumn($(outputCol), scale(col($(inputCol
   }
 
+  /**
+   * Since `shift` will be only used in `withMean` branch, we have it as
+   * `lazy val` so it will be evaluated in that branch. Note that we don't
+   * want to create this array multiple times in `transform` function.
+   */
+  private lazy val shift: Array[Double] = mean.toArray
+
+   /**
+* Applies standardization transformation on a vector.
+*
+* @param vector Vector to be standardized.
+* @return Standardized vector. If the std of a column is zero, it will 
return default `0.0`
+* for the column with zero std.
+*/
+  @Since("2.3.0")
+  def transform(vector: Vector): Vector = {
--- End diff --

private[spark]?
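
For context, a standalone sketch of the per-element standardization that the new `transform(vector)` in the diff above performs (an illustration of the described semantics, not the PR's code):

```scala
// Illustrative sketch: center by the mean when withMean is set, scale by the std when
// withStd is set, and emit 0.0 for columns whose std is zero, as the diff describes.
def standardize(
    values: Array[Double],
    mean: Array[Double],
    std: Array[Double],
    withMean: Boolean,
    withStd: Boolean): Array[Double] = {
  values.indices.map { i =>
    val centered = if (withMean) values(i) - mean(i) else values(i)
    if (withStd) {
      if (std(i) != 0.0) centered / std(i) else 0.0
    } else {
      centered
    }
  }.toArray
}
```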


---




[GitHub] spark pull request #21927: [SPARK-24820][SPARK-24821][Core] Fail fast when s...

2018-08-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/21927#discussion_r207107889
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -340,6 +340,22 @@ class DAGScheduler(
 }
   }
 
+  /**
+   * Check to make sure we don't launch a barrier stage with unsupported 
RDD chain pattern. The
+   * following patterns are not supported:
+   * 1. Ancestor RDDs that have different number of partitions from the 
resulting RDD (eg.
+   * union()/coalesce()/first()/PartitionPruningRDD);
--- End diff --

`coalesce()` is not safe when shuffle is false, because it may cause the number of tasks to differ from the number of partitions of the RDD that uses barrier mode.
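
An illustration of the hazard (the RDD and partition counts below are assumptions for the example, not from the PR):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative sketch: the barrier RDD below has 4 partitions, but coalesce(2, shuffle = false)
// makes the stage run only 2 tasks, so the number of tasks no longer matches the number of
// partitions of the barrier RDD, which is the pattern this check is meant to reject.
val spark = SparkSession.builder().master("local[4]").appName("coalesce-barrier").getOrCreate()
val barrierRdd = spark.sparkContext.parallelize(1 to 8, numSlices = 4)
  .barrier()
  .mapPartitions(iter => iter)
val coalesced = barrierRdd.coalesce(2, shuffle = false)
```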


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207107551
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from 
BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()`, and identified 
by
+ * stageId + stageAttemptId + barrierEpoch. Reply all the blocking global 
sync requests upon
+ * received all the requests for a group of `barrier()` calls. If the 
coordinator doesn't collect
+ * enough global sync requests within a configured time, fail all the 
requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+timeout: Int,
+listenerBus: LiveListenerBus,
+override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with 
Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch 
increment timer")
+
+  private val listener = new SparkListener {
+override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted): Unit = {
+  val stageInfo = stageCompleted.stageInfo
+  // Remove internal data from a finished stage attempt.
+  cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+  barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, 
stageInfo.attemptNumber))
+}
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new 
ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier 
(stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+super.onStart()
+listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier 
sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+  stageId: Int,
+  stageAttemptId: Int,
+  numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, 
stageAttemptId),
+  new ArrayBuffer[RpcCallContext](numTasks))
+if (requests == null) {
+  syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+} else {
+  requests
+}
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a 
barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit 
= {
+val requests = syncRequestsByStageIdAndAttempt.remove((stageId, 
stageAttemptId))
+if (requests != null) {
+  requests.clear()
+}
+logInfo(s"Removed all the pending barrier sync requests from Stage 
$stageId (Attempt " +
+  s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that correspond to a barrier sync request from 
a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): 
AtomicInteger = {
+val barrierEpoch = 
barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+  new AtomicInteger(0))
+if (barrierEpoch == null) {
+  barrierEpochByStageIdAndAttempt.get((stageId

[GitHub] spark issue #21943: [SPARK-24795][Core][FOLLOWUP] Kill all running tasks whe...

2018-08-01 Thread mengxr
Github user mengxr commented on the issue:

https://github.com/apache/spark/pull/21943
  
LGTM


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93927/
Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #93927 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93927/testReport)**
 for PR 21469 at commit 
[`ed072fc`](https://github.com/apache/spark/commit/ed072fcf057f982275d0daf69787ed812f03e87b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21960
  
Please close this and proceed in #20838. I already approved your PR roughly 
a week ago.


---




[GitHub] spark pull request #21754: [SPARK-24705][SQL] ExchangeCoordinator broken whe...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207106829
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeCoordinatorSuite.scala
 ---
@@ -278,6 +278,25 @@ class ExchangeCoordinatorSuite extends SparkFunSuite 
with BeforeAndAfterAll {
 try f(spark) finally spark.stop()
   }
 
+  def withSparkSession(pairs: (String, String)*)(f: SparkSession => Unit): 
Unit = {
--- End diff --

Why do we need it? We can still call the old `withSparkSession` and set the confs in the body.
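
A hedged sketch of that alternative (a minimal stand-in for the suite's existing helper, not the actual test code):

```scala
import org.apache.spark.sql.SparkSession

// Minimal stand-in for the suite's existing no-arg withSparkSession helper.
def withSparkSession(f: SparkSession => Unit): Unit = {
  val spark = SparkSession.builder().master("local[2]").appName("test").getOrCreate()
  try f(spark) finally spark.stop()
}

// Set any SQL confs inside the body instead of adding a (String, String)* overload.
withSparkSession { spark =>
  spark.conf.set("spark.sql.shuffle.partitions", "5")  // conf value is illustrative
  // assertions that rely on the conf go here
}
```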


---




[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21961
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21754: [SPARK-24705][SQL] ExchangeCoordinator broken whe...

2018-08-01 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21754#discussion_r207106352
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/Exchange.scala 
---
@@ -52,6 +52,14 @@ case class ReusedExchangeExec(override val output: 
Seq[Attribute], child: Exchan
   // Ignore this wrapper for canonicalizing.
   override def doCanonicalize(): SparkPlan = child.canonicalized
 
+  override protected def doPrepare(): Unit = {
+child match {
+  case shuffleExchange @ ShuffleExchangeExec(_, _, Some(coordinator)) 
=>
+coordinator.registerExchange(shuffleExchange)
--- End diff --

updated


---




[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21951
  
LGTM. shall we create a JIRA ticket to apply this to other 
`DeclarativeAggregate`s?


---




[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207106350
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from 
BarrierTaskContext. Each global sync
+ * request is generated by `BarrierTaskContext.barrier()`, and identified 
by
+ * stageId + stageAttemptId + barrierEpoch. Reply all the blocking global 
sync requests upon
+ * received all the requests for a group of `barrier()` calls. If the 
coordinator doesn't collect
+ * enough global sync requests within a configured time, fail all the 
requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+timeout: Int,
+listenerBus: LiveListenerBus,
+override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with 
Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch 
increment timer")
+
+  private val listener = new SparkListener {
+override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted): Unit = {
+  val stageInfo = stageCompleted.stageInfo
+  // Remove internal data from a finished stage attempt.
+  cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+  barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, 
stageInfo.attemptNumber))
+}
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new 
ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier 
(stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+super.onStart()
+listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier 
sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+  stageId: Int,
+  stageAttemptId: Int,
+  numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, 
stageAttemptId),
+  new ArrayBuffer[RpcCallContext](numTasks))
+if (requests == null) {
+  syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+} else {
+  requests
+}
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a 
barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit 
= {
+val requests = syncRequestsByStageIdAndAttempt.remove((stageId, 
stageAttemptId))
+if (requests != null) {
+  requests.clear()
+}
+logInfo(s"Removed all the pending barrier sync requests from Stage 
$stageId (Attempt " +
+  s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that correspond to a barrier sync request from 
a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): 
AtomicInteger = {
+val barrierEpoch = 
barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+  new AtomicInteger(0))
+if (barrierEpoch == null) {
+  barrierEpochByStageIdAndAttempt.get((stageId

[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21961
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21959: [SPARK-23698] Define xrange() for Python 3 in dumpdata_s...

2018-08-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21959
  
I am going to review and merge that one soon. There's no need to open 
multiple PRs.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21961
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21951: [SPARK-24957][SQL][FOLLOW-UP] Clean the code for AVERAGE

2018-08-01 Thread rednaxelafx
Github user rednaxelafx commented on the issue:

https://github.com/apache/spark/pull/21951
  
LGTM as well. Thanks a lot!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21961: Spark 20597

2018-08-01 Thread Satyajitv
GitHub user Satyajitv opened a pull request:

https://github.com/apache/spark/pull/21961

Spark 20597

## What changes were proposed in this pull request?
   
The default topic is resolved as follows: if TOPIC_OPTION_KEY is provided, the default topic
is taken from TOPIC_OPTION_KEY; otherwise it falls back to PATH_OPTION_KEY.
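
For illustration, a minimal sketch of the described fallback (the option keys below are
stand-ins for TOPIC_OPTION_KEY and PATH_OPTION_KEY, not the actual constants):

```scala
// Sketch only, not the final implementation: prefer the explicit "topic" option and
// fall back to the "path" option when no topic is given.
def resolveDefaultTopic(parameters: Map[String, String]): Option[String] =
  parameters.get("topic").orElse(parameters.get("path"))

// resolveDefaultTopic(Map("topic" -> "t", "path" -> "p"))  // Some("t")
// resolveDefaultTopic(Map("path" -> "p"))                  // Some("p")
// resolveDefaultTopic(Map.empty)                           // None
```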
## How was this patch tested?
I have tested the code locally, but will start writing tests once the 
approach is confirmed by @jaceklaskowski, as I am expecting change requests.

Please find more details at https://issues.apache.org/jira/browse/SPARK-20597

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Satyajitv/spark SPARK-20597

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21961.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21961


commit d4e1ed0c25121ad5bf24cfe137e2ee1bff430c94
Author: Satyajit Vegesna 
Date:   2018-08-02T05:12:57Z

SPARK-20597 KafkaSourceProvider falls back on path as synonym for topic

commit 381e66fa0bdd14b5754d8d81710021714e5fc031
Author: Satyajit Vegesna 
Date:   2018-08-02T05:22:07Z

Added parameters that were mistakenly taken out in previous commit

commit 6dc893a681721b51e61a9df099ae8f2c865c38c1
Author: Satyajit Vegesna 
Date:   2018-08-02T05:24:13Z

Added parameters that were mistakenly taken out in previous commit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21961: Spark 20597

2018-08-01 Thread holdensmagicalunicorn
Github user holdensmagicalunicorn commented on the issue:

https://github.com/apache/spark/pull/21961
  
@Satyajitv, thanks! I am a bot who has found some folks who might be able 
to help with the review: @tdas, @zsxwing and @cloud-fan


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207106000
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global
+ * sync request is generated by `BarrierTaskContext.barrier()` and is identified by
+ * stageId + stageAttemptId + barrierEpoch. The coordinator replies to all the blocking global
+ * sync requests once it has received all the requests for a group of `barrier()` calls. If the
+ * coordinator doesn't collect enough global sync requests within the configured time, it fails
+ * all the requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+timeout: Int,
+listenerBus: LiveListenerBus,
+override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with 
Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch 
increment timer")
+
+  private val listener = new SparkListener {
+override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted): Unit = {
+  val stageInfo = stageCompleted.stageInfo
+  // Remove internal data from a finished stage attempt.
+  cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+  barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, 
stageInfo.attemptNumber))
+}
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new 
ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier 
(stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+super.onStart()
+listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier 
sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+  stageId: Int,
+  stageAttemptId: Int,
+  numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, 
stageAttemptId),
+  new ArrayBuffer[RpcCallContext](numTasks))
+if (requests == null) {
+  syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+} else {
+  requests
+}
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a 
barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit 
= {
+val requests = syncRequestsByStageIdAndAttempt.remove((stageId, 
stageAttemptId))
+if (requests != null) {
+  requests.clear()
+}
+logInfo(s"Removed all the pending barrier sync requests from Stage 
$stageId (Attempt " +
+  s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): 
AtomicInteger = {
+val barrierEpoch = 
barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+  new AtomicInteger(0))
+if (barrierEpoch == null) {
+  barrierEpochByStageIdAndAttempt.get((stageId, 

[GitHub] spark issue #17185: [SPARK-19602][SQL] Support column resolution of fully qu...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17185
  
Overall LGTM; my major concern is how to do an O(1) lookup for the 3-part name.
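
One hedged sketch of how such an O(1) lookup could look, using toy types rather than the
catalyst classes, and indexing every suffix of the qualified name up front (purely
illustrative; the PR may resolve this differently):

```scala
// Toy attribute mirroring the Option[Seq[String]] qualifier shape from this PR.
case class Attr(name: String, qualifier: Option[Seq[String]])

// Index every way an attribute can be referenced (col, tbl.col, db.tbl.col) once,
// so resolving a 1-, 2- or 3-part name becomes a single hash lookup.
class AttributeIndex(attributes: Seq[Attr], normalize: String => String = _.toLowerCase) {
  private val index: Map[Seq[String], Seq[Attr]] = {
    val entries = for {
      a      <- attributes
      parts   = a.qualifier.getOrElse(Seq.empty)
      suffix <- parts.tails                // e.g. Seq("db", "tbl"), Seq("tbl"), Seq()
    } yield (suffix :+ a.name).map(normalize) -> a
    entries.groupBy { case (key, _) => key }
           .map { case (key, kvs) => key -> kvs.map(_._2) }
  }

  def resolve(nameParts: Seq[String]): Seq[Attr] =
    index.getOrElse(nameParts.map(normalize), Seq.empty)
}

// resolve(Seq("db", "tbl", "col")), resolve(Seq("tbl", "col")) and resolve(Seq("col"))
// all hit the same attribute in O(1) expected time.
```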


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17185: [SPARK-19602][SQL] Support column resolution of f...

2018-08-01 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17185#discussion_r207105608
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala
 ---
@@ -71,19 +71,27 @@ trait NamedExpression extends Expression {
* multiple qualifiers, it is possible that there are other possible way 
to refer to this
* attribute.
*/
-  def qualifiedName: String = (qualifier.toSeq :+ name).mkString(".")
+  def qualifiedName: String = {
+if (qualifier.isDefined) {
+  (qualifier.get :+ name).mkString(".")
+} else {
+  name
+}
+  }
 
   /**
* Optional qualifier for the expression.
+   * The qualifier can also contain the fully qualified information, e.g. a sequence of strings
+   * containing the database name and the table name.
*
* For now, since we do not allow using original table name to qualify a 
column name once the
* table is aliased, this can only be:
*
* 1. Empty Seq: when an attribute doesn't have a qualifier,
*e.g. top level attributes aliased in the SELECT clause, or column 
from a LocalRelation.
-   * 2. Single element: either the table name or the alias name of the 
table.
+   * 2. Seq with a single element: either the table name or the alias name of the table.
--- End diff --

3. a seq of 2 elements: database name and table name.
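
To make the three cases concrete, a toy mirror of the `qualifiedName` logic quoted above
(illustrative types only, not the catalyst classes):

```scala
// Mirrors the new qualifiedName: join the qualifier parts and the name with dots,
// or return the bare name when there is no qualifier.
def qualifiedName(qualifier: Option[Seq[String]], name: String): String =
  qualifier match {
    case Some(parts) => (parts :+ name).mkString(".")
    case None        => name
  }

qualifiedName(None, "id")                    // "id"        (case 1: no qualifier)
qualifiedName(Some(Seq("t1")), "id")         // "t1.id"     (case 2: table name or alias)
qualifiedName(Some(Seq("db1", "t1")), "id")  // "db1.t1.id" (case 3: database + table)
```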


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21898: [SPARK-24817][Core] Implement BarrierTaskContext....

2018-08-01 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21898#discussion_r207105381
  
--- Diff: core/src/main/scala/org/apache/spark/BarrierCoordinator.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark
+
+import java.util.{Timer, TimerTask}
+import java.util.concurrent.ConcurrentHashMap
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.rpc.{RpcCallContext, RpcEnv, ThreadSafeRpcEndpoint}
+import org.apache.spark.scheduler.{LiveListenerBus, SparkListener, 
SparkListenerStageCompleted}
+
+/**
+ * A coordinator that handles all global sync requests from BarrierTaskContext. Each global
+ * sync request is generated by `BarrierTaskContext.barrier()` and is identified by
+ * stageId + stageAttemptId + barrierEpoch. The coordinator replies to all the blocking global
+ * sync requests once it has received all the requests for a group of `barrier()` calls. If the
+ * coordinator doesn't collect enough global sync requests within the configured time, it fails
+ * all the requests due to timeout.
+ */
+private[spark] class BarrierCoordinator(
+timeout: Int,
+listenerBus: LiveListenerBus,
+override val rpcEnv: RpcEnv) extends ThreadSafeRpcEndpoint with 
Logging {
+
+  private val timer = new Timer("BarrierCoordinator barrier epoch 
increment timer")
+
+  private val listener = new SparkListener {
+override def onStageCompleted(stageCompleted: 
SparkListenerStageCompleted): Unit = {
+  val stageInfo = stageCompleted.stageInfo
+  // Remove internal data from a finished stage attempt.
+  cleanupSyncRequests(stageInfo.stageId, stageInfo.attemptNumber)
+  barrierEpochByStageIdAndAttempt.remove((stageInfo.stageId, 
stageInfo.attemptNumber))
+}
+  }
+
+  // Epoch counter for each barrier (stage, attempt).
+  private val barrierEpochByStageIdAndAttempt = new 
ConcurrentHashMap[(Int, Int), AtomicInteger]
+
+  // Remember all the blocking global sync requests for each barrier 
(stage, attempt).
+  private val syncRequestsByStageIdAndAttempt =
+new ConcurrentHashMap[(Int, Int), ArrayBuffer[RpcCallContext]]
+
+  override def onStart(): Unit = {
+super.onStart()
+listenerBus.addToStatusQueue(listener)
+  }
+
+  /**
+   * Get the array of [[RpcCallContext]]s that correspond to a barrier 
sync request from a stage
+   * attempt.
+   */
+  private def getOrInitSyncRequests(
+  stageId: Int,
+  stageAttemptId: Int,
+  numTasks: Int = 0): ArrayBuffer[RpcCallContext] = {
+val requests = syncRequestsByStageIdAndAttempt.putIfAbsent((stageId, 
stageAttemptId),
+  new ArrayBuffer[RpcCallContext](numTasks))
+if (requests == null) {
+  syncRequestsByStageIdAndAttempt.get((stageId, stageAttemptId))
+} else {
+  requests
+}
+  }
+
+  /**
+   * Clean up the array of [[RpcCallContext]]s that correspond to a 
barrier sync request from a
+   * stage attempt.
+   */
+  private def cleanupSyncRequests(stageId: Int, stageAttemptId: Int): Unit 
= {
+val requests = syncRequestsByStageIdAndAttempt.remove((stageId, 
stageAttemptId))
+if (requests != null) {
+  requests.clear()
+}
+logInfo(s"Removed all the pending barrier sync requests from Stage 
$stageId (Attempt " +
+  s"$stageAttemptId).")
+  }
+
+  /**
+   * Get the barrier epoch that corresponds to a barrier sync request from a stage attempt.
+   */
+  private def getOrInitBarrierEpoch(stageId: Int, stageAttemptId: Int): 
AtomicInteger = {
+val barrierEpoch = 
barrierEpochByStageIdAndAttempt.putIfAbsent((stageId, stageAttemptId),
+  new AtomicInteger(0))
+if (barrierEpoch == null) {
+  barrierEpochByStageIdAndAttempt.get((stageId

[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21960
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21754
  
**[Test build #93955 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93955/testReport)**
 for PR 21754 at commit 
[`8ae2865`](https://github.com/apache/spark/commit/8ae286560b6fe167fe8deb8f4a30c70cacd2c4a6).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21941
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93936/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21754: [SPARK-24705][SQL] Cannot reuse an exchange operator wit...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21754
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1618/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21941: [SPARK-24966][SQL] Implement precedence rules for set op...

2018-08-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21941
  
**[Test build #93936 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93936/testReport)**
 for PR 21941 at commit 
[`e7d69db`](https://github.com/apache/spark/commit/e7d69db7cd0c23d6ee9012b5f48b17e5aeac8d66).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21960: [SPARK-23698] Remove unused definitions of long and unic...

2018-08-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21960
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


