[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12953#issuecomment-217495328
  
**[Test build #58000 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58000/consoleFull)**
 for PR 12953 at commit 
[`dbb6632`](https://github.com/apache/spark/commit/dbb663222ba379fbf0b846e2342173f0f0a0ecef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13370][SQL] Require whitespace between ...

2016-05-06 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/12897#issuecomment-217494270
  
@yhuai I checked the most recent 1.6 branch. They both interpret `1.0L` as 
`1.0 as L`.

## SQL:
```scala
scala> sql("select 1.0L").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias(1.0 AS L#2)]
+- OneRowRelation$
```

## Hive:
```scala
scala> sql("select 1.0L").explain(true)
== Parsed Logical Plan ==
'Project [unresolvedalias(1.0 AS L#0)]
+- OneRowRelation$
```

I am leaning towards not changing the behavior. What is your opinion?
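
For readers following the thread, the parse shown above can be imitated with a toy tokenizer. This is only an illustrative sketch, not Spark's parser: `LiteralAliasSketch` and its regex are hypothetical, and it simply assumes that a decimal literal followed directly by an identifier is read as `literal AS identifier`, which is what the two plans above show.

```scala
object LiteralAliasSketch {
  // Hypothetical pattern: a decimal literal followed directly by an identifier.
  private val LiteralThenIdent = """(\d+\.\d+)([A-Za-z_][A-Za-z0-9_]*)""".r

  /** Returns (literal, alias) when the token splits the way the plans above show. */
  def split(token: String): Option[(String, String)] = token match {
    case LiteralThenIdent(lit, ident) => Some((lit, ident))
    case _                            => None
  }
}
```

Under this assumption, `LiteralAliasSketch.split("1.0L")` yields the literal `1.0` with alias `L`, while a plain `1.0` has no alias part.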





[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12949#issuecomment-217493037
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12949#issuecomment-217493039
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58001/
Test PASSed.





[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12949#issuecomment-217492842
  
**[Test build #58001 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58001/consoleFull)**
 for PR 12949 at commit 
[`167a7d6`](https://github.com/apache/spark/commit/167a7d6d8ad13a9d754e22dbd75cd9a16e9d1a56).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...

2016-05-06 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/12949#discussion_r62354994
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
@@ -239,8 +239,13 @@ case class DataSource(
 }
   }
 
-  /** Create a resolved [[BaseRelation]] that can be used to read data from this [[DataSource]] */
-  def resolveRelation(): BaseRelation = {
+  /**
+   * Create a resolved [[BaseRelation]] that can be used to read data from or write data into this
+   * [[DataSource]]
+   *
+   * @param checkPathExist A flag to indicate whether to check the existence of path or not.
+   */
+  def resolveRelation(checkPathExist: Boolean = true): BaseRelation = {
--- End diff --

I am not sure I understand this change. For a `FileFormat`, when do we not 
need to check if the path exists?
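
To make the mechanics of the flag concrete, here is a minimal sketch of a default-valued `checkPathExist` parameter. It is not the code in this PR: `ResolveSketch`, `Relation`, and `resolve` are hypothetical names, and the idea that a caller writing to a not-yet-created location might pass `false` is an assumption, not something the thread confirms.

```scala
import java.nio.file.{Files, Paths}

object ResolveSketch {
  // Hypothetical stand-in for a resolved relation; not Spark's BaseRelation.
  final case class Relation(path: String)

  /** Resolves a path, optionally verifying first that it exists. */
  def resolve(path: String, checkPathExist: Boolean = true): Either[String, Relation] =
    if (checkPathExist && !Files.exists(Paths.get(path))) Left(s"Path does not exist: $path")
    else Right(Relation(path))
}
```

With the default, a missing path is rejected; with `checkPathExist = false`, the same path resolves without touching the file system.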





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12952#issuecomment-217492073
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12952#issuecomment-217492075
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58002/
Test FAILed.





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12952#issuecomment-217491949
  
**[Test build #58002 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58002/consoleFull)**
 for PR 12952 at commit 
[`9412b9b`](https://github.com/apache/spark/commit/9412b9b7743cda7df22e7834059e7c7be2e1eb85).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-13370][SQL] Require whitespace between ...

2016-05-06 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/12897#issuecomment-217491293
  
@hvanhovell What is the behavior of 1.6? Does 1.6 treat `L` as a suffix for 
a bigint literal?





[GitHub] spark pull request: [SPARK-14542][CORE] PipeRDD should allow confi...

2016-05-06 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/12309#issuecomment-217489895
  
I don't understand this. `./dev/mima` passes on my laptop, and I also verified 
that `./dev/mima` fails without my changes in `MimaExcludes.scala`. Is something 
weird going on with the Jenkins build?





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-217488982
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57998/
Test PASSed.





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-217488980
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-217488726
  
**[Test build #57998 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57998/consoleFull)**
 for PR 12893 at commit 
[`e408fdf`](https://github.com/apache/spark/commit/e408fdf43c207a189f6316a80599e7f54eb832b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12952#discussion_r62352447
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -394,7 +405,7 @@ class Dataset[T] private[sql](
* @group basic
* @since 1.6.0
*/
-  def schema: StructType = queryExecution.analyzed.schema
+  def schema: StructType = resolvedTEncoder.schema
--- End diff --

I'm a bit worried about this. We can't guarantee that the encoder's schema is 
always the same as the plan's schema (in this PR we add a projection to try to 
make them consistent, but it can't handle inner fields). If they differ, users 
may try to select a column that appears in the schema but does not exist in the plan.

cc @marmbrus 
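
The concern above can be illustrated with a small, Spark-free sketch. The names (`SchemaMismatchSketch`, `encoderFields`, `planFields`) and the field lists are hypothetical; schemas are reduced to flat lists of field names, which is an assumption made purely for illustration.

```scala
object SchemaMismatchSketch {
  // Hypothetical, simplified schemas: just ordered field names.
  val encoderFields: Seq[String] = Seq("id", "name", "extra")
  val planFields: Seq[String]    = Seq("id", "name")

  /** Fields that are visible in the reported schema but missing from the plan. */
  def unselectable(encoder: Seq[String], plan: Seq[String]): Seq[String] =
    encoder.filterNot(plan.contains)
}
```

Here `extra` would be listed by `schema` if it came from the encoder, yet selecting it would fail because the plan does not produce it.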





[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12871#issuecomment-217485501
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12871#issuecomment-217485502
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57999/
Test FAILed.





[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12871#issuecomment-217485371
  
**[Test build #57999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57999/consoleFull)**
 for PR 12871 at commit 
[`0261d25`](https://github.com/apache/spark/commit/0261d252f8baa1e823a97261e111a3e93019a0dc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-05-06 Thread RussellSpitzer
Github user RussellSpitzer commented on the pull request:

https://github.com/apache/spark/pull/10655#issuecomment-217483240
  
Sorry, I forgot about this. I'll clean it up tomorrow and get it ready.





[GitHub] spark pull request: [SPARK-15183][Streaming] Adding outputMode to ...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12958#issuecomment-217482035
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-15182] [ML] Copy MLlib doc to ML: ml.fe...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12957#issuecomment-217481568
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15182] [ML] Copy MLlib doc to ML: ml.fe...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12957#issuecomment-217481571
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58005/
Test PASSed.





[GitHub] spark pull request: [SPARK-15182] [ML] Copy MLlib doc to ML: ml.fe...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12957#issuecomment-217481463
  
**[Test build #58005 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58005/consoleFull)**
 for PR 12957 at commit 
[`2cc977e`](https://github.com/apache/spark/commit/2cc977e15bfe86c0944fab9fb3f0609339d580a0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-15183

2016-05-06 Thread agsachin
GitHub user agsachin opened a pull request:

https://github.com/apache/spark/pull/12958

SPARK-15183

## What changes were proposed in this pull request?

While experimenting with structured streaming, I found that mode() is used 
for non-continuous queries while outputMode() is used for continuous queries.
outputMode is not defined, so I have written a rough implementation and 
test cases to make sure the streaming app works.

Note:-
/** Start a query */
  private[sql] def startQuery(
  name: String,
  checkpointLocation: String,
  df: DataFrame,
  sink: Sink,
  trigger: Trigger = ProcessingTime(0),
  triggerClock: Clock = new SystemClock(),
  outputMode: OutputMode = Append): ContinuousQuery = {
In my opinion, outputMode should be defined before triggerClock, since a call 
that sets outputMode will be used more often than one that sets triggerClock.
I have also added a triggerClock() method.
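
The effect of parameter order on call sites can be sketched as follows. Everything here is a hypothetical stand-in (`ParamOrderSketch`, `Clock`, `Update`, and both `start*` methods); only the relative order of `triggerClock` and `outputMode` mirrors the signature quoted above.

```scala
object ParamOrderSketch {
  // Hypothetical stand-ins for the real types; only parameter order matters here.
  sealed trait OutputMode
  case object Append extends OutputMode
  case object Update extends OutputMode
  final case class Clock(name: String)

  // Order as quoted above: triggerClock comes before outputMode.
  def startCurrent(name: String, triggerClock: Clock = Clock("system"),
      outputMode: OutputMode = Append): String =
    s"$name/${triggerClock.name}/$outputMode"

  // Proposed order: outputMode comes before triggerClock.
  def startProposed(name: String, outputMode: OutputMode = Append,
      triggerClock: Clock = Clock("system")): String =
    s"$name/$outputMode/${triggerClock.name}"
}
```

With the current order, a caller who only wants to set the output mode must use a named argument, as in `startCurrent("q", outputMode = Update)`; with the proposed order, `startProposed("q", Update)` works positionally.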


## How was this patch tested?

using unit test locally


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agsachin/spark streaming

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12958.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12958


commit b418b4526e57b1ef437b9dab7779c3be1a5fd497
Author: sachin aggarwal 
Date:   2016-05-06T15:47:16Z

SPARK-15183







[GitHub] spark pull request: [SPARK-15182] [ML] Copy MLlib doc to ML: ml.fe...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12957#issuecomment-217478950
  
**[Test build #58005 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58005/consoleFull)**
 for PR 12957 at commit 
[`2cc977e`](https://github.com/apache/spark/commit/2cc977e15bfe86c0944fab9fb3f0609339d580a0).





[GitHub] spark pull request: [SPARK-15182] [ML] Copy MLlib doc to ML: ml.fe...

2016-05-06 Thread hhbyyh
GitHub user hhbyyh opened a pull request:

https://github.com/apache/spark/pull/12957

[SPARK-15182] [ML] Copy MLlib doc to ML: ml.feature

## What changes were proposed in this pull request?

We should now begin copying algorithm details from the spark.mllib guide to 
spark.ml as needed, rather than just linking back to the corresponding 
algorithms in the spark.mllib user guide.

## How was this patch tested?

manual review for doc.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hhbyyh/spark tfidfdoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12957.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12957


commit 2cc977e15bfe86c0944fab9fb3f0609339d580a0
Author: Yuhao Yang 
Date:   2016-05-06T15:38:07Z

copy doc







[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-05-06 Thread skonto
Github user skonto commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-217477341
  
@mgummelt ready.





[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10953#issuecomment-217477017
  
@markgrover Mind adding `Closes #10681` to the PR description so that the 
merge script closes that PR as well?





[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12956#issuecomment-217474807
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-217475441
  
**[Test build #58004 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58004/consoleFull)**
 for PR 11157 at commit 
[`9c7cf33`](https://github.com/apache/spark/commit/9c7cf332ccf350e721d25b0070b7c2637261ccaf).





[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12956#issuecomment-217474810
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57997/
Test FAILed.





[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12956#issuecomment-217474554
  
**[Test build #57997 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57997/consoleFull)**
 for PR 12956 at commit 
[`55e43ef`](https://github.com/apache/spark/commit/55e43ef9f25c68d0c3773f36156b34d42a9baedc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class SubExprEliminationState(isNull: String, value: String, 
exprCode: Option[ExprCode])`





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12952#issuecomment-217473744
  
**[Test build #58002 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58002/consoleFull)**
 for PR 12952 at commit 
[`9412b9b`](https://github.com/apache/spark/commit/9412b9b7743cda7df22e7834059e7c7be2e1eb85).





[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12949#issuecomment-217473754
  
**[Test build #58003 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58003/consoleFull)**
 for PR 12949 at commit 
[`8f5b688`](https://github.com/apache/spark/commit/8f5b688754bc493548ecab714cb1a56136c3b02e).





[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12954#issuecomment-217472088
  
**[Test build #57994 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57994/consoleFull)**
 for PR 12954 at commit 
[`f0871c9`](https://github.com/apache/spark/commit/f0871c921285a05602cf566c9f2c23901224d73e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12952#discussion_r62345191
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ---
@@ -463,7 +463,12 @@ class SparkSession private(
    */
   @Experimental
   def range(start: Long, end: Long, step: Long, numPartitions: Int): Dataset[java.lang.Long] = {
-    new Dataset(self, Range(start, end, step, numPartitions), Encoders.LONG)
+    val encoder = {
+      val schema = StructType(Seq(StructField("id", LongType, nullable = false)))
+      ExpressionEncoder[java.lang.Long]().copy[java.lang.Long](schema = schema)
+    }
+
+    new Dataset(self, Range(start, end, step, numPartitions), encoder)
--- End diff --

We now use the encoder schema as the Dataset schema, so we need to rename 
the default primitive encoder column name "value" to the desired name ("id" here).





[GitHub] spark pull request: [SPARK-15112][SQL] Allows query plan schema an...

2016-05-06 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/12952#discussion_r62345241
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala 
---
@@ -502,7 +507,7 @@ class SparkSession private(
 
   /*  *
|  Catalog-related methods |
-   * - -- */
+   *  */
--- End diff --

Mysterious missing space...





[GitHub] spark pull request: [Spark-15155][Mesos] Optionally ignore default...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12933#issuecomment-217472869
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [Spark-15155][Mesos] Optionally ignore default...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12933#issuecomment-217472873
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57993/
Test FAILed.





[GitHub] spark pull request: [Spark-15155][Mesos] Optionally ignore default...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12933#issuecomment-217472670
  
**[Test build #57993 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57993/consoleFull)**
 for PR 12933 at commit 
[`d2b7ad4`](https://github.com/apache/spark/commit/d2b7ad444e02b947f4a7264018b4e48610731408).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14939][SQL] Add FoldablePropagation opt...

2016-05-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/12719#issuecomment-217472671
  
Hi, @cloud-fan .
Now, it's ready for review.
Could you review this when you have some time?





[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12954#issuecomment-217472331
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57994/
Test PASSed.





[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12954#issuecomment-217472325
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improve the physical p...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12947#issuecomment-217471682
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improve the physical p...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12947#issuecomment-217471690
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57995/
Test PASSed.





[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improve the physical p...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12947#issuecomment-217471315
  
**[Test build #57995 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57995/consoleFull)**
 for PR 12947 at commit 
[`438d70e`](https://github.com/apache/spark/commit/438d70e02cfaf9e3b6beccc8d3a8d0c65f7499da).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12953#issuecomment-217468171
  
**[Test build #58000 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58000/consoleFull)**
 for PR 12953 at commit 
[`dbb6632`](https://github.com/apache/spark/commit/dbb663222ba379fbf0b846e2342173f0f0a0ecef).





[GitHub] spark pull request: [SPARK-15173][SQL] DataFrameWriter.insertInto ...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12949#issuecomment-217468181
  
**[Test build #58001 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58001/consoleFull)**
 for PR 12949 at commit 
[`167a7d6`](https://github.com/apache/spark/commit/167a7d6d8ad13a9d754e22dbd75cd9a16e9d1a56).





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread cenyuhai
Github user cenyuhai commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217467789
  
ok to test





[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread techaddict
Github user techaddict commented on a diff in the pull request:

https://github.com/apache/spark/pull/12953#discussion_r62342197
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -389,11 +389,10 @@ private[spark] class TaskSchedulerImpl(
     // (taskId, stageId, stageAttemptId, accumUpdates)
     val accumUpdatesWithTaskIds: Array[(Long, Int, Int, Seq[AccumulableInfo])] = synchronized {
       accumUpdates.flatMap { case (id, updates) =>
-        // We should call `acc.value` here as we are at driver side now.  However, the RPC framework
+        // We call `acc.value` here as we are at driver side now.  However, the RPC framework
--- End diff --

@srowen @cloud-fan Done 👍 






[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread ntietz
Github user ntietz commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217467133
  
Good call, I will add it to the Java version as well.





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread cenyuhai
Github user cenyuhai commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217466944
  
@andrewor14 I changed the code as you suggested, but the test failed because 
of a timeout. The failure seems to be unrelated to my change...





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-06 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-217463479
  
ok to test





[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-217465989
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57992/
Test PASSed.





[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-217465986
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217465951
  
How about doc'ing the Java version as well in JavaRDDLike.scala?
You're welcome to expand on the java/scaladoc of lots of these methods. 
It'd be nicer to have more complete doc of method args and return type for such 
a central API.





[GitHub] spark pull request: [SPARK-11714][Mesos] Make Spark on Mesos honor...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11157#issuecomment-217465665
  
**[Test build #57992 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57992/consoleFull)**
 for PR 11157 at commit 
[`dba3e34`](https://github.com/apache/spark/commit/dba3e34c826ddfcaa096254e1f0d230c49b4349d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15093][SQL] create/delete/rename direct...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12871#issuecomment-217465398
  
**[Test build #57999 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57999/consoleFull)**
 for PR 12871 at commit 
[`0261d25`](https://github.com/apache/spark/commit/0261d252f8baa1e823a97261e111a3e93019a0dc).





[GitHub] spark pull request: [Spark-15051] [SQL] Create a TypedColumn alias...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12893#issuecomment-217465397
  
**[Test build #57998 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57998/consoleFull)**
 for PR 12893 at commit 
[`e408fdf`](https://github.com/apache/spark/commit/e408fdf43c207a189f6316a80599e7f54eb832b6).





[GitHub] spark pull request: SPARK-12639 SQL Improve Explain for Datasource...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10655#issuecomment-217465276
  
ping @RussellSpitzer





[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12953#discussion_r62340446
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -389,11 +389,10 @@ private[spark] class TaskSchedulerImpl(
     // (taskId, stageId, stageAttemptId, accumUpdates)
     val accumUpdatesWithTaskIds: Array[(Long, Int, Int, Seq[AccumulableInfo])] = synchronized {
       accumUpdates.flatMap { case (id, updates) =>
-        // We should call `acc.value` here as we are at driver side now.  However, the RPC framework
+        // We call `acc.value` here as we are at driver side now.  However, the RPC framework
--- End diff --

Since we have no `localValue` anymore, this comment can be removed entirely.





[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/10943#issuecomment-217461632
  
ping @cloud-fan 





[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12956#issuecomment-217457647
  
**[Test build #57997 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57997/consoleFull)**
 for PR 12956 at commit 
[`55e43ef`](https://github.com/apache/spark/commit/55e43ef9f25c68d0c3773f36156b34d42a9baedc).





[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...

2016-05-06 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/12956#issuecomment-217456943
  

Code:

val ds = Seq(("a", 10), ("a", 20), ("b", 1), ("b", 2), ("c", 1)).toDS().filter("_2 + 1 > 5").filter("_2 + 1 > 20")
ds.collect()

Generated code:

/* 030 */   protected void processNext() throws java.io.IOException {
/* 031 */ /*** PRODUCE: Filter (((_2#3 + 1) > 20) && ((_2#3 + 1) > 5)) */
/* 032 */
/* 033 */ /*** PRODUCE: INPUT */
/* 034 */
/* 035 */ while (inputadapter_input.hasNext()) {
/* 036 */   InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 037 */   /*** CONSUME: Filter (((_2#3 + 1) > 20) && ((_2#3 + 1) > 5)) */
/* 038 */
/* 039 */   /* ((input[1, int] + 1) > 20) */
/* 040 */   // Common expression
/* 041 */   /* (input[1, int] + 1) */
/* 042 */   /* input[1, int] */
/* 043 */   /* input[1, int] */
/* 044 */   int inputadapter_value1 = inputadapter_row.getInt(1);
/* 045 */
/* 046 */   int filter_value = -1;
/* 047 */   filter_value = inputadapter_value1 + 1;
/* 048 */
/* 049 */   /* (input[1, int] + 1) */
/* 050 */
/* 051 */   boolean filter_value3 = false;
/* 052 */   filter_value3 = filter_value > 20;
/* 053 */   if (!filter_value3) continue;
/* 054 */   /* ((input[1, int] + 1) > 5) */
/* 055 */   /* (input[1, int] + 1) */
/* 056 */
/* 057 */   boolean filter_value5 = false;
/* 058 */   filter_value5 = filter_value > 5;
/* 059 */   if (!filter_value5) continue;
/* 060 */
/* 061 */   filter_numOutputRows.add(1);
/* 062 */
/* 063 */   /*** CONSUME: WholeStageCodegen */
/* 064 */
/* 065 */   /* input[0, string] */
/* 066 */   boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
/* 067 */   UTF8String inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getUTF8String(0));
/* 068 */   filter_holder.reset();
/* 069 */
/* 070 */   filter_rowWriter.zeroOutNullBytes();
/* 071 */
/* 072 */   if (inputadapter_isNull) {
/* 073 */ filter_rowWriter.setNullAt(0);
/* 074 */   } else {
/* 075 */ filter_rowWriter.write(0, inputadapter_value);
/* 076 */   }
/* 077 */
/* 078 */   filter_rowWriter.write(1, inputadapter_value1);
/* 079 */   filter_result.setTotalSize(filter_holder.totalSize());
/* 080 */   append(filter_result);
/* 081 */   if (shouldStop()) return;
/* 082 */ }
/* 083 */   }
/* 084 */ }
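
For illustration only, the effect visible in the generated code above (the shared `(input[1, int] + 1)` is evaluated once into `filter_value` and then reused by both predicates) can be sketched in plain Java. This is a hand-written sketch and not Spark's generated code; the class `SubexprEliminationSketch` and the method `filterRows` are hypothetical names, and only the second tuple column from the example is modeled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not part of Spark.
final class SubexprEliminationSketch {

    // Keeps every value v for which (v + 1) > 20 and (v + 1) > 5,
    // evaluating the shared subexpression (v + 1) only once per row.
    static List<Integer> filterRows(int[] values) {
        List<Integer> kept = new ArrayList<>();
        for (int v : values) {
            int common = v + 1;             // common subexpression, computed once
            if (!(common > 20)) continue;   // first predicate reuses it
            if (!(common > 5)) continue;    // second predicate reuses it as well
            kept.add(v);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Second tuple elements from the example: 10, 20, 1, 2, 1
        int[] secondColumn = {10, 20, 1, 2, 1};
        System.out.println(filterRows(secondColumn)); // prints [20]
    }
}
```

Without the elimination, each predicate would recompute `v + 1` on its own; the result is the same, only the redundant evaluation is avoided.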






[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11863#issuecomment-217456649
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11863#issuecomment-217456641
  
**[Test build #57996 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57996/consoleFull)**
 for PR 11863 at commit 
[`544bf88`](https://github.com/apache/spark/commit/544bf888984e20dadae852faa8ca1dd26fc416e7).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11863#issuecomment-217456651
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57996/
Test FAILed.





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217456515
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217456518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57982/
Test FAILed.





[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11863#issuecomment-217456405
  
**[Test build #57996 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57996/consoleFull)**
 for PR 11863 at commit 
[`544bf88`](https://github.com/apache/spark/commit/544bf888984e20dadae852faa8ca1dd26fc416e7).





[GitHub] spark pull request: [SPARK-13566][CORE] Avoid deadlock between Blo...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11546#issuecomment-217456396
  
**[Test build #57982 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57982/consoleFull)**
 for PR 11546 at commit 
[`27fd070`](https://github.com/apache/spark/commit/27fd07058112dad0760c7fa5480fb43d4f046d96).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15180][SQL] Support subexpression elimi...

2016-05-06 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/12956

[SPARK-15180][SQL] Support subexpression elimination in Filter

## What changes were proposed in this pull request?

This patch tries to add support for subexpression elimination in 
whole-stage codegen `Filter`.

Because the predicate expressions are evaluated in `Filter` with an 
optimized ordering that reduces unnecessary evaluation as much as possible, we 
follow this ordering when doing subexpression elimination too.

Due to that, we can't just extract all common subexpressions and evaluate 
them first. Instead, we extract common subexpressions but don't evaluate them. 
We evaluate common subexpressions only when the predicate expressions 
containing them are evaluated.
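
The deferred evaluation described above can be illustrated with a small sketch. This is only an analogy in plain Scala, not the generated code from this patch: `expensive` and the `evaluations` counter are hypothetical names introduced for the example, and a `lazy val` stands in for "extract the common subexpression but evaluate it only when a predicate that needs it runs".

```scala
object LazySubexprSketch {
  // Hypothetical counter, present only so the example can show how often
  // the shared subexpression is actually computed.
  var evaluations = 0

  // Hypothetical stand-in for an expensive common subexpression.
  def expensive(x: Int): Int = {
    evaluations += 1
    x * x
  }

  // Two predicates share `expensive(x)`. Because it is declared lazily, it is
  // computed at most once, and only if evaluation reaches a predicate that
  // uses it. The cheap predicate `x > 0` still runs first.
  def keep(x: Int): Boolean = {
    lazy val common = expensive(x)
    x > 0 && common < 100 && common % 2 == 0
  }
}
```

When `x > 0` is false the shared value is never computed, so the predicate ordering keeps its short-circuit benefit while the shared work is still done only once.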

## How was this patch tested?

Existing tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 subexpr-filter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12956.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12956


commit 55e43ef9f25c68d0c3773f36156b34d42a9baedc
Author: Liang-Chi Hsieh 
Date:   2016-05-06T14:22:49Z

Support subexpression elimination in Fliter.







[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/11863#discussion_r62336356
  
--- Diff: 
external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala
 ---
@@ -0,0 +1,259 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import java.{ util => ju }
+
+import scala.collection.mutable.ArrayBuffer
+import scala.reflect.{classTag, ClassTag}
+
+import org.apache.kafka.clients.consumer.{ ConsumerConfig, ConsumerRecord }
+import org.apache.kafka.common.TopicPartition
+
+import org.apache.spark.{Partition, SparkContext, SparkException, TaskContext}
+import org.apache.spark.internal.Logging
+import org.apache.spark.partial.{BoundedDouble, PartialResult}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.scheduler.ExecutorCacheTaskLocation
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * A batch-oriented interface for consuming from Kafka.
+ * Starting and ending offsets are specified in advance,
+ * so that you can control exactly-once semantics.
+ * @param kafkaParams Kafka
+ * <a href="http://kafka.apache.org/documentation.html#newconsumerconfigs">
+ * configuration parameters</a>. Requires "bootstrap.servers" to be set
+ * with Kafka broker(s) specified in host1:port1,host2:port2 form.
+ * @param offsetRanges offset ranges that define the Kafka data belonging to this RDD
+ */
+
+class KafkaRDD[
+  K: ClassTag,
+  V: ClassTag] private[spark] (
+sc: SparkContext,
+val kafkaParams: ju.Map[String, Object],
+val offsetRanges: Array[OffsetRange],
+val preferredHosts: ju.Map[TopicPartition, String]
+) extends RDD[ConsumerRecord[K, V]](sc, Nil) with Logging with HasOffsetRanges {
+
+  assert("none" ==
+kafkaParams.get(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG).asInstanceOf[String],
+ConsumerConfig.AUTO_OFFSET_RESET_CONFIG +
+  " must be set to none for executor kafka params, else messages may not match offsetRange")
+
+  assert(false ==
--- End diff --

The override is done in the companion object not in this constructor.  And 
it's still possible for subclasses to construct this.  The real question is 
whether you'd ever want to allow executors to mess with offsets, and I'm pretty 
sure the answer is no.
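
A minimal sketch of the kind of check being discussed, using only `java.util.Map` so it runs without the Kafka client on the classpath. `ExecutorKafkaParamsSketch` and `validate` are hypothetical names, not part of the PR. The first check mirrors the `auto.offset.reset` assertion visible in the diff. The second (`enable.auto.commit` must be false) is an assumption: the quoted diff is cut off at `assert(false ==`, so the setting it actually checks is not shown here.

```scala
import java.{util => ju}

// Hypothetical helper, not the PR's code: fails fast if executor-side
// consumer parameters would let executors change offsets on their own.
object ExecutorKafkaParamsSketch {
  def validate(kafkaParams: ju.Map[String, Object]): Unit = {
    // Mirrors the assertion shown in the diff above.
    require(kafkaParams.get("auto.offset.reset") == "none",
      "auto.offset.reset must be set to none for executor kafka params")
    // Assumed second check; the diff is truncated before naming the setting.
    require(kafkaParams.get("enable.auto.commit") == java.lang.Boolean.FALSE,
      "enable.auto.commit must be set to false for executor kafka params")
  }
}
```

Keeping the check in the constructor path, rather than relying only on the companion object's overrides, also covers subclasses that construct the RDD directly.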





[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/11863#discussion_r62336181
  
--- Diff: 
external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.kafka
+
+import java.{ util => ju }
+import java.util.concurrent.ConcurrentLinkedQueue
+import java.util.concurrent.atomic.AtomicReference
+
+import scala.annotation.tailrec
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+import scala.reflect.ClassTag
+
+import org.apache.kafka.clients.consumer._
+import org.apache.kafka.common.{ PartitionInfo, TopicPartition }
+
+import org.apache.spark.SparkException
+import org.apache.spark.internal.Logging
+import org.apache.spark.storage.StorageLevel
+import org.apache.spark.streaming.{StreamingContext, Time}
+import org.apache.spark.streaming.dstream._
+import org.apache.spark.streaming.scheduler.{RateController, StreamInputInfo}
+import org.apache.spark.streaming.scheduler.rate.RateEstimator
+
+/**
+ *  A stream of {@link org.apache.spark.streaming.kafka.KafkaRDD} where
+ * each given Kafka topic/partition corresponds to an RDD partition.
+ * The spark configuration spark.streaming.kafka.maxRatePerPartition gives the maximum number
+ *  of messages
+ * per second that each '''partition''' will accept.
+ * Starting offsets are specified in advance,
+ * and this DStream is not responsible for committing offsets,
+ * so that you can control exactly-once semantics.
+ * @param kafkaParams Kafka <a href="http://kafka.apache.org/documentation.html#newconsumerconfigs">
+ * configuration parameters</a>.
+ *   Requires  "bootstrap.servers" to be set with Kafka broker(s),
+ *   NOT zookeeper servers, specified in host1:port1,host2:port2 form.
+ */
+
+class DirectKafkaInputDStream[K: ClassTag, V: ClassTag] private[spark] (
+_ssc: StreamingContext,
+preferredHosts: ju.Map[TopicPartition, String],
+executorKafkaParams: ju.Map[String, Object],
+driverConsumer: () => Consumer[K, V]
+  ) extends InputDStream[ConsumerRecord[K, V]](_ssc) with Logging {
+
+  @transient private var kc: Consumer[K, V] = null
+  def consumer(): Consumer[K, V] = this.synchronized {
+if (null == kc) {
+  kc = driverConsumer()
+}
+kc
+  }
+  consumer()
+
+  override def persist(newLevel: StorageLevel): DStream[ConsumerRecord[K, V]] = {
+log.error("Kafka ConsumerRecord is not serializable. " +
+  "Use .map to extract fields before calling .persist or .window")
+super.persist(newLevel)
+  }
+
+  protected def getBrokers = {
+val c = consumer
+val result = new ju.HashMap[TopicPartition, String]()
+val hosts = new ju.HashMap[TopicPartition, String]()
+val assignments = c.assignment().iterator()
+while (assignments.hasNext()) {
+  val tp: TopicPartition = assignments.next()
+  if (null == hosts.get(tp)) {
+val infos = c.partitionsFor(tp.topic).iterator()
+while (infos.hasNext()) {
+  val i = infos.next()
+  hosts.put(new TopicPartition(i.topic(), i.partition()), i.leader.host())
+}
+  }
+  result.put(tp, hosts.get(tp))
+}
+result
+  }
+
+  protected def getPreferredHosts: ju.Map[TopicPartition, String] = {
+if (preferredHosts == DirectKafkaInputDStream.preferBrokers) {
+  getBrokers
+} else {
+  preferredHosts
+}
+  }
+
+  // Keep this consistent with how other streams are named (e.g. "Flume polling stream [2]")
+  private[streaming] override def name: String = s"Kafka beta direct stream [$id]"
+
+  protected[streaming] override val checkpointData =
+new DirectKafkaInputDStreamCheckpointData
+
+
+  

[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12955#issuecomment-217452079
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [Docs] Added Scaladoc for countApprox and coun...

2016-05-06 Thread ntietz
GitHub user ntietz opened a pull request:

https://github.com/apache/spark/pull/12955

[Docs] Added Scaladoc for countApprox and countByValueApprox parameters

This pull request simply adds Scaladoc documentation of the parameters for 
countApprox and countByValueApprox.

This is an important documentation change, as it clarifies what should be 
passed in for the timeout. Without units, this was previously unclear.

I did not open a JIRA ticket per my understanding of the project 
contribution guidelines; as they state, the description in the ticket would be 
essentially just what is in the PR. If I should open one, let me know and I 
will do so.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ntietz/spark rdd-countapprox-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12955.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12955


commit a28014dc45981b79df6e6c18f473565eb740638c
Author: Nicholas Tietz 
Date:   2016-05-06T14:07:21Z

Added Scaladoc for countApprox and countByValueApprox







[GitHub] spark pull request: [SPARK-13370][SQL] Require whitespace between ...

2016-05-06 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/12897#issuecomment-217451180
  
@rxin @yhuai what is the next step? Are we changing the behavior? Or 
keeping it as it is?





[GitHub] spark pull request: [SPARK-15074][Shuffle] Cache shuffle index fil...

2016-05-06 Thread sitalkedia
Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/12944#discussion_r62334156
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ShuffleIndexRecord.java
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+/**
+ * Contains offset and length of the shuffle block data.
+ */
+public class ShuffleIndexRecord {
+  private final long offset;
+  private final long length;
+
+  public ShuffleIndexRecord(long offset, long length) {
+this.offset = offset;
+this.length = length;
+  }
+
+  public long getOffset() {
+return offset;
+  }
+
+  public long getLength() {
+return length;
+  }
+}
--- End diff --

will fix, thanks





[GitHub] spark pull request: [SPARK-15074][Shuffle] Cache shuffle index fil...

2016-05-06 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/12944#issuecomment-217450001
  
@holdenk - `TransportConf` is not specific to the shuffle service; it is also used to create 
transport clients in other modules. Since the number of index cache entries is 
very specific to the `ShuffleService`, I did not want to expose it as an API 
in `TransportConf`. Let me know what you think about it.





[GitHub] spark pull request: [SPARK-15119] [ML] Add a validator to Decision...

2016-05-06 Thread dominik-jastrzebski
Github user dominik-jastrzebski commented on the pull request:

https://github.com/apache/spark/pull/12895#issuecomment-217449436
  
Ok, I can check the other validators in `treeParams.scala`.





[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12953#issuecomment-217449336
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57990/
Test PASSed.





[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12953#issuecomment-217449332
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15087][MINOR][DOC] Follow Up: Fix the C...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12953#issuecomment-217449102
  
**[Test build #57990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57990/consoleFull)**
 for PR 12953 at commit 
[`032e042`](https://github.com/apache/spark/commit/032e042b55fbd8a5fcc932e29c6654d68b499c5b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improve the physical p...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12947#issuecomment-217448700
  
**[Test build #57995 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57995/consoleFull)**
 for PR 12947 at commit 
[`438d70e`](https://github.com/apache/spark/commit/438d70e02cfaf9e3b6beccc8d3a8d0c65f7499da).





[GitHub] spark pull request: [SPARK-14613][ML] Add @Since into the matrix a...

2016-05-06 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/12416#issuecomment-217447705
  
@pravingadakh go ahead - would be good to get this in for 2.0





[GitHub] spark pull request: [SPARK-15074][Shuffle] Cache shuffle index fil...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12944#discussion_r62331757
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ShuffleIndexRecord.java
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+/**
+ * Contains offset and length of the shuffle block data.
+ */
+public class ShuffleIndexRecord {
+  private final long offset;
+  private final long length;
+
+  public ShuffleIndexRecord(long offset, long length) {
+this.offset = offset;
+this.length = length;
+  }
+
+  public long getOffset() {
+return offset;
+  }
+
+  public long getLength() {
+return length;
+  }
+}
--- End diff --

And a newline here maybe :)





[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12954#issuecomment-217447455
  
**[Test build #57994 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57994/consoleFull)**
 for PR 12954 at commit 
[`f0871c9`](https://github.com/apache/spark/commit/f0871c921285a05602cf566c9f2c23901224d73e).





[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread hvanhovell
Github user hvanhovell commented on the pull request:

https://github.com/apache/spark/pull/12954#issuecomment-217447302
  
cc @rxin @davies





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/12951#discussion_r62332468
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ---
@@ -21,6 +21,7 @@ import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}
 
 import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
+import scala.math.{max,min}
--- End diff --

It's a tiny nit, but while here, I usually see `math.max` just written out 
in Scala code; no need to import a standard class's methods.
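
For illustration, the written-out style looks like this. `ClampSketch` and `clamp` are hypothetical names and are not taken from `Pool.scala`.

```scala
object ClampSketch {
  // Hypothetical helper: keeps `value` within [lower, upper] by calling
  // scala.math directly, so no `import scala.math.{max, min}` is needed.
  def clamp(value: Int, lower: Int, upper: Int): Int =
    math.max(lower, math.min(value, upper))
}
```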





[GitHub] spark pull request: [SPARK-15122][SQL] Fix TPC-DS 41 - Normalize p...

2016-05-06 Thread hvanhovell
GitHub user hvanhovell opened a pull request:

https://github.com/apache/spark/pull/12954

[SPARK-15122][SQL] Fix TPC-DS 41 - Normalize predicates before pulling them 
out

## What changes were proposed in this pull request?
The official TPC-DS 41 query currently fails because it contains a scalar 
subquery with a disjunctive correlated predicate (the correlated predicates 
were nested in ORs). This makes the `Analyzer` pull out the entire predicate 
which is wrong and causes the following (correct) analysis exception: `The 
correlated scalar subquery can only contain equality predicates`

This PR fixes this by first simplifying (or normalizing) the correlated 
predicates before pulling them out of the subquery. I have also added a small 
optimizer rule that rewrites correlated scalar subqueries into predicate 
subqueries if they are used in a `Filter` and are wrapped by a predicate. This 
allows us to use semi joins instead of left outer joins.
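
To make the normalization step concrete, here is a toy sketch of factoring a shared conjunct out of a disjunction, i.e. `(a AND b) OR (a AND c)` becomes `a AND (b OR c)`. It uses a tiny hypothetical predicate ADT (`Pred`, `Atom`, `And`, `Or`) rather than Catalyst expressions, and it is not the rule added by this PR. It only shows why a correlated equality that appears in every branch of an OR can end up as a top-level conjunct that is safe to pull out.

```scala
// Hypothetical mini predicate language, for illustration only.
sealed trait Pred
case class Atom(name: String) extends Pred
case class And(left: Pred, right: Pred) extends Pred
case class Or(left: Pred, right: Pred) extends Pred

object NormalizeSketch {
  // Flatten nested ANDs into the list of their conjuncts.
  def conjuncts(p: Pred): Seq[Pred] = p match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other => Seq(other)
  }

  // Rebuild a conjunction from a non-empty list of predicates.
  private def all(ps: Seq[Pred]): Pred =
    ps.reduceLeft[Pred]((a, b) => And(a, b))

  // Factor conjuncts shared by both sides of an OR out to the top level.
  def normalize(p: Pred): Pred = p match {
    case Or(l, r) =>
      val (nl, nr) = (normalize(l), normalize(r))
      val (ls, rs) = (conjuncts(nl), conjuncts(nr))
      val common = ls.filter(x => rs.contains(x))
      val lRest = ls.filterNot(x => common.contains(x))
      val rRest = rs.filterNot(x => common.contains(x))
      if (common.isEmpty) Or(nl, nr)
      // Absorption: a OR (a AND c) is equivalent to a.
      else if (lRest.isEmpty || rRest.isEmpty) all(common)
      else all(common :+ Or(all(lRest), all(rRest)))
    case And(l, r) => And(normalize(l), normalize(r))
    case atom => atom
  }
}
```

If `Atom("a")` stands for the correlated equality, then after normalization it sits outside the OR, and the remaining disjunction no longer mentions the outer query.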

## How was this patch tested?
Manual testing on TPC-DS 41, and added a test to SubquerySuite.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hvanhovell/spark SPARK-15122

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12954.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12954


commit f0871c921285a05602cf566c9f2c23901224d73e
Author: Herman van Hovell 
Date:   2016-05-06T13:39:43Z

Fix TPC-DS 41 - normalize predicates before pulling them out.







[GitHub] spark pull request: [SPARK-1239] Improve fetching of map output st...

2016-05-06 Thread tgravescs
Github user tgravescs commented on a diff in the pull request:

https://github.com/apache/spark/pull/12113#discussion_r62332230
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -428,40 +503,89 @@ private[spark] class MapOutputTrackerMaster(conf: SparkConf)
 }
   }
 
+  private def removeBroadcast(bcast: Broadcast[_]): Unit = {
+if (null != bcast) {
+  broadcastManager.unbroadcast(bcast.id,
+removeFromDriver = true, blocking = false)
+}
+  }
+
+  private def clearCachedBroadcast(): Unit = {
+for (cached <- cachedSerializedBroadcast) removeBroadcast(cached._2)
+cachedSerializedBroadcast.clear()
+  }
+
   def getSerializedMapOutputStatuses(shuffleId: Int): Array[Byte] = {
 var statuses: Array[MapStatus] = null
+var retBytes: Array[Byte] = null
 var epochGotten: Long = -1
-epochLock.synchronized {
-  if (epoch > cacheEpoch) {
-cachedSerializedStatuses.clear()
-cacheEpoch = epoch
-  }
-  cachedSerializedStatuses.get(shuffleId) match {
-case Some(bytes) =>
-  return bytes
-case None =>
-  statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
-  epochGotten = epoch
+
+// Check to see if we have a cached version, returns true if it does
+// and has side effect of setting retBytes.  If not returns false
+// with side effect of setting statuses
+def checkCachedStatuses(): Boolean = {
+  epochLock.synchronized {
+if (epoch > cacheEpoch) {
+  cachedSerializedStatuses.clear()
+  clearCachedBroadcast()
+  cacheEpoch = epoch
+}
+cachedSerializedStatuses.get(shuffleId) match {
+  case Some(bytes) =>
+retBytes = bytes
+true
+  case None =>
+logDebug("cached status not found for : " + shuffleId)
+statuses = mapStatuses.getOrElse(shuffleId, Array[MapStatus]())
+epochGotten = epoch
+false
+}
   }
 }
-// If we got here, we failed to find the serialized locations in the cache, so we pulled
-// out a snapshot of the locations as "statuses"; let's serialize and return that
-val bytes = MapOutputTracker.serializeMapStatuses(statuses)
-logInfo("Size of output statuses for shuffle %d is %d bytes".format(shuffleId, bytes.length))
-// Add them into the table only if the epoch hasn't changed while we were working
-epochLock.synchronized {
-  if (epoch == epochGotten) {
-cachedSerializedStatuses(shuffleId) = bytes
+
+if (checkCachedStatuses()) return retBytes
+var shuffleIdLock = shuffleIdLocks.get(shuffleId)
+if (null == shuffleIdLock) {
+  val newLock = new Object()
+  // in general, this condition should be false - but good to be paranoid
+  val prevLock = shuffleIdLocks.putIfAbsent(shuffleId, newLock)
--- End diff --

It's purely defensive programming to allow things to work when the
unexpected happens. Would you rather have your production job that has been
running for 5 hours throw a null pointer exception, or try to fix itself and
continue to run?
In distributed systems weird things happen, and this is processing a message
from another host/task which you don't have direct control of. You can get
network breaks, weird host failures or pauses, etc., and a message comes in
late asking for a shuffle id that isn't there anymore.
The unregister shuffle call, which removes the lock for the shuffle id, is
made from the context cleaner. So if an RDD goes out of scope and is cleaned
up, the shuffle lock gets removed. As I mention above, if some host was
slightly out of sync and sent a message to fetch that id late, we would throw
a null pointer exception. Everything else in GetMapOutputStatuses handles this
case, and there is actually a test for this (fetching after unregister), so if
this line is removed that test fails.
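
A minimal sketch of the defensive get-or-create pattern discussed here, assuming a `ConcurrentHashMap` of per-shuffle lock objects. `ShuffleLocks`, `register`, `unregister`, and `lockFor` are hypothetical names for illustration and are not the names used in MapOutputTracker.scala:

```scala
import java.util.concurrent.ConcurrentHashMap

object ShuffleLocks {
  // One lock object per shuffle id (hypothetical stand-in for shuffleIdLocks).
  private val locks = new ConcurrentHashMap[Int, AnyRef]()

  def register(shuffleId: Int): Unit = {
    locks.putIfAbsent(shuffleId, new Object())
  }

  // Called when the shuffle is cleaned up; a late request may still arrive afterwards.
  def unregister(shuffleId: Int): Unit = {
    locks.remove(shuffleId)
  }

  // Never returns null: if the lock is missing (for example, a fetch arrives
  // after unregister), create one instead of failing with an NPE later.
  def lockFor(shuffleId: Int): AnyRef = {
    val existing = locks.get(shuffleId)
    if (existing != null) {
      existing
    } else {
      val fresh = new Object()
      // putIfAbsent returns the previous value, or null if our lock was installed.
      val prev = locks.putIfAbsent(shuffleId, fresh)
      if (prev != null) prev else fresh
    }
  }
}
```

`putIfAbsent` is what keeps this safe under races: if two late requests arrive together, both end up synchronizing on the same object.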






[GitHub] spark pull request: [Spark-15155][Mesos] Optionally ignore default...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12933#issuecomment-217446062
  
**[Test build #57993 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57993/consoleFull)**
 for PR 12933 at commit 
[`d2b7ad4`](https://github.com/apache/spark/commit/d2b7ad444e02b947f4a7264018b4e48610731408).





[GitHub] spark pull request: [Spark-15155][Mesos] Optionally ignore default...

2016-05-06 Thread hellertime
Github user hellertime commented on the pull request:

https://github.com/apache/spark/pull/12933#issuecomment-217445754
  
Rebasing against master.





[GitHub] spark pull request: [SPARK-14476][SQL][WIP] Improves the output of...

2016-05-06 Thread clockfly
Github user clockfly commented on the pull request:

https://github.com/apache/spark/pull/12947#issuecomment-217445675
  
@davies 

I made some changes to the UI. Could you check whether it looks better now?

```
scala> spark.sql("select * from tt").explain()
== Physical Plan ==
WholeStageCodegen
:  +- BatchedScan HadoopFiles default.tt[id#0L] Format: ParquetFormat, 
InputPaths: file:/home/xzhong10/spark-linux/assembly/spark-warehouse/tt, 
PushedFilters: [], ReadSchema: struct
```


![change_v2](https://cloud.githubusercontent.com/assets/2595532/15074961/8fa7f828-13d4-11e6-95b3-a3df261809f7.jpg)







[GitHub] spark pull request: [SPARK-15080][CORE] Break copyAndReset into co...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12936#issuecomment-217444691
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15080][CORE] Break copyAndReset into co...

2016-05-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12936#issuecomment-217444695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57987/
Test PASSed.





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12951#discussion_r62331312
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ---
@@ -21,6 +21,7 @@ import java.util.concurrent.{ConcurrentHashMap, ConcurrentLinkedQueue}
 
 import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
+import scala.math.{max,min}
--- End diff --

(FYI, running `./dev/run-tests` triggers the style check first. Running the
first part of this script before submitting more commits will surface some of
the comments I made here.)





[GitHub] spark pull request: [SPARK-15080][CORE] Break copyAndReset into co...

2016-05-06 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12936#issuecomment-21725
  
**[Test build #57987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57987/consoleFull)**
 for PR 12936 at commit 
[`ca9c80c`](https://github.com/apache/spark/commit/ca9c80c4fcc78d571606c9abb0312f24bdc12340).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12951#discussion_r62330957
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Pool.scala ---
@@ -47,6 +49,15 @@ private[spark] class Pool(
   var name = poolName
   var parent: Pool = null
 
+  override def maxShare = {
--- End diff --

Maybe specify the return type? (See "Return types" in
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide)
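
For illustration, a small sketch of the style being suggested. The real type of `maxShare` is not shown in this diff hunk, so `Int` is an assumption, and `HasMaxShare` and `ExamplePool` are hypothetical stand-ins rather than Spark classes:

```scala
object ReturnTypeStyle {
  // Hypothetical interface; the actual trait in Spark is not shown in this thread.
  trait HasMaxShare {
    def maxShare: Int
  }

  class ExamplePool(configured: Int) extends HasMaxShare {
    // Explicit return type: the contract is visible without reading the body,
    // and an accidental change to the inferred type becomes a compile error.
    override def maxShare: Int = math.max(configured, 0)
  }
}
```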





[GitHub] spark pull request: [SPARK-15176][Core] Add maxShares setting to P...

2016-05-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/12951#discussion_r62330929
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -98,6 +98,14 @@ private[spark] class TaskSetManager(
   var totalResultSize = 0L
   var calculatedTasks = 0
 
+  override def maxShare = {
--- End diff --

Maybe specify the return type? (See "Return types" in
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide)




