date:20161017

[GitHub] spark issue #15467: [SPARK-17912][SQL] Refactor code generation to get data ...

2016-10-17 Thread ericl

Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15467
  
Will do

On Sun, Oct 16, 2016, 11:35 PM Kazuaki Ishizaki wrote:

> @ericl, could you please review this? cc @davies



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15316
  
**[Test build #67058 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67058/consoleFull)**
 for PR 15316 at commit 
[`f082643`](https://github.com/apache/spark/commit/f082643130e1f39918e06dbe8a39a9a4f49739f5).



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/15481
  
LGTM, sorry for introducing the deadlock issue.



[GitHub] spark pull request #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query i...

2016-10-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15502#discussion_r83586555
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -510,7 +510,7 @@ private[hive] case class InsertIntoHiveTable(
 child: LogicalPlan,
 overwrite: Boolean,
 ifNotExists: Boolean)
-  extends LogicalPlan with Command {
--- End diff --

Why is it not a command anymore?



[GitHub] spark pull request #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query i...

2016-10-17 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15502#discussion_r83587470
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -510,7 +510,7 @@ private[hive] case class InsertIntoHiveTable(
 child: LogicalPlan,
 overwrite: Boolean,
 ifNotExists: Boolean)
-  extends LogicalPlan with Command {
--- End diff --

In `Command`, this PR requires [the child to be 
empty](https://github.com/gatorsmile/spark/blob/9cfebc523e4b88c3df3ffae8ca5ea92e98a0a616/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Command.scala#L28). 
Should we convert `InsertIntoHiveTable` to a `Command` without a child? 

Just FYI, in Spark 2.1, `InsertIntoTable` is still a `LogicalPlan` instead 
of a `Command`. 



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15316
  
Also cc @cloud-fan and @liancheng 



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15316
  
cc @hvanhovell @rxin Any more comments on this PR? I assume Spark 2.0.2 
needs it. 

Recently, when we were analyzing the JIRA 
https://issues.apache.org/jira/browse/SPARK-17709, we were unable to see the 
plan due to the analyzer failure. Users have to manually rebuild Spark with this 
fix; only then can we see the failed analyzed plan. 



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15148
  
**[Test build #67055 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67055/consoleFull)**
 for PR 15148 at commit 
[`66d553a`](https://github.com/apache/spark/commit/66d553a4e2bd8c219c09e17db11962cd49114a24).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67055/



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15316
  
LGTM, cc @hvanhovell @rxin for final sign-off



[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...

2016-10-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15502
  
LGTM, merging to 2.0!



[GitHub] spark pull request #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query i...

2016-10-17 Thread gatorsmile

Github user gatorsmile closed the pull request at:

https://github.com/apache/spark/pull/15502



[GitHub] spark issue #15502: [SPARK-17892] [SQL] [2.0] Do Not Optimize Query in CTAS ...

2016-10-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15502
  
Thanks! Closing it now. 



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread rxin

Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15316
  
LGTM



[GitHub] spark issue #13780: [SPARK-16063][SQL] Add storageLevel to Dataset

2016-10-17 Thread MLnick

Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/13780
  
@marmbrus thanks for merging this. For me, there is still an open question 
around the handling of deserialized storage levels on the PySpark side (see my comments at 
https://github.com/apache/spark/pull/13780/files#r67833027). I would like to get 
your thoughts on that.

By the way, what is blocked on this? (Just to understand.)



[GitHub] spark issue #15508: [DO-NOT-MERGE]

2016-10-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15508
  
cc @rxin



[GitHub] spark pull request #15508: [DO-NOT-MERGE]

2016-10-17 Thread cloud-fan

GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/15508

[DO-NOT-MERGE]



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark api-backport

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15508.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15508


commit e7be7a98eb532fb200774d28c0f8cb96487070e4
Author: Wenchen Fan 
Date:   2016-10-17T07:48:18Z

backport global temp view API





[GitHub] spark issue #15508: [DO-NOT-MERGE]

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15508
  
**[Test build #67059 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67059/consoleFull)**
 for PR 15508 at commit 
[`e7be7a9`](https://github.com/apache/spark/commit/e7be7a98eb532fb200774d28c0f8cb96487070e4).



[GitHub] spark pull request #15509: Merge pull request #1 from apache/master

2016-10-17 Thread someorz

GitHub user someorz opened a pull request:

https://github.com/apache/spark/pull/15509

Merge pull request #1 from apache/master

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a pull request.

update

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/someorz/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15509.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15509


commit 65c6538a24b42fd4de934623553155ada76125e7
Author: someorz <24164...@qq.com>
Date:   2016-10-17T07:52:29Z

Merge pull request #1 from apache/master

update





[GitHub] spark issue #15509: Merge pull request #1 from apache/master

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15509
  
Can one of the admins verify this patch?



[GitHub] spark issue #15508: [DO-NOT-MERGE]

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15508
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67059/



[GitHub] spark issue #15508: [DO-NOT-MERGE]

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15508
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15508: [DO-NOT-MERGE]

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15508
  
**[Test build #67059 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67059/consoleFull)**
 for PR 15508 at commit 
[`e7be7a9`](https://github.com/apache/spark/commit/e7be7a98eb532fb200774d28c0f8cb96487070e4).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15510: new

2016-10-17 Thread codlife

GitHub user codlife opened a pull request:

https://github.com/apache/spark/pull/15510

new 

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/codlife/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15510.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15510


commit 673c29b2166e002d97b914ef8f8316df71fc8be7
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T02:02:21Z

solve spark-17447

commit a4609059350af3ebeb68e5acdfc99daf424a817a
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T02:26:46Z

Update Partitioner.scala

commit 7829bd0a3c66c474ec67f64d1ef043d0e251cdf6
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T12:26:54Z

solve spark-17447

commit 8ddc442fc40f71d85fcaef8e4a721f6b31a5ea5c
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T12:33:19Z

fix  code style

commit 81c0eb9bb45b15dc746d935afa7a3259bb0efcd9
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T15:20:09Z

solve spark-17447

commit f5d1e24d38f4a24f2ebc29214eb1a331846a0b1b
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T15:21:44Z

Update Partitioner.scala

commit e717f65ff419e152a02e359f1241343d48e56977
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T10:45:34Z

Merge branch 'master' of https://github.com/codlife/spark

commit e426ccfabeb4e9baa38bceac893db7d985cfa860
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T10:51:57Z

solve SPARK-17521

commit af1a102192794bce88afab172f3b074e901d8383
Author: codlife 
Date:   2016-09-13T11:32:34Z

Merge pull request #2 from apache/master

NEW

commit 8bfcd6b66950b40953d984fee93e8b16cbf7af05
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T13:00:12Z

fix

commit 379cd5a687d83c363179071a998bd689feb80e71
Author: codlife 
Date:   2016-09-13T13:03:39Z

Update SparkContext.scala

commit f4546685958ccfb12f0d994e7acefaf7d4ece600
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T13:07:51Z

Merge branch 'master' of https://github.com/codlife/spark

commit 1d0d4fcf9d9c06a0d03fc8dd8ba7582e4945231a
Author: codlife <1004910...@qq.com>
Date:   2016-10-17T07:46:39Z

support stand json file





[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11119
  
**[Test build #67051 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67051/consoleFull)**
 for PR 11119 at commit 
[`b3ea01a`](https://github.com/apache/spark/commit/b3ea01a630f275de37a521f8faa6cb5f5efaae43).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #15510: new

2016-10-17 Thread codlife

Github user codlife closed the pull request at:

https://github.com/apache/spark/pull/15510



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11119
  
Merged build finished. Test PASSed.



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/11119
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67051/



[GitHub] spark issue #15509: Merge pull request #1 from apache/master

2016-10-17 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15509
  
@someorz, could you please close this? It seems to have been opened by mistake.



[GitHub] spark pull request #15509: Merge pull request #1 from apache/master

2016-10-17 Thread someorz

Github user someorz closed the pull request at:

https://github.com/apache/spark/pull/15509



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15481
  
**[Test build #67054 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67054/consoleFull)**
 for PR 15481 at commit 
[`2997ccb`](https://github.com/apache/spark/commit/2997ccb25dd1bb7dfcef44054f91d5d1132cd686).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15481
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15481
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67054/



[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15423
  
**[Test build #67060 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67060/consoleFull)**
 for PR 15423 at commit 
[`cb0691c`](https://github.com/apache/spark/commit/cb0691c01bdf11212d001dcf3e6675c8b36c49ff).



[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #67061 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67061/consoleFull)**
 for PR 14136 at commit 
[`eb264d9`](https://github.com/apache/spark/commit/eb264d9e42b1b8e3c90ead29c74538307e369681).



[GitHub] spark issue #15505: [WIP][SPARK-17931]taskScheduler has some unneeded serial...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #67062 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67062/consoleFull)**
 for PR 15505 at commit 
[`bee165a`](https://github.com/apache/spark/commit/bee165a6c731ebea8b92e172807890e5f57c6fc5).



[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/11119
  
Please remove `WIP` from the description. 



[GitHub] spark pull request #15511: [SPARK-17969]I think it's user unfriendly to proc...

2016-10-17 Thread codlife

GitHub user codlife opened a pull request:

https://github.com/apache/spark/pull/15511

[SPARK-17969]I think it's user unfriendly to process standard json file 
with DataFrame

## What changes were proposed in this pull request?

Currently, with the DataFrame API, we can't load a standard JSON file 
directly, so this PR provides an overloaded method to handle it.

## How was this patch tested?
manual tests

Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/codlife/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15511.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15511


commit 673c29b2166e002d97b914ef8f8316df71fc8be7
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T02:02:21Z

solve spark-17447

commit a4609059350af3ebeb68e5acdfc99daf424a817a
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T02:26:46Z

Update Partitioner.scala

commit 7829bd0a3c66c474ec67f64d1ef043d0e251cdf6
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T12:26:54Z

solve spark-17447

commit 8ddc442fc40f71d85fcaef8e4a721f6b31a5ea5c
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T12:33:19Z

fix  code style

commit 81c0eb9bb45b15dc746d935afa7a3259bb0efcd9
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T15:20:09Z

solve spark-17447

commit f5d1e24d38f4a24f2ebc29214eb1a331846a0b1b
Author: codlife <1004910...@qq.com>
Date:   2016-09-10T15:21:44Z

Update Partitioner.scala

commit e717f65ff419e152a02e359f1241343d48e56977
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T10:45:34Z

Merge branch 'master' of https://github.com/codlife/spark

commit e426ccfabeb4e9baa38bceac893db7d985cfa860
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T10:51:57Z

solve SPARK-17521

commit af1a102192794bce88afab172f3b074e901d8383
Author: codlife 
Date:   2016-09-13T11:32:34Z

Merge pull request #2 from apache/master

NEW

commit 8bfcd6b66950b40953d984fee93e8b16cbf7af05
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T13:00:12Z

fix

commit 379cd5a687d83c363179071a998bd689feb80e71
Author: codlife 
Date:   2016-09-13T13:03:39Z

Update SparkContext.scala

commit f4546685958ccfb12f0d994e7acefaf7d4ece600
Author: codlife <1004910...@qq.com>
Date:   2016-09-13T13:07:51Z

Merge branch 'master' of https://github.com/codlife/spark

commit 1d0d4fcf9d9c06a0d03fc8dd8ba7582e4945231a
Author: codlife <1004910...@qq.com>
Date:   2016-10-17T07:46:39Z

support stand json file

commit 9639a148c12021b523f0af0edb49b72f21fe273e
Author: codlife 
Date:   2016-10-17T08:26:33Z

Merge pull request #3 from apache/master

new

commit 2084079e419e10abe58999adacaa020dfdcef964
Author: codlife 
Date:   2016-10-17T08:35:10Z

Update DataFrameReader.scala





[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15511
  
Can one of the admins verify this patch?



[GitHub] spark issue #15505: [SPARK-17931]taskScheduler has some unneeded serializati...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67062/
Test FAILed.



[GitHub] spark issue #15505: [SPARK-17931]taskScheduler has some unneeded serializati...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #67062 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67062/consoleFull)**
 for PR 15505 at commit 
[`bee165a`](https://github.com/apache/spark/commit/bee165a6c731ebea8b92e172807890e5f57c6fc5).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15505: [SPARK-17931]taskScheduler has some unneeded serializati...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15511
  
I don't quite understand this -- what does "standard" mean? This still 
doesn't load a 'standard JSON' file. 



[GitHub] spark issue #12761: [SPARK-14464] [MLLIB] Better support for logistic regres...

2016-10-17 Thread dbtsai

Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/12761
  
I'm benchmarking LOR with 14M features on an internal company dataset 
(unfortunately, it's not public). 

Regarding using a sparse data structure for aggregation, I'm not so sure how 
much this will improve the performance. After computing the gradient sum for 
all the data in one partition, the gradient vector will no longer be very 
sparse. Even if it is sparse, after a couple of levels of aggregation it will 
be very dense. Also, we perform compression in the shuffle phase, so if the 
vectors are sparse, even in a dense vector representation they should take 
around the same size as the sparse representation. We may need to do more 
investigation to understand how much performance we can gain in practice by 
using sparse vectors for aggregating the gradients.
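The densification argument can be illustrated with a toy model of tree aggregation. This is a sketch under stated assumptions, not Spark's `SparseVector` or `treeAggregate`: sparse vectors are modelled as plain `Map[Int, Double]`, the names `SparseAggregationSketch`, `add` and `nnzPerLevel` are hypothetical, and each partition is assumed to touch a disjoint set of features (the worst case for sparsity).

```scala
// Toy model of tree aggregation over sparse gradients. Sparse vectors are plain
// maps from feature index to value; this is NOT Spark's SparseVector or
// treeAggregate, just an illustration of how non-zeros accumulate.
object SparseAggregationSketch {
  type SparseVec = Map[Int, Double]

  // Element-wise sum: the support of the result is the union of both supports
  // (minus any entries that cancel to exactly 0.0).
  def add(a: SparseVec, b: SparseVec): SparseVec =
    (a.keySet ++ b.keySet).iterator
      .map(i => i -> (a.getOrElse(i, 0.0) + b.getOrElse(i, 0.0)))
      .filter { case (_, v) => v != 0.0 }
      .toMap

  // One level of a binary aggregation tree: merge neighbours pairwise.
  def aggregateOneLevel(vs: Seq[SparseVec]): Seq[SparseVec] =
    vs.grouped(2).map(_.reduce((x, y) => add(x, y))).toSeq

  // Largest number of non-zeros at each level, from the leaves (per-partition
  // gradient sums) up to the root. Assumes `leaves` is non-empty.
  def nnzPerLevel(leaves: Seq[SparseVec]): List[Int] = {
    @annotation.tailrec
    def loop(level: Seq[SparseVec], acc: List[Int]): List[Int] = {
      val withThisLevel = level.map(_.size).max :: acc
      if (level.size <= 1) withThisLevel.reverse
      else loop(aggregateOneLevel(level), withThisLevel)
    }
    loop(leaves, Nil)
  }
}
```

With 8 partitions that each touch 2 distinct features out of 16, `nnzPerLevel` returns `List(2, 4, 8, 16)`: the root is fully dense even though every leaf is 87.5% zeros.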




[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread codlife

Github user codlife commented on the issue:

https://github.com/apache/spark/pull/15511
  
In a standard JSON file, a JSON object spanning multiple lines is legal, but 
currently we can only load single-line JSON objects directly. 
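The difference between line-delimited and whole-file reading can be shown with a toy model. As the Spark SQL guide of this era describes it, the JSON source expects each line to hold one self-contained JSON object, so a pretty-printed object is split into fragments. The sketch below is illustrative only: `JsonRecordSplitSketch` and its methods are hypothetical, and the bracket-balance check is a crude stand-in for a real JSON parser (it ignores brackets inside string literals).

```scala
// Toy illustration of line-delimited vs whole-file record splitting. The
// bracket check below is a crude stand-in for a real JSON parser (it ignores
// brackets inside string literals) and is not what Spark does internally.
object JsonRecordSplitSketch {
  // Hypothetical check: the text starts with '{' or '[', its bracket depth
  // never goes negative, and it ends back at depth 0.
  def looksLikeCompleteValue(text: String): Boolean = {
    val t = text.trim
    val depths = t.scanLeft(0) { (d, c) =>
      if (c == '{' || c == '[') d + 1
      else if (c == '}' || c == ']') d - 1
      else d
    }
    (t.startsWith("{") || t.startsWith("[")) && depths.min >= 0 && depths.last == 0
  }

  // Line-delimited mode: every non-blank line is one record.
  def recordsPerLine(file: String): Seq[String] =
    file.split("\n").toSeq.filter(_.trim.nonEmpty)

  // Whole-file mode: the entire file is one record.
  def recordsWholeFile(file: String): Seq[String] = Seq(file)
}
```

For a four-line pretty-printed object, none of the four per-line records looks complete, while the single whole-file record does; a JSON Lines input passes in per-line mode.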



[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread MLnick

Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/11119#discussion_r83599465
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
--- End diff --

Agree with @sethah on this - initial model should take precedence - 
essentially ignoring any `setK` call whether before or after setting initial 
model.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15376
  
`KafkaSourceSuite` failure seems to be irrelevant.
```
[info] KafkaSourceSuite:
[info] - cannot stop Kafka stream (1 minute, 1 second)
[info] - subscribing topic by name from latest offsets *** FAILED *** (10 seconds, 511 milliseconds)
[info]   The code passed to eventually never returned normally. Attempted 669 times over 10.01201477801 seconds. Last failure message: assertion failed: Partition [topic-2, 0] metadata not propagated after timeout. (KafkaTestUtils.scala:312)
[info]   org.scalatest.exceptions.TestFailedDueToTimeoutException:
```
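The failure above comes from a polling assertion: ScalaTest's `eventually` keeps re-running its block until the block returns normally or a timeout expires, which is why the log reports 669 attempts over about 10 seconds. A minimal sketch of that polling pattern follows; `retryUntil` is a hypothetical helper written for illustration, not ScalaTest's API.

```scala
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Minimal sketch of an "eventually"-style polling helper. This is a
// hypothetical helper written for illustration, not ScalaTest's `eventually`.
object RetrySketch {
  // Re-runs `body` every `interval` until it returns normally or `timeout`
  // has elapsed. Returns the value plus the number of attempts made.
  def retryUntil[T](timeout: FiniteDuration, interval: FiniteDuration)(body: => T): (T, Int) = {
    val deadline = timeout.fromNow

    @annotation.tailrec
    def attempt(n: Int): (T, Int) = Try(body) match {
      case Success(v) => (v, n)
      case Failure(e) if deadline.isOverdue() =>
        // Report how many attempts failed, similar in spirit to the log line.
        throw new RuntimeException(s"never returned normally; attempted $n times", e)
      case Failure(_) =>
        Thread.sleep(interval.toMillis)
        attempt(n + 1)
    }

    attempt(1)
  }
}
```

A block that fails twice and then succeeds returns on the third attempt; a block that never succeeds surfaces the last failure once the deadline passes, which is the situation the Kafka log shows.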



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15376
  
Retest this please.



[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-17 Thread dbtsai

Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/11119#discussion_r83600176
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
--- End diff --

My personal preference is to throw an exception to make it clear for 
users, but I don't have a strong opinion about this.
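The two policies discussed for a `k` that conflicts with an initial model (let the initial model take precedence, or throw) can be sketched side by side. This is a hypothetical, stripped-down model: `InitialModelSketch`, `KConflictPolicy` and `effectiveK` are illustrative names rather than Spark ML's `KMeans` API, and the fallback `k = 2` is an assumed default.

```scala
// Hypothetical, stripped-down model of the two policies discussed above; the
// names here are illustrative and are not Spark ML's KMeans API.
final case class InitialModelSketch(centers: Vector[Vector[Double]])

sealed trait KConflictPolicy
case object InitialModelWins extends KConflictPolicy // ignore any setK value
case object FailOnMismatch extends KConflictPolicy   // throw to make the conflict explicit

object KSelectionSketch {
  val DefaultK = 2 // assumed default when neither k nor an initial model is given

  def effectiveK(k: Option[Int], init: Option[InitialModelSketch], policy: KConflictPolicy): Int =
    (k, init, policy) match {
      // The initial model's number of centers always wins under this policy.
      case (_, Some(m), InitialModelWins) => m.centers.size
      // Under the strict policy, an explicit k that disagrees is an error.
      case (Some(userK), Some(m), FailOnMismatch) if userK != m.centers.size =>
        throw new IllegalArgumentException(
          s"k = $userK conflicts with the initial model, which has ${m.centers.size} centers")
      case (_, Some(m), FailOnMismatch) => m.centers.size
      case (Some(userK), None, _) => userK
      case (None, None, _) => DefaultK
    }
}
```

Because the decision is made once, from the final parameter values, the outcome does not depend on whether `setK` was called before or after the initial model was set.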



[GitHub] spark pull request #15512: The SerializerInstance instance used when deseria...

2016-10-17 Thread witgo

GitHub user witgo opened a pull request:

https://github.com/apache/spark/pull/15512

The SerializerInstance instance used when deserializing a TaskResult is not 
reused 

## What changes were proposed in this pull request?
The following code is called when a DirectTaskResult instance is 
deserialized:

```scala

  def value(): T = {
    if (valueObjectDeserialized) {
      valueObject
    } else {
      // Each deserialization creates a new instance of SerializerInstance,
      // which is very time-consuming
      val resultSer = SparkEnv.get.serializer.newInstance()
      valueObject = resultSer.deserialize(valueBytes)
      valueObjectDeserialized = true
      valueObject
    }
  }

```

When a stage has a lot of tasks, reusing the SerializerInstance can improve 
scheduling performance threefold.
 
The test data is TPC-DS 2T (Parquet), and the SQL statement is as follows 
(query 2):


```sql

select  i_item_id, 
avg(ss_quantity) agg1,
avg(ss_list_price) agg2,
avg(ss_coupon_amt) agg3,
avg(ss_sales_price) agg4 
 from store_sales, customer_demographics, date_dim, item, promotion
 where ss_sold_date_sk = d_date_sk and
   ss_item_sk = i_item_sk and
   ss_cdemo_sk = cd_demo_sk and
   ss_promo_sk = p_promo_sk and
   cd_gender = 'M' and 
   cd_marital_status = 'M' and
   cd_education_status = '4 yr Degree' and
   (p_channel_email = 'N' or p_channel_event = 'N') and
   d_year = 2001 
 group by i_item_id
 order by i_item_id
 limit 100;

```

`spark-defaults.conf` file:

```
spark.master   yarn-client
spark.executor.instances   20
spark.driver.memory16g
spark.executor.memory  30g
spark.executor.cores   5
spark.default.parallelism  100 
spark.sql.shuffle.partitions   10 
spark.serializer   org.apache.spark.serializer.KryoSerializer
spark.driver.maxResultSize  0
spark.rpc.netty.dispatcher.numThreads   8
spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:+UseStringDeduplication -XX:G1HeapRegionSize=16M -XX:MetaspaceSize=256M
spark.cleaner.referenceTracking.blocking true
spark.cleaner.referenceTracking.blocking.shuffle true

```


Performance test results are as follows:

[SPARK-17930](https://github.com/witgo/spark/tree/SPARK-17930) | [ed14633](https://github.com/witgo/spark/commit/ed1463341455830b8867b721a1b34f291139baf3)
------------ | -------------
54.5 s | 231.7 s


## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/witgo/spark SPARK-17930

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15512.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15512


commit 037871d8843760fbbdeab344d8228bfaeba6f6ae
Author: Guoqiang Li 
Date:   2016-10-16T03:18:00Z

The SerializerInstance instance used when deserializing a TaskResult is not 
reused





[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67064 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67064/consoleFull)**
 for PR 15376 at commit 
[`401c4ee`](https://github.com/apache/spark/commit/401c4eea6aa40f09280aadf300eb74c5a847f638).



[GitHub] spark issue #15512: The SerializerInstance instance used when deserializing ...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15512
  
**[Test build #67063 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67063/consoleFull)**
 for PR 15512 at commit 
[`037871d`](https://github.com/apache/spark/commit/037871d8843760fbbdeab344d8228bfaeba6f6ae).



[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15511
  
I guess it'd be nicer if this PR resembled 
https://github.com/apache/spark/pull/14151. 
The change suggested here is to read one JSON object per file, so I guess it 
can share some code with that PR.

Also, as we have `JSONOptions` and the `DataFrameReader.option(...)` API, I 
think it'd be nicer to add this as an option rather than introducing 
another API.
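The "add it as an option" suggestion can be sketched as a typed view over the string options a reader receives. This is a hypothetical sketch, not Spark's `JSONOptions`: `JsonReadOptionsSketch` is an illustrative name, the `wholeFile` key is an assumed option name, and matching keys case-insensitively is a design choice of the sketch.

```scala
// Hypothetical typed view over the string options a reader receives, in the
// spirit of exposing a setting through DataFrameReader.option(key, value)
// instead of adding a new reader method. The "wholeFile" key is an assumed
// name for illustration.
final case class JsonReadOptionsSketch(wholeFile: Boolean)

object JsonReadOptionsSketch {
  def parse(params: Map[String, String]): JsonReadOptionsSketch = {
    // Option keys are matched case-insensitively in this sketch.
    val lowered = params.map { case (k, v) => k.toLowerCase -> v }

    def bool(key: String, default: Boolean): Boolean =
      lowered.get(key.toLowerCase) match {
        case None => default
        case Some(v) if v.equalsIgnoreCase("true") => true
        case Some(v) if v.equalsIgnoreCase("false") => false
        case Some(v) =>
          throw new IllegalArgumentException(s"$key must be true or false, got '$v'")
      }

    // Defaulting to false keeps the existing one-object-per-line behaviour.
    JsonReadOptionsSketch(wholeFile = bool("wholeFile", default = false))
  }
}
```

A caller would then opt in with something like `spark.read.option("wholeFile", "true").json(path)`, where the key is hypothetical in this sketch; the existing `json` method signature stays unchanged.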



[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15511
  
BTW, I guess per-line JSON also complies with a standard: 
https://tools.ietf.org/html/rfc7159#section-4. We should add a test, fix the 
title to summarise what the PR proposes, and fill in the PR description. 
Alternatively, we can close this, wait until 14151 is merged, and then reopen 
it when you are ready to start working on this.



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/15512
  
ok to test



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15316
  
**[Test build #67058 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67058/consoleFull)**
 for PR 15316 at commit 
[`f082643`](https://github.com/apache/spark/commit/f082643130e1f39918e06dbe8a39a9a4f49739f5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15316
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67058/
Test PASSed.



[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread codlife

Github user codlife commented on the issue:

https://github.com/apache/spark/pull/15511
  
It compiles fine, but when we call `show()`, we get a `_corrupt_record`; 
moreover, when we call `select` on this DataFrame, we get an exception.



[GitHub] spark issue #15316: [SPARK-17751] [SQL] Remove spark.sql.eagerAnalysis and O...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15316
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15302: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-17 Thread dongjoon-hyun

Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/15302
  
Hi, @hvanhovell .

When using `Expression`, I ran into two problems.
- `checkAnalysis` raises exceptions because the column is unresolved, e.g., 
`country` is unresolved.
- As a workaround, I tried to use the string literal 'country', but then the 
optimizer rule `ConstantFolding` replaces the predicate with `false` because 
'country' < 'KR' is `false`.
```sql
ALTER TABLE sales DROP PARTITION (country < 'KR')
```
To avoid these situations, I could add a rule to `checkAnalysis`, but that 
doesn't seem like a good idea. Could you give some advice on this?
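Why the string-literal workaround collapses can be seen with a toy constant folder. This is a hypothetical mini expression tree, not Catalyst's classes or its `ConstantFolding` rule; it only shows that a comparison between two literals is evaluated at planning time, while a comparison against a column reference is left intact (and therefore needs the column to be resolvable).

```scala
// Toy expression tree for illustration only; these are not Catalyst classes.
sealed trait Expr
final case class StrLit(value: String) extends Expr
final case class BoolLit(value: Boolean) extends Expr
final case class ColumnRef(name: String) extends Expr // must be resolved by an analyzer
final case class LessThan(left: Expr, right: Expr) extends Expr

object ConstantFoldingSketch {
  // Bottom-up folding: a comparison whose operands are both literals is
  // evaluated at planning time and replaced by a boolean literal.
  def fold(e: Expr): Expr = e match {
    case LessThan(l, r) =>
      (fold(l), fold(r)) match {
        // Lexicographic comparison: 'c' (99) sorts after 'K' (75).
        case (StrLit(a), StrLit(b)) => BoolLit(a.compareTo(b) < 0)
        case (fl, fr)               => LessThan(fl, fr)
      }
    case other => other
  }
}
```

Folding `LessThan(StrLit("country"), StrLit("KR"))` yields `BoolLit(false)`, so the partition predicate disappears; with `ColumnRef("country")` on the left the comparison survives folding unchanged.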



[GitHub] spark issue #15503: Fix example of tf_idf with minDocFreq

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15503
  
Merged to master/2.0



[GitHub] spark pull request #15503: Fix example of tf_idf with minDocFreq

2016-10-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15503



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15512
  
Hm, if the benchmark you give generalizes, that is certainly compelling. I'm 
surprised that instantiating the object can be so expensive relative to 
deserialization, since it happens just once per task. 

But it is a fairly simple change. `ThreadLocal` avoids thread-safety issues, 
though I do wonder whether the serializers can hold on to state that would 
make this a source of memory leaks.

Maybe ... @squito or @zsxwing has a thought?
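The `ThreadLocal` reuse pattern under discussion can be sketched in isolation. `CostlyInstance` and `PerThreadReuse` are hypothetical names standing in for an expensive, non-thread-safe object such as a `SerializerInstance`; this is the general per-thread caching technique, not the code in the PR.

```scala
import java.nio.charset.StandardCharsets
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for an expensive, non-thread-safe object such as the
// SerializerInstance returned by SparkEnv.get.serializer.newInstance().
final class CostlyInstance {
  CostlyInstance.created.incrementAndGet() // count constructions
  def deserialize(bytes: Array[Byte]): String = new String(bytes, StandardCharsets.UTF_8)
}

object CostlyInstance {
  val created = new AtomicInteger(0)
}

object PerThreadReuse {
  // One instance per thread: reused across calls on the same thread and never
  // shared between threads, so the instance itself need not be thread-safe.
  // The value stays reachable for the lifetime of the thread unless remove()
  // is called, which is where any retained state could accumulate.
  private val instance = new ThreadLocal[CostlyInstance] {
    override def initialValue(): CostlyInstance = new CostlyInstance
  }

  def deserialize(bytes: Array[Byte]): String = instance.get().deserialize(bytes)

  // Drops this thread's cached instance.
  def release(): Unit = instance.remove()
}
```

Calling `deserialize` 100 times from one thread constructs a single instance, and a second thread gets its own. Whether a real serializer instance keeps large buffers between calls depends on the serializer and is not something this sketch can answer.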



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67052 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67052/consoleFull)**
 for PR 15376 at commit 
[`401c4ee`](https://github.com/apache/spark/commit/401c4eea6aa40f09280aadf300eb74c5a847f638).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67052/
Test FAILed.



[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15511
  
OK, I think in both cases "standard" JSON is read, and in both cases each
record is a JSON document. These aren't different cases. If you mean to read
small JSON files as records, you just use wholeTextFiles, as you show. I do not
think wrapping this up in an extra flag helps enough to justify it, because
callers can easily implement this themselves. There are a hundred other
variations on this, and the reason we don't implement them all is exactly that
there are so many variations to bottle up like this.



[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2016-10-17 Thread DanielMe

Github user DanielMe commented on the issue:

https://github.com/apache/spark/pull/15274
  
Is there any workaround I can use to achieve a similar effect in 1.6?



[GitHub] spark issue #14650: [SPARK-17062][MESOS] add conf option to mesos dispatcher

2016-10-17 Thread skonto

Github user skonto commented on the issue:

https://github.com/apache/spark/pull/14650
  
@vanzin & @srowen please review.



[GitHub] spark issue #15511: [SPARK-17969]I think it's user unfriendly to process sta...

2016-10-17 Thread codlife

Github user codlife commented on the issue:

https://github.com/apache/spark/pull/15511
  
@srowen, you are right! I proposed this method just to make it more
user-friendly. With this method, users can load a standard JSON file directly.



[GitHub] spark pull request #15511: [SPARK-17969]I think it's user unfriendly to proc...

2016-10-17 Thread codlife

Github user codlife commented on a diff in the pull request:

https://github.com/apache/spark/pull/15511#discussion_r83616031
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -240,16 +240,35 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
 
   /**
* Loads a JSON file (one object per line) and returns the result as a [[DataFrame]].
+   * With param isStandard we can load multi-lines json object directly.
* See the documentation on the overloaded `json()` method with varargs for more details.
*
* @since 1.4.0
*/
-  def json(path: String): DataFrame = {
+
+  def json(path: String, isStandard: Boolean = false ): DataFrame = {
 // This method ensures that calls that explicit need single argument works, see SPARK-16009
-json(Seq(path): _*)
+if (!isStandard) {
+  json(Seq(path): _*)
+} else {
+  val jsonRDD = sparkSession.sparkContext.wholeTextFiles(path)
+.map(line => line.toString().replaceAll("\\s+", ""))
+.map { jsonLine =>
+  val index = jsonLine.indexOf(",")
--- End diff --

Maybe this code is bad; I just want to get the JSON contents out of each pair,
such as ("filename", json_contents).
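`wholeTextFiles` produces `(path, content)` pairs, so one way to get just the JSON text is to destructure each pair instead of stringifying it. Below is a minimal plain-Scala sketch that uses a `Seq` as a stand-in for the pair RDD; on an actual pair RDD the same `map`, or `.values`, applies. The object name and sample data are made up. The sketch also shows two hazards of stringifying the tuple: `replaceAll("\\s+", "")` removes spaces inside JSON string values, and a comma in the path moves the first `indexOf(",")` into the file name.

```scala
object WholeFileJson {
  // Made-up input: one file whose path contains a comma and whose JSON spans lines.
  val example: Seq[(String, String)] =
    Seq("file:/data/a,b.json" -> "{\n  \"name\": \"Ada Lovelace\"\n}")

  // Take the JSON text straight from each (path, content) pair.
  def contents(pairs: Seq[(String, String)]): Seq[String] =
    pairs.map { case (_, content) => content }

  // The first step of the diff: stringify the tuple and strip all whitespace.
  def stringified(pairs: Seq[(String, String)]): Seq[String] =
    pairs.map(pair => pair.toString().replaceAll("\\s+", ""))
}
```

With the example input, `stringified` yields `(file:/data/a,b.json,{"name":"AdaLovelace"})`: the value has lost its space, and the first comma falls inside the path.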



[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15423
  
**[Test build #67060 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67060/consoleFull)**
 for PR 15423 at commit 
[`cb0691c`](https://github.com/apache/spark/commit/cb0691c01bdf11212d001dcf3e6675c8b36c49ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15423
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflict check...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15423
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67060/
Test PASSed.



[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15505
  
**[Test build #67056 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67056/consoleFull)**
 for PR 15505 at commit 
[`8a6062d`](https://github.com/apache/spark/commit/8a6062d4ac3cc33ac64f6d6ad78c416e0fc30125).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67056/
Test FAILed.



[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15505
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15487: [SPARK-17940][SQL] Fixed a typo in LAST function and imp...

2016-10-17 Thread lins05

Github user lins05 commented on the issue:

https://github.com/apache/spark/pull/15487
  
@HyukjinKwon thanks, I'll update the PR accordingly.



[GitHub] spark issue #15450: [SPARK-3261] [MLLIB] KMeans clusterer can return duplica...

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15450
  
@sethah I agree that when there are lots of unique points (>> k) this is
almost certain not to happen, and that covers most real-world use cases, but the
question indeed is what should happen when this is not the case. In that sense,
this change only affects corner cases, so it isn't really a big deal either way.

Yes, one case is clear: sampling with replacement when the data set has
< k (unique) points. It will always return k centroids, so it must return
duplicates. In this case, every point will be at distance 0 from some centroid,
and so I don't think the centroids can move apart. It stops in 1 iteration with
the degenerate solution, with some centroids assigned 0 points. Not the end of
the world, but not exactly meaningful.
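A toy illustration of this case, assuming 1-D points, absolute distance and a plain nearest-center assignment. The object and method names are hypothetical; this is not MLlib's code or API. With 3 distinct values and k = 5, sampling with replacement always yields 5 centers and so must repeat some. In the assignment step, ties go to the lowest index, so the repeated copies get no points. Sampling without replacement from the distinct values returns only 3 centers.

```scala
import scala.util.Random

object KMeansInitSketch {
  // Draw k centers uniformly WITH replacement: always exactly k, duplicates allowed.
  def sampleWithReplacement(data: IndexedSeq[Double], k: Int, seed: Long): IndexedSeq[Double] = {
    val rnd = new Random(seed)
    IndexedSeq.fill(k)(data(rnd.nextInt(data.length)))
  }

  // Draw WITHOUT replacement from the distinct values: at most k centers, none repeated.
  def sampleDistinct(data: IndexedSeq[Double], k: Int, seed: Long): IndexedSeq[Double] =
    new Random(seed).shuffle(data.distinct.toVector).take(k)

  // One assignment step: count the points nearest to each center.
  // minBy keeps the first minimum, so ties go to the lowest-index center.
  def clusterSizes(data: Seq[Double], centers: IndexedSeq[Double]): IndexedSeq[Int] = {
    val assigned = data.map(p => centers.indices.minBy(i => math.abs(p - centers(i))))
    centers.indices.map(i => assigned.count(_ == i))
  }
}
```

For `data = IndexedSeq(1.0, 1.0, 2.0, 3.0, 3.0)`, `clusterSizes(data, data)` models a sample that happens to cover every distinct value. Every point is at distance 0 from some center, and the sizes come out as 2, 0, 1, 2, 0, so two of the five centers receive no points.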

The more interesting case is k-means||. Of course, again, if there are < k
unique points to start, returning k centroids in this case as well means
returning duplicates. The same argument applies there: there seems to be no
value in returning k centroids.

This is really the sum of the argument to me, regardless of what Derrick's 
case is.

A twist: it's possible, but quite improbable, for k-means || to choose 
fewer than k unique centroids, when there are >= k distinct points. This is 
most likely when there are barely more than k distinct points. In that case 
it's possible that duplicated centroids do get pulled apart and do end up doing 
something meaningful. I am arguing this case is not worth dealing with because 
it's rare and it doesn't meaningfully harm the quality of the resulting 
clustering, but, that point is arguable.

I am about 7/10 in favor of the change, certainly the bit about sampling 
without replacement, but the rest I could drop if there's any significant 
objection to it.



[GitHub] spark issue #15377: [SPARK-17802] Improved caller context logging.

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15377
  
@jerryshao can I take your temperature on how strongly you are against this
change? Maybe @lins05 can elaborate again on what this prevents, and what
happens in case of a race between two threads. That is, what do we expect in
the average case before and after?



[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-17 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15408
  
Going once, going twice, any more comments?



[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14136
  
**[Test build #67061 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67061/consoleFull)**
 for PR 14136 at commit 
[`eb264d9`](https://github.com/apache/spark/commit/eb264d9e42b1b8e3c90ead29c74538307e369681).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14136
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67061/
Test PASSed.



[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14136
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15481: [SPARK-17929] [CORE] Fix deadlock when CoarseGrainedSche...

2016-10-17 Thread scwf

Github user scwf commented on the issue:

https://github.com/apache/spark/pull/15481
  
retest this please.



[GitHub] spark issue #15492: [DO NOT MERGE][TEST] Testing flakiness of StreamingQuery...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15492
  
**[Test build #67057 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67057/consoleFull)**
 for PR 15492 at commit 
[`5bf08e6`](https://github.com/apache/spark/commit/5bf08e67b7c7800d80df3c0c52ca0e187bbd442d).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15492: [DO NOT MERGE][TEST] Testing flakiness of StreamingQuery...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15492
  
Merged build finished. Test FAILed.



[GitHub] spark issue #15492: [DO NOT MERGE][TEST] Testing flakiness of StreamingQuery...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15492
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67057/
Test FAILed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15376
  
**[Test build #67064 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67064/consoleFull)**
 for PR 15376 at commit 
[`401c4ee`](https://github.com/apache/spark/commit/401c4eea6aa40f09280aadf300eb74c5a847f638).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Merged build finished. Test PASSed.



[GitHub] spark issue #15376: [SPARK-17796][SQL] Support wildcard character in filenam...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15376
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67064/
Test PASSed.



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15512
  
**[Test build #67063 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67063/consoleFull)**
 for PR 15512 at commit 
[`037871d`](https://github.com/apache/spark/commit/037871d8843760fbbdeab344d8228bfaeba6f6ae).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15512
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67063/
Test FAILed.



[GitHub] spark issue #15512: [SPARK-17930][CORE]The SerializerInstance instance used ...

2016-10-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15512
  
Merged build finished. Test FAILed.



[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83621898
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -246,7 +247,28 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-if (!new File(uri.getPath()).exists()) {
+val filePath = uri.getPath()
+val exists = if (filePath.contains("*")) {
+  val fileSystem = FileSystems.getDefault
+  val pathPattern = fileSystem.getPath(filePath)
+  val dir = pathPattern.getParent.toString
+  val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
+  if (dir.contains("*")) {
+throw new AnalysisException(
+  s"LOAD DATA input path allows only filename wildcard: $path")
+  }
+
+  val files = new File(dir).listFiles()
+  if (files == null) {
+false
+  } else {
+val matcher = fileSystem.getPathMatcher("glob:" + filePattern)
--- End diff --

I was looking up how this works and found
http://stackoverflow.com/a/14164134/64174, which suggests that this might not
work unless the glob starts with "**". However, I wonder if you can just pass
`"glob:" + pathPattern` to this method anyway, so that it matches the whole
absolute path?
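To make the `PathMatcher` behaviour concrete, here is a minimal sketch using only `java.nio.file`. It assumes a Unix-style default file system, and `GlobMatching` and the paths are made up. A glob is matched against the entire `Path` passed to `matches`. So a filename-only glob fails against an absolute path but matches that path's file name, while an absolute glob, or one beginning with `**`, matches the absolute path.

```scala
import java.nio.file.{FileSystems, Path}

object GlobMatching {
  // Match a glob against the whole Path it is given, using the default file system.
  def matches(glob: String, path: Path): Boolean =
    FileSystems.getDefault.getPathMatcher("glob:" + glob).matches(path)
}
```

So either matching each listed file's `getFileName` against the filename pattern, or building the matcher from the full pattern as suggested, should avoid the mismatch.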



[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83622009
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1886,6 +1887,37 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
 }
   }
 
+  test("SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH") {
+withTempDir { dir =>
+  for (i <- 1 to 3) {
+val writer = new PrintWriter(new File(s"$dir/part-r-$i"))
--- End diff --

PS: I think you can use Guava's `Files.write` to do this a little more
easily.
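The suggestion is Guava's `Files.write`. As a dependency-free variant, the JDK's `java.nio.file.Files.write` also writes each file in a single call; note that this swaps in a different technique from the one the comment proposes. The sketch below assumes the `part-r-$i` names from the test and uses made-up file contents. With the test's `java.io.File` directory, `dir.toPath` gives the `Path` argument.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object TestFiles {
  // Write part-r-1 .. part-r-n into dir, one Files.write call per file
  // (replacing the PrintWriter open/write/close loop).
  def writeParts(dir: Path, n: Int): Seq[Path] =
    (1 to n).map { i =>
      Files.write(dir.resolve(s"part-r-$i"), s"$i\n".getBytes(StandardCharsets.UTF_8))
    }
}
```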



[GitHub] spark pull request #15376: [SPARK-17796][SQL] Support wildcard character in ...

2016-10-17 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15376#discussion_r83621049
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -246,7 +247,28 @@ case class LoadDataCommand(
 val loadPath =
   if (isLocal) {
 val uri = Utils.resolveURI(path)
-if (!new File(uri.getPath()).exists()) {
+val filePath = uri.getPath()
+val exists = if (filePath.contains("*")) {
+  val fileSystem = FileSystems.getDefault
+  val pathPattern = fileSystem.getPath(filePath)
+  val dir = pathPattern.getParent.toString
+  val filePattern = pathPattern.getName(pathPattern.getNameCount - 1).toString
--- End diff --

I think `getFileName` returns the last element in the path?
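Here is a quick check of that, as a sketch with made-up paths and a hypothetical `LastPathElement` object. For a path with at least one name element, `getFileName` and `getName(getNameCount - 1)` give the same element. One difference is the root path `/`, which has zero name elements: there `getFileName` returns `null`, while the `getName` form throws `IllegalArgumentException`.

```scala
import java.nio.file.Paths

object LastPathElement {
  // The diff's approach: index the last name element explicitly.
  def viaGetName(p: String): String = {
    val path = Paths.get(p)
    path.getName(path.getNameCount - 1).toString
  }

  // Suggested alternative: getFileName returns the last element directly.
  def viaGetFileName(p: String): String = Paths.get(p).getFileName.toString
}
```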



[GitHub] spark pull request #11105: [SPARK-12469][CORE] Data Property accumulators fo...

2016-10-17 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/11105#discussion_r83623396
  
--- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
@@ -126,4 +126,14 @@ private[spark] class TaskContextImpl(
     taskMetrics.registerAccumulator(a)
   }
 
+  private var rddPartitionInfo: (Int, Int, Int) = null
--- End diff --

Yah that sounds pretty reasonable. I'll do this update next.


[GitHub] spark issue #15377: [SPARK-17802] Improved caller context logging.

2016-10-17 Thread jerryshao

Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/15377
  
@srowen I'm not against this change, but personally the use of a mutable flag
seems weird to me, and frankly I haven't seen such a pattern in the Spark code.

Since we want to avoid re-executing the code when CallerContext is not enabled
or does not exist, we could compute this information once, immutably, like this:

```scala
private[util] object CallerContext {
  val callerContextSupported: Boolean = if (CallerContext is not enabled) {
    false
  } else if (CallerContext class does not exist) {
    false
  } else {
    true
  }
}
```

Then in `setCurrentContext` we could get rid of the mutable state and use this
flag to decide whether to set the caller context, without resetting any flag.
That would simplify the code and make it easier for others to understand.

What do you think?
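A runnable sketch of the pattern proposed above, assuming the "enabled" check comes from a configuration value and the "exists" check is a class lookup. Both are passed in as constructor parameters here so the sketch needs no Hadoop dependency. `CallerContextSketch` is a hypothetical name, it is a class rather than an object only so it can be exercised with different inputs, and the actual Hadoop call is elided.

```scala
import scala.util.control.NonFatal

// Hypothetical sketch of the immutable-flag idea; not Spark's CallerContext.
class CallerContextSketch(enabled: Boolean, className: String) {
  // Evaluated exactly once at construction and never reset afterwards.
  val callerContextSupported: Boolean = enabled && {
    try {
      Class.forName(className) // throws ClassNotFoundException if absent
      true
    } catch {
      case NonFatal(_) => false
    }
  }

  // Decides purely from the flag; returns the context that would be set, if any.
  def setCurrentContext(context: String): Option[String] =
    if (callerContextSupported) {
      // The real implementation would build and set the Hadoop caller context here.
      Some(context)
    } else {
      None
    }
}
```

Because the flag is a `val`, there is no reset path to reason about: an unsupported environment short-circuits on every call without re-running the class lookup.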



