[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18953
  
**[Test build #80869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80869/testReport)**
 for PR 18953 at commit 
[`7954d52`](https://github.com/apache/spark/commit/7954d5223eee4bfaf7825ec79eaad36c524362dc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18986
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18986
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80868/
Test FAILed.





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18986
  
**[Test build #80868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80868/testReport)**
 for PR 18986 at commit 
[`1647447`](https://github.com/apache/spark/commit/1647447b4e29f43e8bbb13ff5eb97f55e7b2ea99).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18849
  
If the new flag `DATASOURCE_HIVE_COMPATIBLE` is set to `true` when creating 
a table, are we sure it can stay `true` forever? Is it a reliable flag we can 
trust? Is it possible that `ALTER TABLE` commands issued by previous, current, 
or future versions of Spark SQL might also change the Hive compatibility?





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18986
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80866/
Test FAILed.





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18986
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18986
  
**[Test build #80866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80866/testReport)**
 for PR 18986 at commit 
[`627116a`](https://github.com/apache/spark/commit/627116a39e5a3cc5c2479dda98f794120aa2db52).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18986
  
**[Test build #80868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80868/testReport)**
 for PR 18986 at commit 
[`1647447`](https://github.com/apache/spark/commit/1647447b4e29f43e8bbb13ff5eb97f55e7b2ea99).





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80867/testReport)**
 for PR 18973 at commit 
[`ac7d785`](https://github.com/apache/spark/commit/ac7d785041a55143663cade01f004f12e7284d93).





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread stanzhai
Github user stanzhai commented on the issue:

https://github.com/apache/spark/pull/18986
  
In MySQL, when a string is compared with a numeric value, the two operands 
are compared as floating-point (real) numbers.

[MySQL type conversion documentation](https://dev.mysql.com/doc/refman/5.7/en/type-conversion.html)

The following rules describe how conversion occurs for comparison 
operations:

- If one or both arguments are NULL, the result of the comparison is NULL, 
except for the NULL-safe <=> equality comparison operator. For NULL <=> NULL, 
the result is true. No conversion is needed.
- If both arguments in a comparison operation are strings, they are 
compared as strings.
- If both arguments are integers, they are compared as integers.
- Hexadecimal values are treated as binary strings if not compared to a 
number.
- If one of the arguments is a TIMESTAMP or DATETIME column and the other 
argument is a constant, the constant is converted to a timestamp before the 
comparison is performed. This is done to be more ODBC-friendly. Note that 
this is not done for the arguments to IN()! To be safe, always use complete 
datetime, date, or time strings when doing comparisons. For example, to achieve 
best results when using BETWEEN with date or time values, use CAST() to 
explicitly convert the values to the desired data type.
- A single-row subquery from a table or tables is not considered a 
constant. For example, if a subquery returns an integer to be compared to a 
DATETIME value, the comparison is done as two integers. The integer is not 
converted to a temporal value. To compare the operands as DATETIME values, use 
CAST() to explicitly convert the subquery value to DATETIME.
- If one of the arguments is a decimal value, comparison depends on the 
other argument. The arguments are compared as decimal values if the other 
argument is a decimal or integer value, or as floating-point values if the 
other argument is a floating-point value.
- In all other cases, the arguments are compared as floating-point (real) 
numbers.
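
As a rough illustration of the rules quoted above, here is a minimal Python sketch that reports which comparison mode would apply to a pair of values. `comparison_mode` is a hypothetical helper written for this note, not part of MySQL or Spark, and it deliberately leaves out the hexadecimal, TIMESTAMP/DATETIME, and subquery rules.

```python
from decimal import Decimal


def comparison_mode(left, right):
    """Return the comparison mode for two values under the simplified rules.

    Assumption: Python types stand in for SQL types (str for strings,
    int for integers, Decimal for decimals, float for floating point).
    """
    # Rule: if either argument is NULL, the result is NULL.
    if left is None or right is None:
        return "null"
    # Rule: two strings are compared as strings.
    if isinstance(left, str) and isinstance(right, str):
        return "string"
    # Rule: two integers are compared as integers.
    if isinstance(left, int) and isinstance(right, int):
        return "integer"
    # Rule: a decimal compared with a decimal or integer stays decimal;
    # a decimal compared with a floating-point value becomes floating point.
    if isinstance(left, Decimal) or isinstance(right, Decimal):
        other = right if isinstance(left, Decimal) else left
        if isinstance(other, (Decimal, int)):
            return "decimal"
        if isinstance(other, float):
            return "float"
    # Rule: in all other cases, compare as floating-point (real) numbers.
    return "float"
```

Under this sketch a string compared with an integer falls through to the last rule, so `comparison_mode("1", 1)` reports a floating-point comparison.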





[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18968
  
@dilipbiswal Thanks for the comment.

This issue happens during the optimization phase, so I think we can assume 
that `checkAnalysis` has already passed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80863/
Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Build finished. Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80863/testReport)**
 for PR 18973 at commit 
[`60a3586`](https://github.com/apache/spark/commit/60a3586b90ea0ee05634fbb3c49605ba76bdc89c).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `class DebugForeachWriter[A : Encoder]() extends ForeachWriter[A] `





[GitHub] spark issue #18998: [SPARK-21748][ML] Migrate the implementation of HashingT...

2017-08-18 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/18998
  
cc @yanboliang @WeichenXu123 who I believe are interested in this PR. Could 
you take a look please?





[GitHub] spark issue #18998: [SPARK-21748][ML] Migrate the implementation of HashingT...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18998
  
Can one of the admins verify this patch?





[GitHub] spark pull request #18998: [SPARK-21748][ML] Migrate the implementation of H...

2017-08-18 Thread facaiy
GitHub user facaiy opened a pull request:

https://github.com/apache/spark/pull/18998

[SPARK-21748][ML] Migrate the implementation of HashingTF from MLlib to ML

## What changes were proposed in this pull request?

Migrate the implementation of HashingTF from MLlib to ML.
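
For readers unfamiliar with HashingTF, it maps a sequence of terms to a fixed-size term-frequency vector using the hashing trick. The following is a minimal sketch of that idea only, not the code in this PR. The function name `hashing_tf` and the use of MD5 as the hash are assumptions made for illustration; Spark uses a different hash function, so the indices will not match Spark's output.

```python
import hashlib


def hashing_tf(terms, num_features=16):
    """Map terms to a sparse term-frequency dict {feature index: count}.

    Assumption: MD5 is a stand-in hash chosen because it is stable across
    Python processes; it is not the hash Spark uses.
    """
    counts = {}
    for term in terms:
        digest = hashlib.md5(term.encode("utf-8")).digest()
        # Take the first 4 bytes as an unsigned integer and fold it
        # into the range [0, num_features).
        index = int.from_bytes(digest[:4], "big") % num_features
        counts[index] = counts.get(index, 0.0) + 1.0
    return counts
```

Because distinct terms can hash to the same index, counts for colliding terms are added together; a larger `num_features` makes collisions less likely.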

## How was this patch tested?

+ [ ] Pass all unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/facaiy/spark ENH/migrate_hash_tf_to_ml

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18998.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18998


commit 6a25da2cc96a744aaf047280ac414e5ff4515434
Author: Yan Facai (颜发才) 
Date:   2017-08-19T02:24:14Z

ENH: implement HashingTF in ml







[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80864/
Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18973
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80864 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80864/testReport)**
 for PR 18973 at commit 
[`28c2f4b`](https://github.com/apache/spark/commit/28c2f4ba7f2cc2318aef18e3340bdfba001f9440).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class DebugForeachWriter[A : Encoder]() extends ForeachWriter[A] `





[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18986
  
**[Test build #80866 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80866/testReport)**
 for PR 18986 at commit 
[`627116a`](https://github.com/apache/spark/commit/627116a39e5a3cc5c2479dda98f794120aa2db52).





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18953
  
Hi, @cloud-fan .
Could you review this PR?





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15435
  
**[Test build #80865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80865/testReport)**
 for PR 15435 at commit 
[`46d49c9`](https://github.com/apache/spark/commit/46d49c901313600c92072d1fbb0a947220b677cb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15435
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80865/
Test FAILed.





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15435
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #18762: [SPARK-21566][SQL][Python] Python method for summ...

2017-08-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18762





[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15435
  
**[Test build #80865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80865/testReport)**
 for PR 15435 at commit 
[`46d49c9`](https://github.com/apache/spark/commit/46d49c901313600c92072d1fbb0a947220b677cb).





[GitHub] spark issue #18762: [SPARK-21566][SQL][Python] Python method for summary

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18762
  
Thanks for doing this @aray, LGTM.





[GitHub] spark issue #18647: [MINOR][PYTHON] Remove obsolete codes for parsing abstra...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18647
  
Although I'd recommend doing a JIRA.





[GitHub] spark issue #18647: [MINOR][PYTHON] Remove obsolete codes for parsing abstra...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18647
  
LGTM





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18953
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80861/
Test PASSed.





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18953
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18953
  
**[Test build #80861 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80861/testReport)**
 for PR 18953 at commit 
[`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-18 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/18968
  
@viirya Hi Simon, many thanks for finding this. Instead of adding the 
compensation code in the resolve logic for the in-subquery expression, could we 
consider moving the semantic check that compares the number of arguments on 
either side of the in-subquery expression to checkAnalysis? We already do 
several other checks in checkAnalysis for subquery expressions. I just feel 
this may be a little cleaner.

For your reference, I quickly tried it 
[here](https://github.com/dilipbiswal/spark/tree/pr-18968-viirya)
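
The kind of check being discussed can be sketched in a few lines. This is only a toy illustration in Python, not Spark's Scala code: the function name `check_in_subquery_arity`, its parameters, and the error wording are all hypothetical.

```python
def check_in_subquery_arity(num_values, num_subquery_columns):
    """Raise ValueError when the two sides of an IN subquery differ in width.

    Hypothetical helper: num_values is the number of expressions on the
    left of IN, num_subquery_columns is the number of columns the
    subquery outputs.
    """
    if num_values != num_subquery_columns:
        raise ValueError(
            "The number of expressions on the left side of IN "
            f"({num_values}) does not match the number of columns "
            f"returned by the subquery ({num_subquery_columns})."
        )
```

Running such a check once during analysis means later phases can assume both sides have the same number of columns.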






[GitHub] spark issue #18896: [SPARK-21681][ML] fix bug of MLOR do not work correctly ...

2017-08-18 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/18896
  
Thanks for catching this @WeichenXu123! I just added a note about the 
intent of the test.





[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-18 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18896#discussion_r134076580
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -1392,6 +1415,61 @@ class LogisticRegressionSuite
     assert(model2.interceptVector.toArray.sum ~== 0.0 absTol eps)
   }
 
+  test("multinomial logistic regression with zero variance (SPARK-21681)") {
+    val sqlContext = multinomialDatasetWithZeroVar.sqlContext
+    import sqlContext.implicits._
+    val mlr = new LogisticRegression().setFamily("multinomial").setFitIntercept(true)
+      .setElasticNetParam(0.0).setRegParam(0.0).setStandardization(true).setWeightCol("weight")
+
+    val model = mlr.fit(multinomialDatasetWithZeroVar)
+
+    /*
+     Use the following R code to load the data and train the model using glmnet package.
+
+     library("glmnet")
+     data <- read.csv("path", header=FALSE)
+     label = as.factor(data$V1)
+     w = data$V2
+     features = as.matrix(data.frame(data$V3, data$V4))
+     coefficients = coef(glmnet(features, label, weights=w, family="multinomial",
+     alpha = 0, lambda = 0))
+     coefficients
+     $`0`
+     3 x 1 sparse Matrix of class "dgCMatrix"
+                    s0
+             0.2658824
+     data.V3 0.1881871
+     data.V4 .
+
+     $`1`
+     3 x 1 sparse Matrix of class "dgCMatrix"
+                      s0
+              0.53604701
+     data.V3 -0.02412645
+     data.V4  .
+
+     $`2`
+     3 x 1 sparse Matrix of class "dgCMatrix"
+                     s0
+             -0.8019294
+     data.V3 -0.1640607
+     data.V4  .
+    */
+
+    val coefficientsR = new DenseMatrix(3, 2, Array(
+      0.1881871, -0.0,
--- End diff --

Why `-0.0`?





[GitHub] spark pull request #18896: [SPARK-21681][ML] fix bug of MLOR do not work cor...

2017-08-18 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/18896#discussion_r134076552
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala ---
@@ -238,8 +238,17 @@ class LogisticAggregatorSuite extends SparkFunSuite with MLlibTestSparkContext {
     val aggConstantFeature = getNewAggregator(instancesConstantFeature,
       Vectors.dense(coefArray ++ interceptArray), fitIntercept = true, isMultinomial = true)
     instances.foreach(aggConstantFeature.add)
+
     // constant features should not affect gradient
-    assert(aggConstantFeature.gradient(0) === 0.0)
+    def validateGradient(grad: Vector): Unit = {
+      assert(grad(0) === 0.0)
+      grad.toArray.foreach { gradientValue =>
--- End diff --

The problem with this test was that it checked that part of the gradient 
was zero, but didn't check that the rest of the gradient was correct. Here, 
you're checking that the rest of the gradient isn't NaN or infinite, but not 
that it's actually correct. A more appropriate test, IMO, is to also run an 
aggregator over the same instances with the constant feature filtered out, then 
check that the portion of the gradient the two aggregators share is the same, 
e.g.

scala
val aggConstantFeature = getNewAggregator(instancesConstantFeature,
  Vectors.dense(coefArray ++ interceptArray), fitIntercept = true, isMultinomial = true)
val filteredInstances = instancesConstantFeature.map { case Instance(l, w, f) =>
  Instance(l, w, Vectors.dense(f.toArray.tail))
}
val aggMultinomial = getNewAggregator(filteredInstances,
  Vectors.dense(coefArray.slice(3, 6) ++ interceptArray), fitIntercept = true,
  isMultinomial = true)
filteredInstances.foreach(aggMultinomial.add)
instancesConstantFeature.foreach(aggConstantFeature.add)

// constant features should not affect gradient
assert(aggConstantFeature.gradient.toArray.take(numClasses) === Array.fill(numClasses)(0.0))
assert(aggMultinomial.gradient.toArray === aggConstantFeature.gradient.toArray.slice(3, 9))


Note that this code is only an example; it is not meant to be copied and 
pasted.
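
The same comparison can be sketched outside Spark. The following is a minimal pure-Python illustration, not the `LogisticAggregator` implementation: it computes a weighted softmax gradient over standardized features and assumes, as a stated simplification, that a zero-variance feature is scaled to 0.0 rather than divided by its standard deviation. The helper names (`column_std`, `multinomial_gradient`), the nested-list coefficient layout, and the sample data are all made up for the example.

```python
import math
from typing import List, Sequence, Tuple

Instance = Tuple[int, float, List[float]]  # (label, weight, features)


def column_std(instances: Sequence[Instance]) -> List[float]:
    """Unweighted population standard deviation of each feature column."""
    n = len(instances)
    num_features = len(instances[0][2])
    stds = []
    for j in range(num_features):
        column = [features[j] for _, _, features in instances]
        mean = sum(column) / n
        stds.append(math.sqrt(sum((x - mean) ** 2 for x in column) / n))
    return stds


def multinomial_gradient(instances, coef, intercepts):
    """Gradient of the weighted softmax loss w.r.t. coef[k][j] and intercepts[k]."""
    stds = column_std(instances)
    num_classes, num_features = len(intercepts), len(stds)
    grad_coef = [[0.0] * num_features for _ in range(num_classes)]
    grad_intercept = [0.0] * num_classes
    for label, weight, features in instances:
        # Assumption: zero-variance features are scaled to 0.0, never divided by 0.
        scaled = [x / s if s != 0.0 else 0.0 for x, s in zip(features, stds)]
        margins = [intercepts[k] + sum(c * x for c, x in zip(coef[k], scaled))
                   for k in range(num_classes)]
        max_margin = max(margins)  # subtract the max for numerical stability
        exps = [math.exp(m - max_margin) for m in margins]
        total = sum(exps)
        for k in range(num_classes):
            multiplier = weight * (exps[k] / total - (1.0 if k == label else 0.0))
            for j in range(num_features):
                grad_coef[k][j] += multiplier * scaled[j]
            grad_intercept[k] += multiplier
    return grad_coef, grad_intercept


# The first feature is constant (always 1.0); the other two vary.
instances = [(0, 0.1, [1.0, 2.0, 1.0]), (1, 0.5, [1.0, 1.0, 2.0]), (2, 0.3, [1.0, 4.0, 0.5])]
coef = [[0.7, 1.0, -0.5], [-0.2, 0.3, 0.8], [0.4, -1.1, 0.2]]
intercepts = [0.1, -0.3, 0.2]

# Same data and coefficients with the constant feature dropped.
filtered = [(label, weight, features[1:]) for label, weight, features in instances]
filtered_coef = [row[1:] for row in coef]

full_grad, full_int = multinomial_gradient(instances, coef, intercepts)
filt_grad, filt_int = multinomial_gradient(filtered, filtered_coef, intercepts)
```

Comparing `full_grad[k][1:]` against `filt_grad[k]`, and `full_int` against `filt_int`, checks the shared portion of the gradient for correctness, which is stronger than only asserting that the entries are finite.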





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18538
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80862/
Test PASSed.





[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18981
  
Also, to be clear, pandoc is pretty optional: we want to use it for packaging, 
but in local mode it doesn't really matter. So LGTM pending Jenkins.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18538
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18981: Fixed pandoc dependency issue in python/setup.py

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18981
  
Thanks for fixing this @dusktreader :) OSError seems to mostly occur when 
pandoc is not installed.

I think automatically attempting to install pandoc is more likely to lead 
us into difficulty than skipping it (otherwise pypandoc would, at least in 
part, just do this automatically).
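
As a rough sketch of the "skip rather than install" behaviour, assuming `pypandoc` exposes `convert_file` (as current releases do) and raises `OSError` when the pandoc binary is missing: the helper name `load_long_description` and the fallback text are hypothetical, and this is not the code in `python/setup.py`.

```python
import sys


def load_long_description(readme_path: str, fallback: str) -> str:
    """Return the README converted to reST, or `fallback` if conversion is unavailable."""
    try:
        import pypandoc  # optional dependency, only needed for packaging
        return pypandoc.convert_file(readme_path, "rst")
    except ImportError:
        print("Could not import pypandoc; using fallback description", file=sys.stderr)
    except OSError:
        # Assumed to be what pypandoc raises when the pandoc binary is missing.
        print("pandoc is not installed; using fallback description", file=sys.stderr)
    return fallback
```

Neither failure path tries to download or install anything, so a local build still succeeds without pandoc.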

Jenkins, OK to Test.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18538
  
**[Test build #80862 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80862/testReport)**
 for PR 18538 at commit 
[`a7db896`](https://github.com/apache/spark/commit/a7db8962745bd000da0737018eef4b1680425c90).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18281
  
Also, to be clear, I'm fine with the changes I've suggested being left for a 
follow-up (but if we do go ahead and merge this without those changes, let's 
make it an explicit follow-up task).





[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134075412
  
--- Diff: python/pyspark/ml/wrapper.py ---
@@ -118,11 +118,13 @@ def _transfer_params_to_java(self):
         """
         Transforms the embedded params to the companion Java object.
         """
-        paramMap = self.extractParamMap()
         for param in self.params:
-            if param in paramMap:
-                pair = self._make_java_param_pair(param, paramMap[param])
+            if param in self._paramMap:
+                pair = self._make_java_param_pair(param, self._paramMap[param])
                 self._java_obj.set(pair)
+            if param in self._defaultParamMap:
--- End diff --

Should this be an else if? No need to transfer the default value if we've 
explicitly set it to another value.
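
A minimal sketch of the `elif` variant, using a hypothetical `FakeJavaObj` in place of the py4j companion object (its `set` and `set_default` methods are invented for the example and are not the wrapper's real interface):

```python
class FakeJavaObj:
    """Hypothetical stand-in for the py4j companion object."""

    def __init__(self):
        self.explicit = {}
        self.defaults = {}

    def set(self, name, value):
        self.explicit[name] = value

    def set_default(self, name, value):
        self.defaults[name] = value


def transfer_params(params, param_map, default_param_map, java_obj):
    """Send each param once: the explicit value if set, otherwise its default."""
    for param in params:
        if param in param_map:
            java_obj.set(param, param_map[param])
        elif param in default_param_map:
            java_obj.set_default(param, default_param_map[param])
```

With this shape, a param that only has a default never shows up as explicitly set on the Java side. Whether the Java side still needs the default for an explicitly set param (for example, if the param is cleared later) is a separate question this sketch does not answer.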





[GitHub] spark pull request #18982: [SPARK-21685][PYTHON][ML] PySpark Params isSet st...

2017-08-18 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18982#discussion_r134075445
  
--- Diff: python/pyspark/ml/tests.py ---
@@ -455,6 +455,14 @@ def test_logistic_regression_check_thresholds(self):
             LogisticRegression, threshold=0.42, thresholds=[0.5, 0.5]
         )
 
+    def test_preserve_set_state(self):
+        model = Binarizer()
+        self.assertFalse(model.isSet("threshold"))
+        model._transfer_params_to_java()
--- End diff --

Would it make sense to do an actual transform here instead of the two inner 
parts of the transform?





[GitHub] spark issue #18905: [SPARK-21660][YARN][Shuffle] Yarn ShuffleService failed ...

2017-08-18 Thread LiShuMing
Github user LiShuMing commented on the issue:

https://github.com/apache/spark/pull/18905
  
Sorry, I've been busy recently. I will update it today.





[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18732
  
cc @HyukjinKwon @BryanCutler 





[GitHub] spark issue #18970: [SPARK-21468][PYSPARK][ML] Python API for FeatureHasher

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18970
  
LGTM





[GitHub] spark issue #18622: [SPARK-21340] Bring pyspark BinaryClassificationMetrics ...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18622
  
Oh wait, sorry, I misread that code. It looks like we already have this 
wrapped in the Spark ML API.

Since we aren't actively working on the Spark MLlib APIs right now, I 
don't see us merging this.





[GitHub] spark issue #18622: [SPARK-21340] Bring pyspark BinaryClassificationMetrics ...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18622
  
So while we are trying to limit the changes in mllib, we are currently 
exposing BinaryClassificationMetrics in the ML models so this makes sense to 
expose.





[GitHub] spark issue #17849: [SPARK-10931][ML][PYSPARK] PySpark Models Copy Param Val...

2017-08-18 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17849
  
LGTM. It's certainly an intermediate fix, but making the params accessible 
without users having to go through py4j manually is worthwhile.

I'll leave this over the weekend in case anyone has issues.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80864 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80864/testReport)**
 for PR 18973 at commit 
[`28c2f4b`](https://github.com/apache/spark/commit/28c2f4ba7f2cc2318aef18e3340bdfba001f9440).





[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18997
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80859/
Test FAILed.





[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18997
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18997
  
**[Test build #80859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80859/testReport)**
 for PR 18997 at commit 
[`8aa3a29`](https://github.com/apache/spark/commit/8aa3a2933225313988b50377b26f680151488535).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17373: [SPARK-12664][ML] Expose probability in mlp model

2017-08-18 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17373
  
We can open a JIRA to track this.





[GitHub] spark issue #18973: [SPARK-21765] Set isStreaming on leaf nodes for streamin...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18973
  
**[Test build #80863 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80863/testReport)**
 for PR 18973 at commit 
[`60a3586`](https://github.com/apache/spark/commit/60a3586b90ea0ee05634fbb3c49605ba76bdc89c).





[GitHub] spark issue #18994: [SPARK-21784][SQL] Adds support for defining information...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18994
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18994: [SPARK-21784][SQL] Adds support for defining information...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18994
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80855/
Test PASSed.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18538
  
**[Test build #80862 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80862/testReport)**
 for PR 18538 at commit 
[`a7db896`](https://github.com/apache/spark/commit/a7db8962745bd000da0737018eef4b1680425c90).





[GitHub] spark issue #18994: [SPARK-21784][SQL] Adds support for defining information...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18994
  
**[Test build #80855 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80855/testReport)**
 for PR 18994 at commit 
[`4839e84`](https://github.com/apache/spark/commit/4839e8419ca7360f0feafeceec8f3832102e3dba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TableConstraints(`
  * `sealed trait TableConstraint `
  * `case class PrimaryKey(`
  * `case class ForeignKey(`
  * `case class AlterTableAddConstraintCommand(`





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18953
  
**[Test build #80861 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80861/testReport)**
 for PR 18953 at commit 
[`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18953
  
Retest this please.





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18953
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18953
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80858/
Test FAILed.





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18953
  
**[Test build #80858 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80858/testReport)**
 for PR 18953 at commit 
[`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18995: [SPARK-21787][SQL] Support for pushing down filters for ...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18995
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18995: [SPARK-21787][SQL] Support for pushing down filters for ...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18995
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80857/
Test PASSed.





[GitHub] spark issue #18995: [SPARK-21787][SQL] Support for pushing down filters for ...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18995
  
**[Test build #80857 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80857/testReport)**
 for PR 18995 at commit 
[`b31acc5`](https://github.com/apache/spark/commit/b31acc5ac5b34fcc8868252a7e3217b959c6d6d8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18953: [SPARK-20682][SQL] Update ORC data source based o...

2017-08-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/18953#discussion_r134068925
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -205,38 +220,53 @@ class OrcQuerySuite extends QueryTest with 
BeforeAndAfterAll with OrcTest {
   spark.range(0, 10).write
 .option("compression", "ZLIB")
 .orc(file.getCanonicalPath)
+  val maybeOrcFile = 
file.listFiles().find(_.getName.endsWith(".zlib.orc"))
+  assert(maybeOrcFile.isDefined)
+  val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath)
+  val conf = OrcFile.readerOptions(new Configuration())
   val expectedCompressionKind =
-
OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
+OrcFile.createReader(orcFilePath, conf).getCompressionKind
   assert("ZLIB" === expectedCompressionKind.name())
 }
 
 withTempPath { file =>
   spark.range(0, 10).write
 .option("compression", "SNAPPY")
 .orc(file.getCanonicalPath)
+  val maybeOrcFile = 
file.listFiles().find(_.getName.endsWith(".snappy.orc"))
+  assert(maybeOrcFile.isDefined)
+  val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath)
+  val conf = OrcFile.readerOptions(new Configuration())
   val expectedCompressionKind =
-
OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
+OrcFile.createReader(orcFilePath, conf).getCompressionKind
   assert("SNAPPY" === expectedCompressionKind.name())
 }
 
 withTempPath { file =>
   spark.range(0, 10).write
 .option("compression", "NONE")
 .orc(file.getCanonicalPath)
+  val maybeOrcFile = file.listFiles().find(_.getName.endsWith(".orc"))
+  assert(maybeOrcFile.isDefined)
+  val orcFilePath = new Path(maybeOrcFile.get.getAbsolutePath)
+  val conf = OrcFile.readerOptions(new Configuration())
   val expectedCompressionKind =
-
OrcFileOperator.getFileReader(file.getCanonicalPath).get.getCompression
+OrcFile.createReader(orcFilePath, conf).getCompressionKind
   assert("NONE" === expectedCompressionKind.name())
 }
   }
 
-  // Following codec is not supported in Hive 1.2.1, ignore it now
-  ignore("LZO compression options for writing to an ORC file not supported 
in Hive 1.2.1") {
--- End diff --

This is a known improvement.





[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-18 Thread budde
Github user budde commented on the issue:

https://github.com/apache/spark/pull/18029
  
@yssharma Let me know what you think of my review suggestions. I should be 
able to review any updates from here on in a timely manner but you will still 
need @brkyvz or another Spark committer to do the final review and approve the 
PR.





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134068101
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.LATEST
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON.
+ */
+case object TrimHorizon extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.TRIM_HORIZON
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP.
+ */
+case class AtTimestamp(timestamp: Date) extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.AT_TIMESTAMP
+}
+
+/**
+ * Companion object for InitialPosition that returns
+ * appropriate version of InitialPositionInStream.
+ */
+object InitialPosition {
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.LATEST.
+   * @return [[Latest]]
+   */
+  def latest() : InitialPosition = {
+Latest
+  }
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.TRIM_HORIZON.
+   * @return [[TrimHorizon]]
+   */
+  def trimHorizon() : InitialPosition = {
+TrimHorizon
+  }
+
+  /**
+   * Returns instance of AtTimestamp with 
InitialPositionInStream.AT_TIMESTAMP.
+   * @return [[AtTimestamp]]
+   */
+  def atTimestamp(timestamp: Date) : InitialPosition = {
+AtTimestamp(timestamp)
+  }
+
+  /**
+   * Returns instance of [[InitialPosition]] based on the passed 
[[InitialPositionInStream]].
+   * @return [[InitialPosition]]
+   */
+  def kinesisInitialPositionInStream(
--- End diff --

Ok, I see that it is being used to maintain the original APIs present in 
```KinesisUtils```. However, we should be deprecating ```KinesisUtils``` at 
some (undetermined?) point. Would be good to remove this method at that time as 
well.

I'd suggest adding a comment to the docs for this method indicating it 
exists to maintain compatibility with the original ```KinesisUtils``` API.





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134063427
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  def instance: InitialPosition = this
--- End diff --

Is this necessary?





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134067474
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
 ---
@@ -148,18 +149,28 @@ private[kinesis] class KinesisReceiver[T](
 
 kinesisCheckpointer = new KinesisCheckpointer(receiver, 
checkpointInterval, workerId)
 val kinesisProvider = kinesisCreds.provider
-val kinesisClientLibConfiguration = new KinesisClientLibConfiguration(
+var kinesisClientLibConfiguration = new KinesisClientLibConfiguration(
--- End diff --

Keep this a val, but you can introduce a new scope with a temp val using 
braces, e.g.: 

```scala
val kinesisClientLibConfiguration = {
  val baseClientLibConfiguration = new KinesisClientLibConfiguration(
      checkpointAppName,
      streamName,
      ...
    .withKinesisEndpoint(endpointUrl)
    .withInitialPositionInStream(initialPosition.initialPositionInStream)
    ...

  initialPosition match {
    // see comment below
    ...
  }
}
```





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134063380
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
--- End diff --

This should be a ```val``` or, better yet, a ```def``` (```def``` can be 
overridden with ```val``` in child classes)
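
A minimal sketch of that suggestion, not the PR's actual code. `InitialPositionInStream` below is a stand-in `Enumeration` (an assumption, so the snippet compiles without the Kinesis Client Library on the classpath); the real type is the KCL Java enum imported in the diff above. The trait declares an abstract `def`, and each subtype implements it with an immutable `val`:

```scala
import java.util.Date

// Stand-in for the KCL enum (assumption, for illustration only).
object InitialPositionInStream extends Enumeration {
  val LATEST, TRIM_HORIZON, AT_TIMESTAMP = Value
}

sealed trait InitialPosition {
  // Abstract def: subtypes are free to implement it as a val.
  def initialPositionInStream: InitialPositionInStream.Value
}

case object Latest extends InitialPosition {
  // A val may override a def; the value is fixed once and cannot be reassigned.
  override val initialPositionInStream: InitialPositionInStream.Value =
    InitialPositionInStream.LATEST
}

case class AtTimestamp(timestamp: Date) extends InitialPosition {
  override val initialPositionInStream: InitialPositionInStream.Value =
    InitialPositionInStream.AT_TIMESTAMP
}
```

With a `var`, any caller could reassign `Latest.initialPositionInStream`, which would silently change the behaviour of a shared singleton.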





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134067523
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.LATEST
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON.
+ */
+case object TrimHorizon extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.TRIM_HORIZON
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP.
+ */
+case class AtTimestamp(timestamp: Date) extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.AT_TIMESTAMP
+}
+
+/**
+ * Companion object for InitialPosition that returns
+ * appropriate version of InitialPositionInStream.
+ */
+object InitialPosition {
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.LATEST.
+   * @return [[Latest]]
+   */
+  def latest() : InitialPosition = {
--- End diff --

I'd just make this a ```val``` or at least remove the parens as this method 
has no side-effects
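
Roughly what either option could look like, as a trimmed-down sketch that omits the `initialPositionInStream` member and the `AtTimestamp` case from the diff (the simplification is mine, for illustration):

```scala
sealed trait InitialPosition
case object Latest extends InitialPosition
case object TrimHorizon extends InitialPosition

object InitialPosition {
  // Option 1: a val, since the result is always the same singleton.
  val latest: InitialPosition = Latest

  // Option 2: keep a method but drop the parentheses, the usual Scala
  // convention for accessors without side effects.
  def trimHorizon: InitialPosition = TrimHorizon
}
```

Scala callers then write `InitialPosition.latest` without parentheses; from Java both forms are still invoked as `InitialPosition.latest()`.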





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134067110
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala
 ---
@@ -148,18 +149,28 @@ private[kinesis] class KinesisReceiver[T](
 
 kinesisCheckpointer = new KinesisCheckpointer(receiver, 
checkpointInterval, workerId)
 val kinesisProvider = kinesisCreds.provider
-val kinesisClientLibConfiguration = new KinesisClientLibConfiguration(
+var kinesisClientLibConfiguration = new KinesisClientLibConfiguration(
   checkpointAppName,
   streamName,
   kinesisProvider,
   dynamoDBCreds.map(_.provider).getOrElse(kinesisProvider),
   cloudWatchCreds.map(_.provider).getOrElse(kinesisProvider),
   workerId)
 .withKinesisEndpoint(endpointUrl)
-.withInitialPositionInStream(initialPositionInStream)
+
.withInitialPositionInStream(initialPosition.initialPositionInStream)
 .withTaskBackoffTimeMillis(500)
 .withRegionName(regionName)
 
+// Update the Kinesis client lib config with timestamp
+// if InitialPositionInStream.AT_TIMESTAMP is passed
+kinesisClientLibConfiguration =
+  if (initialPosition.initialPositionInStream == 
InitialPositionInStream.AT_TIMESTAMP) {
--- End diff --

Here's a more-stylish way of doing this in Scala:

```
initialPosition match {
  case atTimestamp: AtTimestamp =>
    baseClientLibConfiguration.withTimestampAtInitialPositionInStream(atTimestamp.timestamp)
  case _ =>
    baseClientLibConfiguration
}
```
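
For a version that can be run in isolation, here is a self-contained sketch of the same idea. `ClientLibConfig` and `ConfigBuilder` are hypothetical stand-ins (not part of Spark or the KCL) that model only the one builder call used above; the `InitialPosition` hierarchy is reduced to bare types:

```scala
import java.util.Date

// Hypothetical stand-in for KinesisClientLibConfiguration (illustration only).
final case class ClientLibConfig(streamName: String, timestamp: Option[Date] = None) {
  def withTimestampAtInitialPositionInStream(ts: Date): ClientLibConfig =
    copy(timestamp = Some(ts))
}

sealed trait InitialPosition
case object Latest extends InitialPosition
case object TrimHorizon extends InitialPosition
final case class AtTimestamp(timestamp: Date) extends InitialPosition

object ConfigBuilder {
  def build(streamName: String, initialPosition: InitialPosition): ClientLibConfig = {
    // The temporary val is scoped to this block; nothing is reassigned.
    val baseConfig = ClientLibConfig(streamName)
    initialPosition match {
      // Only AtTimestamp carries a Date, so only it needs the extra call.
      case AtTimestamp(ts) => baseConfig.withTimestampAtInitialPositionInStream(ts)
      case _ => baseConfig
    }
  }
}
```

Matching on the type rather than comparing enum values means the timestamp is available directly from the matched case, with no cast or second lookup.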





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134067791
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala
 ---
@@ -308,7 +308,6 @@ object KinesisInputDStream {
   private[kinesis] val DEFAULT_KINESIS_ENDPOINT_URL: String =
 "https://kinesis.us-east-1.amazonaws.com"
   private[kinesis] val DEFAULT_KINESIS_REGION_NAME: String = "us-east-1"
-  private[kinesis] val DEFAULT_INITIAL_POSITION_IN_STREAM: 
InitialPositionInStream =
-InitialPositionInStream.LATEST
+  private[kinesis] val DEFAULT_INITIAL_POSITION_IN_STREAM: InitialPosition 
= InitialPosition.latest
--- End diff --

*nit* Rename this to ```DEFAULT_INITIAL_POSITION``` to reflect the new 
class name





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134067693
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.LATEST
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON.
+ */
+case object TrimHorizon extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.TRIM_HORIZON
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP.
+ */
+case class AtTimestamp(timestamp: Date) extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.AT_TIMESTAMP
+}
+
+/**
+ * Companion object for InitialPosition that returns
+ * appropriate version of InitialPositionInStream.
+ */
+object InitialPosition {
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.LATEST.
+   * @return [[Latest]]
+   */
+  def latest() : InitialPosition = {
+Latest
+  }
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.TRIM_HORIZON.
+   * @return [[TrimHorizon]]
+   */
+  def trimHorizon() : InitialPosition = {
+TrimHorizon
+  }
+
+  /**
+   * Returns instance of AtTimestamp with 
InitialPositionInStream.AT_TIMESTAMP.
+   * @return [[AtTimestamp]]
+   */
+  def atTimestamp(timestamp: Date) : InitialPosition = {
+AtTimestamp(timestamp)
+  }
+
+  /**
+   * Returns instance of [[InitialPosition]] based on the passed 
[[InitialPositionInStream]].
+   * @return [[InitialPosition]]
+   */
+  def kinesisInitialPositionInStream(
--- End diff --

Is this method really necessary? Especially if it can only be used for a 
subset of the official ```InitialPositionInStream``` implementations?





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134068186
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  def instance: InitialPosition = this
--- End diff --

Looks like it is for Java compatibility





[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-18 Thread budde
Github user budde commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r134067537
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  var initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.LATEST
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON.
+ */
+case object TrimHorizon extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.TRIM_HORIZON
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP.
+ */
+case class AtTimestamp(timestamp: Date) extends InitialPosition {
+  def instance: InitialPosition = this
+  override var initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.AT_TIMESTAMP
+}
+
+/**
+ * Companion object for InitialPosition that returns
+ * appropriate version of InitialPositionInStream.
+ */
+object InitialPosition {
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.LATEST.
+   * @return [[Latest]]
+   */
+  def latest() : InitialPosition = {
+Latest
+  }
+
+  /**
+   * Returns instance of Latest with InitialPositionInStream.TRIM_HORIZON.
+   * @return [[TrimHorizon]]
+   */
+  def trimHorizon() : InitialPosition = {
--- End diff --

Change to ```val```





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18538
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80860/
Test FAILed.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18538
  
**[Test build #80860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80860/testReport)** for PR 18538 at commit [`a4ca3cd`](https://github.com/apache/spark/commit/a4ca3cd18852abc8076905a586c6b0f4b622cff6).
 * This patch **fails to generate documentation**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18538
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18538
  
**[Test build #80860 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80860/testReport)** for PR 18538 at commit [`a4ca3cd`](https://github.com/apache/spark/commit/a4ca3cd18852abc8076905a586c6b0f4b622cff6).





[GitHub] spark pull request #18940: [SPARK-21501] Change CacheLoader to limit entries...

2017-08-18 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18940#discussion_r134058845
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -597,7 +597,8 @@ private[spark] object SparkConf extends Logging {
   DeprecatedConfig("spark.scheduler.executorTaskBlacklistTime", "2.1.0",
 "Please use the new blacklisting options, spark.blacklist.*"),
   DeprecatedConfig("spark.yarn.am.port", "2.0.0", "Not used any more"),
-  DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more")
+  DeprecatedConfig("spark.executor.port", "2.0.0", "Not used any more"),
+  DeprecatedConfig("spark.shuffle.service.index.cache.entries", "2.3.0", "Not used any more")
--- End diff --

It would be good to mention the new config that's replacing it in the 
warning message.
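
As a rough sketch of this suggestion: the `DeprecatedConfig` case class and `warning` helper below are hypothetical stand-ins that only mirror the shape of the entries in the diff, and the replacement key is an assumption, since this thread does not name it.

```scala
object DeprecationSketch {
  // Hypothetical stand-in mirroring the shape of the DeprecatedConfig entries in the diff.
  case class DeprecatedConfig(key: String, version: String, deprecationMessage: String)

  // Assumption: illustrative replacement key; substitute whatever key the PR introduces.
  val replacementKey = "spark.shuffle.service.index.cache.size"

  // The message now tells the user where to go instead of only saying the key is unused.
  val entry = DeprecatedConfig(
    "spark.shuffle.service.index.cache.entries",
    "2.3.0",
    s"Not used any more. Please use $replacementKey instead.")

  // Hypothetical helper that renders the warning a user would see.
  def warning(cfg: DeprecatedConfig): String =
    s"The configuration key '${cfg.key}' has been deprecated as of Spark ${cfg.version}. " +
      cfg.deprecationMessage
}
```

A user who still sets the old key would then see the new key named directly in the log line.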





[GitHub] spark issue #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler Unit T...

2017-08-18 Thread squito
Github user squito commented on the issue:

https://github.com/apache/spark/pull/16992
  
It's been a while, so let's run the tests again just to check:
Jenkins, test this please





[GitHub] spark pull request #16992: [SPARK-19662][SCHEDULER][TEST] Add Fair Scheduler...

2017-08-18 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/16992#discussion_r134056863
  
--- Diff: docs/job-scheduling.md ---
@@ -235,7 +235,7 @@ properties:
   of the cluster. By default, each pool's `minShare` is 0.
 
 The pool properties can be set by creating an XML file, similar to `conf/fairscheduler.xml.template`,
-and setting a `spark.scheduler.allocation.file` property in your
+and either setting `fairscheduler.xml` into classpath or a `spark.scheduler.allocation.file` property in your
--- End diff --

super nit:

... and either putting a file named `fairscheduler.xml` on the classpath, 
or setting `spark.scheduler.allocation.file` ...
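
The two options in the reworded sentence can be sketched as a lookup. This is only an illustration: `AllocationFileSketch` and `resolve` are hypothetical names, and the precedence shown (explicit property first, then the classpath file) is an assumption rather than something stated in this thread.

```scala
object AllocationFileSketch {
  val PropertyKey = "spark.scheduler.allocation.file" // named in the doc diff
  val DefaultFileName = "fairscheduler.xml"           // named in the doc diff

  /** Hypothetical helper: describes where the pool definitions would be read from. */
  def resolve(conf: Map[String, String], loader: ClassLoader): Option[String] =
    conf.get(PropertyKey).map(path => s"file:$path")
      .orElse(Option(loader.getResource(DefaultFileName)).map(url => s"classpath:$url"))
}
```

If neither source is available, `resolve` returns `None`, which corresponds to running with default pool settings.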





[GitHub] spark issue #18997: [SPARK-21788][SS]Handle more exceptions when stopping a ...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18997
  
**[Test build #80859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80859/testReport)** for PR 18997 at commit [`8aa3a29`](https://github.com/apache/spark/commit/8aa3a2933225313988b50377b26f680151488535).





[GitHub] spark pull request #18997: [SPARK-21788][SS]Handle more exceptions when stop...

2017-08-18 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/18997

[SPARK-21788][SS]Handle more exceptions when stopping a streaming query

## What changes were proposed in this pull request?

Add more cases we should view as a normal query stop rather than a failure.

## How was this patch tested?

The new unit tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-21788

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18997.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18997


commit 8aa3a2933225313988b50377b26f680151488535
Author: Shixiong Zhu 
Date:   2017-08-18T20:45:23Z

Handle more exceptions when stopping a streaming query







[GitHub] spark issue #18978: [SPARK-21737][YARN]Create communication channel between ...

2017-08-18 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/18978
  
So, one thing I'm a little confused about is: why are there any changes 
necessary to the transport library at all?

The transport library abstracts away the concept of users / secrets so that 
the same server can support multiple secrets. This is how the YARN shuffle 
service works. The `SecurityManager` is just a naive implementation of a secret 
holder that only supports one secret.

In my view, to implement this, you can do it in two different ways:

- have the AM `RpcEnv` also listen for connections, and register both the 
appId / app secret, and a "client" user name (which can be hardcoded) with the 
Client-to-AM token as the secret.

- create a separate `RpcEnv` for this feature that accepts any user and 
maps it to the `Client-to-AM` token.

In neither case should there be any need to modify the transport library. 
Is there any reason why that would not work?
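
To make the "secret holder" idea concrete, here is a minimal sketch. All names (`SecretHolder`, `SingleSecretHolder`, `MultiSecretHolder`, the sample ids and secrets) are hypothetical and do not reproduce the transport library's actual interfaces; the sketch only shows how one server-side holder could serve several identities, as in the first option above.

```scala
object SecretHolderSketch {
  /** Hypothetical interface: maps a connection identity to a user name and a secret. */
  trait SecretHolder {
    def userFor(id: String): Option[String]
    def secretFor(id: String): Option[String]
  }

  /** Single-secret holder: every identity resolves to the same user and secret. */
  final class SingleSecretHolder(user: String, secret: String) extends SecretHolder {
    def userFor(id: String): Option[String] = Some(user)
    def secretFor(id: String): Option[String] = Some(secret)
  }

  /** Multi-secret holder: one server, several registered identities. */
  final class MultiSecretHolder(secrets: Map[String, String]) extends SecretHolder {
    def userFor(id: String): Option[String] = if (secrets.contains(id)) Some(id) else None
    def secretFor(id: String): Option[String] = secrets.get(id)
  }

  // First option: register the app secret and a fixed "client" user side by side.
  val amHolder = new MultiSecretHolder(Map(
    "app-1234" -> "app-secret",        // hypothetical appId and app secret
    "client" -> "client-to-am-token"   // hypothetical hardcoded client user and token
  ))
}
```

Under this shape, supporting the extra client identity is a matter of what gets registered with the holder, which is the point the comment makes about not touching the transport layer.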





[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18953
  
**[Test build #80858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80858/testReport)** for PR 18953 at commit [`8548b73`](https://github.com/apache/spark/commit/8548b73d971ef5751594f5204aea83a3ead8bd4b).





[GitHub] spark issue #18972: [SPARK-21720][SQL] Fix 64KB JVM bytecode limit problem w...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18972
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18972: [SPARK-21720][SQL] Fix 64KB JVM bytecode limit problem w...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80853/
Test PASSed.





[GitHub] spark issue #18972: [SPARK-21720][SQL] Fix 64KB JVM bytecode limit problem w...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18972
  
**[Test build #80853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80853/testReport)** for PR 18972 at commit [`569a8bd`](https://github.com/apache/spark/commit/569a8bdd86f89d2077576c0aed591d6afc27d637).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18940: [SPARK-21501] Change CacheLoader to limit entries based ...

2017-08-18 Thread tgravescs
Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/18940
  
+1. Any further comments, @vanzin @jiangxb1987?





[GitHub] spark issue #18996: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18996
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80856/
Test PASSed.





[GitHub] spark issue #18996: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18996
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18996: [MINOR][TYPO] Fix typos: runnning and Excecutors

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18996
  
**[Test build #80856 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80856/testReport)** for PR 18996 at commit [`cf749df`](https://github.com/apache/spark/commit/cf749df92de7e1485752610a2396a14724431250).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




