date:20160829

[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.7+ spark-cloud mo...

2016-08-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/12004
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #14861: [SPARK-17287] [PYSPARK] Add recursive kwarg to Python Sp...

2016-08-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14861
  
Can one of the admins verify this patch?



[GitHub] spark issue #14857: [SPARK-17261][PYSPARK] Using HiveContext after re-creati...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14857
  
**[Test build #64558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64558/consoleFull)**
 for PR 14857 at commit 
[`986a24f`](https://github.com/apache/spark/commit/986a24fab27e258f263590a2e55cb88c0f8a662a).



[GitHub] spark pull request #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ...

2016-08-29 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/14746#discussion_r76639805
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/ViewType.java ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql;
+
+/**
+ * ViewType is used to specify the type of views.
+ */
+public enum ViewType {
--- End diff --

I thought you wanted me to use a public enum. Let me change it now. Thanks!



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-08-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64575/
Test FAILed.



[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] PEP8 on documentation examples

2016-08-29 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14830
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64563/
Test FAILed.



[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] PEP8 on documentation examples

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14830
  
**[Test build #64563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64563/consoleFull)**
 for PR 14830 at commit 
[`14b2260`](https://github.com/apache/spark/commit/14b2260e9bb45737ed7a7580f8b7aa45caae7694).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #14553: [SPARK-16963] Changes to Source trait and related implem...

2016-08-29 Thread frreiss

Github user frreiss commented on the issue:

https://github.com/apache/spark/pull/14553
  
@rxin and @marmbrus, would it be possible to get this PR reviewed soon? I 
can split it into smaller chunks if that would make things easier; I just need 
to know.



[GitHub] spark issue #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13065
  
**[Test build #64579 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64579/consoleFull)**
 for PR 13065 at commit 
[`c41e308`](https://github.com/apache/spark/commit/c41e308a261fa0303d45a63306732af5909c373e).



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #64575 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64575/consoleFull)**
 for PR 14204 at commit 
[`987caa3`](https://github.com/apache/spark/commit/987caa36f49889746f544e6d6d0ad1c94e643d81).



[GitHub] spark issue #13584: [SPARK-15509][ML][SparkR] R MLlib algorithms should supp...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/13584
  
**[Test build #64578 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64578/consoleFull)**
 for PR 13584 at commit 
[`1bc150f`](https://github.com/apache/spark/commit/1bc150f8af93f0e5d35e40fd39e33176c974d8cf).



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #64576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).



[GitHub] spark issue #14638: [SPARK-11374][SQL] Support `skip.header.line.count` opti...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14638
  
**[Test build #64569 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64569/consoleFull)**
 for PR 14638 at commit 
[`1e22b68`](https://github.com/apache/spark/commit/1e22b68c370475714079a4c65250d5941c1f5998).



[GitHub] spark issue #14623: [SPARK-17044][SQL] Make test files for window functions ...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14623
  
**[Test build #64570 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64570/consoleFull)**
 for PR 14623 at commit 
[`1229fdc`](https://github.com/apache/spark/commit/1229fdcde3a8e0b3b79a472319f1042f39d8ba6d).



[GitHub] spark issue #14426: [SPARK-16475][SQL] Broadcast Hint for SQL Queries

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14426
  
**[Test build #64574 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64574/consoleFull)**
 for PR 14426 at commit 
[`377b625`](https://github.com/apache/spark/commit/377b6251eec50a728aea17d306695dbec865269e).



[GitHub] spark issue #14116: [SPARK-16452][SQL] Support basic INFORMATION_SCHEMA

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14116
  
**[Test build #64577 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64577/consoleFull)**
 for PR 14116 at commit 
[`7543069`](https://github.com/apache/spark/commit/754306970c4635624207c8082b175b4e195928b0).



[GitHub] spark issue #12004: [SPARK-7481][build] [WIP] Add Hadoop 2.7+ spark-cloud mo...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12004
  
**[Test build #64580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64580/consoleFull)**
 for PR 12004 at commit 
[`f39018e`](https://github.com/apache/spark/commit/f39018eee40ef463ebfdfb0f6a7ba6384b46c459).



[GitHub] spark issue #14784: [SPARK-17210][SPARKR] sparkr.zip is not distributed to e...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14784
  
**[Test build #64566 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64566/consoleFull)**
 for PR 14784 at commit 
[`986cddc`](https://github.com/apache/spark/commit/986cddc360d0008fcca395e53254d59b0e4b6988).



[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14452
  
**[Test build #64572 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64572/consoleFull)**
 for PR 14452 at commit 
[`c6d987f`](https://github.com/apache/spark/commit/c6d987f584859224a05b17a157be30389066dccb).



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14712
  
**[Test build #64567 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64567/consoleFull)**
 for PR 14712 at commit 
[`3407c7f`](https://github.com/apache/spark/commit/3407c7f7aa62503e62c7c4847ad6a2568a676c38).



[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14691
  
**[Test build #64568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64568/consoleFull)**
 for PR 14691 at commit 
[`c7bbffc`](https://github.com/apache/spark/commit/c7bbffcdbdc1723e165eec0bb481b1f385e18ac9).



[GitHub] spark issue #14435: [SPARK-16756][SQL][WIP] Add `sql` function to LogicalPla...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14435
  
**[Test build #64573 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64573/consoleFull)**
 for PR 14435 at commit 
[`3a3f8ac`](https://github.com/apache/spark/commit/3a3f8acb5d27bce8d9cdce4538e62745b3bc8757).



[GitHub] spark issue #14527: [SPARK-16938][SQL] `drop/dropDuplicate` should handle th...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14527
  
**[Test build #64571 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64571/consoleFull)**
 for PR 14527 at commit 
[`9359601`](https://github.com/apache/spark/commit/93596018d9d6e06085fcb5df065f30946954c93b).



[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14803
  
**[Test build #64565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64565/consoleFull)**
 for PR 14803 at commit 
[`0d841e2`](https://github.com/apache/spark/commit/0d841e27e647d4187be56ddd88ac4f06eb560ed7).



[GitHub] spark issue #14858: [SPARK-17219][ML] Add NaN value handling in Bucketizer

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14858
  
**[Test build #64557 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64557/consoleFull)**
 for PR 14858 at commit 
[`bfb5b33`](https://github.com/apache/spark/commit/bfb5b333d0a4e4a9d05a25cc0d47a5cdbd496965).



[GitHub] spark issue #14805: [MINOR][DOCS] Fix minor typos in python example code

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14805
  
**[Test build #64564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64564/consoleFull)**
 for PR 14805 at commit 
[`36c99f8`](https://github.com/apache/spark/commit/36c99f83711fe912fbaa66cbeac07ae4a1bb1d2e).



[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14859
  
**[Test build #64556 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64556/consoleFull)**
 for PR 14859 at commit 
[`1f23b05`](https://github.com/apache/spark/commit/1f23b0596b98cf333cab303f4b5ab53940bafbca).



[GitHub] spark issue #14855: [SPARK-17284] [SQL] Remove Statistics-related Table Prop...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14855
  
**[Test build #64560 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64560/consoleFull)**
 for PR 14855 at commit 
[`ce8e8b8`](https://github.com/apache/spark/commit/ce8e8b89a5b61648daaa59578e2b6a99ec2f6d74).



[GitHub] spark issue #14850: [SPARK-17279][SQL] better error message for NPE during S...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14850
  
**[Test build #64562 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64562/consoleFull)**
 for PR 14850 at commit 
[`1bd8382`](https://github.com/apache/spark/commit/1bd83826ff2bd2c99a47fe0029cd82910d355a7e).



[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] PEP8 on documentation examples

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14830
  
**[Test build #64563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64563/consoleFull)**
 for PR 14830 at commit 
[`14b2260`](https://github.com/apache/spark/commit/14b2260e9bb45737ed7a7580f8b7aa45caae7694).



[GitHub] spark issue #14860: [SPARK-17264] [SQL] DataStreamWriter should document tha...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14860
  
**[Test build #64555 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64555/consoleFull)**
 for PR 14860 at commit 
[`b73074b`](https://github.com/apache/spark/commit/b73074bbe67ddd69caf0b65fafe36016bf805422).



[GitHub] spark issue #14862: [SPARK-17295][SQL] Create TestHiveSessionState use refle...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14862
  
**[Test build #64554 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64554/consoleFull)**
 for PR 14862 at commit 
[`0867d2a`](https://github.com/apache/spark/commit/0867d2ac853b7634de53de0f07665636598ca454).



[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14864
  
**[Test build #64552 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64552/consoleFull)**
 for PR 14864 at commit 
[`07196a8`](https://github.com/apache/spark/commit/07196a8acbf6f0a68f29f96d1eeea74f53bbeb8a).



[GitHub] spark issue #14854: [SPARK-17283][WIP][Core] Cancel job in RDD.take() as soo...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14854
  
**[Test build #64561 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64561/consoleFull)**
 for PR 14854 at commit 
[`e9c7dfb`](https://github.com/apache/spark/commit/e9c7dfb46360a2f3fa689ca448a26b114dbc02b5).



[GitHub] spark issue #14863: [SPARK-16992][PYSPARK] use map comprehension in doc

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14863
  
**[Test build #64553 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64553/consoleFull)**
 for PR 14863 at commit 
[`7a2621e`](https://github.com/apache/spark/commit/7a2621ec2e1f588ae2252327ced61c41c28a9243).



[GitHub] spark issue #14856: [SPARK-17241][SparkR][MLlib] SparkR spark.glm should hav...

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14856
  
**[Test build #64559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64559/consoleFull)** for PR 14856 at commit [`6417049`](https://github.com/apache/spark/commit/6417049e9185434bc23c651217d73a88abe4f606).



[GitHub] spark issue #13231: [SPARK-15453] [SQL] Sort Merge Join to use bucketing met...

2016-08-29 Thread tejasapatil

Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/13231
  
Continuing this work in a new PR:
https://github.com/apache/spark/pull/14864



[GitHub] spark issue #14862: [SPARK-17295][SQL] Create TestHiveSessionState use refle...

2016-08-29 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14862
  
We are trying to get rid of `HiveSessionState`, so I am not sure that what
you did here is in line with our direction. cc @cloud-fan @yhuai



[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...

2016-08-29 Thread shaneknapp

Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/14691
  
jenkins, test this please



[GitHub] spark issue #14239: [SPARK-16593] [CORE] [WIP] Provide a pre-fetch mechanism...

2016-08-29 Thread tgravescs

Github user tgravescs commented on the issue:

https://github.com/apache/spark/pull/14239
  
Thanks for the explanation, this makes much more sense now. I'm still a bit
concerned about the memory usage of this though, especially with the external
shuffle on the nodemanager.

Were you using the external shuffle to test this, or just the shuffle built
into the executors? How much memory did you give whatever was shuffling, and
how big were the blocks being fetched?

Does this look at all at the size it's trying to cache vs. the size available
to the shuffle handler?



[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-08-29 Thread tejasapatil

Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/14864
  
ok to test



[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-08-29 Thread tejasapatil

GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/14864

[SPARK-15453] [SQL] FileSourceScanExec to extract `outputOrdering` 
information

## What changes were proposed in this pull request?

Extract sort ordering information in `FileSourceScanExec` so that the
planner can make use of it. My motivation for this change was to bring Sort
Merge join on par with Hive's Sort-Merge-Bucket join when the source tables are
bucketed and sorted.

Query:

```
val df = (0 until 16).map(i => (i % 8, i * 2, i.toString)).toDF("i", "j", "k").coalesce(1)
df.write.bucketBy(8, "j", "k").sortBy("j", "k").saveAsTable("table8")
df.write.bucketBy(8, "j", "k").sortBy("j", "k").saveAsTable("table9")
context.sql("SELECT * FROM table8 a JOIN table9 b ON a.j=b.j AND a.k=b.k").explain(true)
```

Before:

```
== Physical Plan ==
*SortMergeJoin [j#120, k#121], [j#123, k#124], Inner
:- *Sort [j#120 ASC, k#121 ASC], false, 0
:  +- *Project [i#119, j#120, k#121]
:     +- *Filter (isnotnull(k#121) && isnotnull(j#120))
:        +- *FileScan orc default.table8[i#119,j#120,k#121] Batched: false, Format: ORC, InputPaths: file:/Users/tejasp/Desktop/dev/tp-spark/spark-warehouse/table8, PartitionFilters: [], PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
+- *Sort [j#123 ASC, k#124 ASC], false, 0
   +- *Project [i#122, j#123, k#124]
      +- *Filter (isnotnull(k#124) && isnotnull(j#123))
         +- *FileScan orc default.table9[i#122,j#123,k#124] Batched: false, Format: ORC, InputPaths: file:/Users/tejasp/Desktop/dev/tp-spark/spark-warehouse/table9, PartitionFilters: [], PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
```

After:  (note that the `Sort` step is no longer there)

```
== Physical Plan ==
*SortMergeJoin [j#49, k#50], [j#52, k#53], Inner
:- *Project [i#48, j#49, k#50]
:  +- *Filter (isnotnull(k#50) && isnotnull(j#49))
:     +- *FileScan orc default.table8[i#48,j#49,k#50] Batched: false, Format: ORC, InputPaths: file:/Users/tejasp/Desktop/dev/tp-spark/spark-warehouse/table8, PartitionFilters: [], PushedFilters: [IsNotNull(k), IsNotNull(j)], ReadSchema: struct
+- *Project [i#51, j#52, k#53]
   +- *Filter (isnotnull(j#52) && isnotnull(k#53))
      +- *FileScan orc default.table9[i#51,j#52,k#53] Batched: false, Format: ORC, InputPaths: file:/Users/tejasp/Desktop/dev/tp-spark/spark-warehouse/table9, PartitionFilters: [], PushedFilters: [IsNotNull(j), IsNotNull(k)], ReadSchema: struct
```

## How was this patch tested?

Added a test case in `JoinSuite`. Ran all other tests in `JoinSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-15453_smb_optimization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14864


commit 07196a8acbf6f0a68f29f96d1eeea74f53bbeb8a
Author: Tejas Patil 
Date:   2016-08-26T07:00:35Z

[SPARK-15453] [SQL] Sort Merge Join to use bucketing metadata to optimize 
query plan

BEFORE

```
val df = (0 until 16).map(i => (i % 8, i * 2, i.toString)).toDF("i", "j", "k").coalesce(1)
hc.sql("DROP TABLE table8").collect
df.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(8, "j", "k").sortBy("j", "k").saveAsTable("table8")
hc.sql("DROP TABLE table9").collect
df.write.format("org.apache.spark.sql.hive.orc.OrcFileFormat").bucketBy(8, "j", "k").sortBy("j", "k").saveAsTable("table9")

hc.sql("SELECT * FROM table8 a JOIN table9 b ON a.j=b.j AND a.k=b.k").explain(true)

== Parsed Logical Plan ==
'Project [*]
+- 'Join Inner, (('a.j = 'b.j) && ('a.k = 'b.k))
   :- 'UnresolvedRelation table8, a
   +- 'UnresolvedRelation table9, b

== Analyzed Logical Plan ==
i: int, j: int, k: string, i: int, j: int, k: string
Project [i#119, j#120, k#121, i#122, j#123, k#124]
+- Join Inner, ((j#120 = j#123) && (k#121 = k#124))
   :- SubqueryAlias a
   :  +- SubqueryAlias table8
   :     +- Relation[i#119,j#120,k#121] orc
   +- SubqueryAlias b
      +- SubqueryAlias table9
         +- Relation[i#122,j#123,k#124] orc

== Optimized Logical Plan ==
Join Inner, ((j#120 = j#123) && (k#121 = k#124))
:- Filter (isnotnull(k#121) && isnotnull(j#120))
:  +- Relation[i#119,j#120,k#121] orc
+- Filter (isnotnull(k#124) && isnotnull(j#123))
   +- Relation[i#122,j#123,k#124] orc

== Physical Plan ==
*SortMergeJoin [j#120, k#121], [j#123, k#124], Inner
:- *Sort [j#120 ASC, k#121 ASC], false, 0
:  +- *Project [i#119, j#120, k#121]
: +- *Filter (isnotnull(k#121) && isno

[GitHub] spark pull request #14863: [SPARK-16992][PYSPARK] use map comprehension in d...

2016-08-29 Thread Stibbons

GitHub user Stibbons opened a pull request:

https://github.com/apache/spark/pull/14863

[SPARK-16992][PYSPARK] use map comprehension in doc

The code is equivalent, but a map comprehension is faster than a `map` most
of the time.
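
To illustrate the equivalence claimed above, here is a minimal sketch in plain Python. The data and variable names are made up for illustration and are not taken from the documentation examples this PR touches. Whether the comprehension is actually faster depends on the workload, so no timing is claimed here.

```python
values = [1, 2, 3, 4]

# map() with a lambda: returns a lazy iterator on Python 3,
# so it is wrapped in list() to materialize the result.
squared_with_map = list(map(lambda x: x * x, values))

# The same transformation written as a list comprehension.
squared_with_comprehension = [x * x for x in values]
```

Both variables end up holding the same list, so the choice between the two forms is about readability and speed, not behavior.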

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Stibbons/spark map_comprehension

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14863.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14863


commit 7a2621ec2e1f588ae2252327ced61c41c28a9243
Author: Gaetan Semet 
Date:   2016-08-29T14:30:18Z

use map comprehension

Signed-off-by: Gaetan Semet 





[GitHub] spark issue #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-08-29 Thread maropu

Github user maropu commented on the issue:

https://github.com/apache/spark/pull/13065
  
@hvanhovell yea, thx for letting me know. I'll do that.



[GitHub] spark issue #13065: [SPARK-15214][SQL] Code-generation for Generate

2016-08-29 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/13065
  
@maropu I have updated the PR. Want to take a look?



[GitHub] spark pull request #14862: [SPARK-17295][SQL] Create TestHiveSessionState us...

2016-08-29 Thread jiangxb1987

GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/14862

[SPARK-17295][SQL] Create TestHiveSessionState use reflect logic based on 
the setting of CATALOG_IMPLEMENTATION

## What changes were proposed in this pull request?

Currently we create a new `TestHiveSessionState` in `TestHive`, whereas
`SparkSession` creates `SessionState`/`HiveSessionState` via reflection,
based on the setting of CATALOG_IMPLEMENTATION. We should make the two
consistent, so that the reflection logic of `SparkSession` can be tested in
`TestHive`.

To achieve this, we add `test-hive` to the set of allowed values of
CATALOG_IMPLEMENTATION and update the related references.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark testhive

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14862


commit 0867d2ac853b7634de53de0f07665636598ca454
Author: jiangxingbo 
Date:   2016-08-29T09:45:16Z

create TestHiveSessionState use reflect logic based on the setting of 
CATALOG_IMPLEMENTATION.





[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-29 Thread mpjlu

Github user mpjlu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14597#discussion_r76624379
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
 """
 Creates a ChiSquared feature selector.
 
-:param numTopFeatures: number of features that selector will select.
-
 >>> data = [
 ... LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
 ... LabeledPoint(1.0, SparseVector(3, {1: 9.0, 2: 6.0})),
 ... LabeledPoint(1.0, [0.0, 9.0, 8.0]),
 ... LabeledPoint(2.0, [8.0, 9.0, 5.0])
 ... ]
->>> model = ChiSqSelector(1).fit(sc.parallelize(data))
+>>> model = ChiSqSelector().setNumTopFeatures(1).fit(sc.parallelize(data))
+>>> model.transform(SparseVector(3, {1: 9.0, 2: 6.0}))
+SparseVector(1, {0: 6.0})
+>>> model.transform(DenseVector([8.0, 9.0, 5.0]))
+DenseVector([5.0])
+>>> model = ChiSqSelector().setPercentile(0.34).fit(sc.parallelize(data))
 >>> model.transform(SparseVector(3, {1: 9.0, 2: 6.0}))
 SparseVector(1, {0: 6.0})
 >>> model.transform(DenseVector([8.0, 9.0, 5.0]))
 DenseVector([5.0])
+>>> data = [
+... LabeledPoint(0.0, SparseVector(4, {0: 8.0, 1: 7.0})),
+... LabeledPoint(1.0, SparseVector(4, {1: 9.0, 2: 6.0, 3: 4.0})),
+... LabeledPoint(1.0, [0.0, 9.0, 8.0, 4.0]),
+... LabeledPoint(2.0, [8.0, 9.0, 5.0, 9.0])
+... ]
+>>> model = ChiSqSelector().setAlpha(0.1).fit(sc.parallelize(data))
+>>> model.transform(DenseVector([1.0,2.0,3.0,4.0]))
+DenseVector([4.0])
 
 .. versionadded:: 1.4.0
 """
-def __init__(self, numTopFeatures):
-self.numTopFeatures = int(numTopFeatures)
+def __init__(self):
+self.param = 50
--- End diff --

Using three variables is clearer, but it also needs an additional
selectionType variable. The fit function would then call different functions
according to the selection type. If you prefer that, I will change the code;
I am OK with either approach. Thanks.
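
The alternative discussed here (one field per selection method plus a field recording which one is active) could be sketched roughly as follows. This is a hypothetical stand-in, not the PySpark `ChiSqSelector`: the class name, the `selectionType` values, the default values, and the `activeSetting` helper are all assumptions made for illustration. Only the setter names come from the diff above.

```python
class SelectorSketch(object):
    """Hypothetical stand-in for illustration; not the PySpark ChiSqSelector."""

    def __init__(self):
        # One field per selection method, plus a field recording which one
        # is active. Field names and defaults are assumptions.
        self.selectionType = "kbest"
        self.numTopFeatures = 50
        self.percentile = 0.1
        self.alpha = 0.05

    def setNumTopFeatures(self, numTopFeatures):
        self.numTopFeatures = int(numTopFeatures)
        self.selectionType = "kbest"
        return self

    def setPercentile(self, percentile):
        self.percentile = float(percentile)
        self.selectionType = "percentile"
        return self

    def setAlpha(self, alpha):
        self.alpha = float(alpha)
        self.selectionType = "fpr"
        return self

    def activeSetting(self):
        # Stand-in for the dispatch that fit() would perform: pick the
        # field that matches the current selection type.
        if self.selectionType == "kbest":
            return ("kbest", self.numTopFeatures)
        elif self.selectionType == "percentile":
            return ("percentile", self.percentile)
        elif self.selectionType == "fpr":
            return ("fpr", self.alpha)
        raise ValueError("unknown selection type: %s" % self.selectionType)
```

With this layout each field keeps a single meaning, at the cost of the extra `selectionType` field and the branch in the dispatch step.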



[GitHub] spark pull request #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector...

2016-08-29 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14597#discussion_r76622682
  
--- Diff: python/pyspark/mllib/feature.py ---
@@ -276,24 +276,64 @@ class ChiSqSelector(object):
 """
 Creates a ChiSquared feature selector.
 
-:param numTopFeatures: number of features that selector will select.
-
 >>> data = [
 ... LabeledPoint(0.0, SparseVector(3, {0: 8.0, 1: 7.0})),
 ... LabeledPoint(1.0, SparseVector(3, {1: 9.0, 2: 6.0})),
 ... LabeledPoint(1.0, [0.0, 9.0, 8.0]),
 ... LabeledPoint(2.0, [8.0, 9.0, 5.0])
 ... ]
->>> model = ChiSqSelector(1).fit(sc.parallelize(data))
+>>> model = ChiSqSelector().setNumTopFeatures(1).fit(sc.parallelize(data))
+>>> model.transform(SparseVector(3, {1: 9.0, 2: 6.0}))
+SparseVector(1, {0: 6.0})
+>>> model.transform(DenseVector([8.0, 9.0, 5.0]))
+DenseVector([5.0])
+>>> model = ChiSqSelector().setPercentile(0.34).fit(sc.parallelize(data))
 >>> model.transform(SparseVector(3, {1: 9.0, 2: 6.0}))
 SparseVector(1, {0: 6.0})
 >>> model.transform(DenseVector([8.0, 9.0, 5.0]))
 DenseVector([5.0])
+>>> data = [
+... LabeledPoint(0.0, SparseVector(4, {0: 8.0, 1: 7.0})),
+... LabeledPoint(1.0, SparseVector(4, {1: 9.0, 2: 6.0, 3: 4.0})),
+... LabeledPoint(1.0, [0.0, 9.0, 8.0, 4.0]),
+... LabeledPoint(2.0, [8.0, 9.0, 5.0, 9.0])
+... ]
+>>> model = ChiSqSelector().setAlpha(0.1).fit(sc.parallelize(data))
+>>> model.transform(DenseVector([1.0,2.0,3.0,4.0]))
+DenseVector([4.0])
 
 .. versionadded:: 1.4.0
 """
-def __init__(self, numTopFeatures):
-self.numTopFeatures = int(numTopFeatures)
+def __init__(self):
+self.param = 50
--- End diff --

It seems like `param` is used to mean many different things. Why not use
separate fields, as in the Scala version, for clarity?



[GitHub] spark issue #14597: [SPARK-17017][MLLIB][ML] add a chiSquare Selector based ...

2016-08-29 Thread mpjlu

Github user mpjlu commented on the issue:

https://github.com/apache/spark/pull/14597
  
Hi @srowen, I have added the Python API and test cases for ChiSqSelector.
Could you kindly review it again? Thanks.



[GitHub] spark issue #11956: [SPARK-14098][SQL] Generate Java code that gets a float/...

2016-08-29 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/11956
  
@davies We know you are busy, but could you please share your opinion on
these design questions with the community?



[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] PEP8 on documentation exam...

2016-08-29 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14830#discussion_r76608499
  
--- Diff: examples/src/main/python/als.py ---
@@ -62,10 +62,10 @@ def update(i, mat, ratings):
   example. Please use pyspark.ml.recommendation.ALS for more
   conventional use.""", file=sys.stderr)
 
-spark = SparkSession\
-.builder\
-.appName("PythonALS")\
-.getOrCreate()
+spark = (SparkSession
--- End diff --

I have not changed all of these initialization lines, since most of the time
they do not appear in the documentation.



[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] autopep8 on documentation example...

2016-08-29 Thread Stibbons

Github user Stibbons commented on the issue:

https://github.com/apache/spark/pull/14830
  
Cool, I wasn't sure about it.

No problem, I can even split it into several PRs.



[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] autopep8 on documentation example...

2016-08-29 Thread holdenk

Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/14830
  
For what it's worth, PEP 8 says:

> The preferred way of wrapping long lines is by using Python's implied 
line continuation inside parentheses, brackets and braces. Long lines can be 
broken over multiple lines by wrapping expressions in parentheses. These should 
be used in preference to using a backslash for line continuation.

So this sounds in keeping with the general move toward PEP 8 compliance in
the code. I am a little concerned about just how many files this touches now
that it isn't just an autogenerated change, but I'll try to set aside some
time this week to review it (I'm currently ~13 hours off my regular timezone,
so my review times may be a little erratic).
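
As a small illustration of the PEP 8 guidance quoted above, the sketch below writes the same method chain twice, once with backslash continuation and once with implied continuation inside parentheses. It uses plain string methods instead of the `SparkSession` builder so that it runs without Spark; the input string and variable names are made up.

```python
text = "  Spark Pull Request  "

# Backslash continuation: valid Python, but PEP 8 prefers the form below.
cleaned_backslash = text\
    .strip()\
    .lower()\
    .replace(" ", "-")

# Implied line continuation inside parentheses, as PEP 8 recommends.
cleaned_parens = (text
                  .strip()
                  .lower()
                  .replace(" ", "-"))
```

Both forms evaluate to the same value; the parenthesized one avoids a trailing backslash on every continued line.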



[GitHub] spark issue #14830: [SPARK-16992][PYSPARK] autopep8 on documentation example...

2016-08-29 Thread Stibbons

Github user Stibbons commented on the issue:

https://github.com/apache/spark/pull/14830
  
Here is a new proposal. I have taken your remarks into account (I hope all
the $on/$off markers are OK) and added some minor rework of the multiline
syntax: I find using `\` weird and inelegant, and using parentheses `()`
makes it more readable, IMHO.

Tell me what you think about this



[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] autopep8 on documentation ...

2016-08-29 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14830#discussion_r76599413
  
--- Diff: examples/src/main/python/ml/aft_survival_regression.py ---
@@ -17,9 +17,9 @@
 
 from __future__ import print_function
 
+from pyspark.ml.linalg import Vectors
 # $example on$
 from pyspark.ml.regression import AFTSurvivalRegression
-from pyspark.ml.linalg import Vectors
--- End diff --

In that case, move the `# $example on$` comment up above the `from 
pyspark.ml.linalg import Vectors`



[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] autopep8 on documentation ...

2016-08-29 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14830#discussion_r76598828
  
--- Diff: examples/src/main/python/ml/aft_survival_regression.py ---
@@ -17,9 +17,9 @@
 
 from __future__ import print_function
 
+from pyspark.ml.linalg import Vectors
 # $example on$
 from pyspark.ml.regression import AFTSurvivalRegression
-from pyspark.ml.linalg import Vectors
--- End diff --

I actually prefer this line be in the doc



[GitHub] spark pull request #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ...

2016-08-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14746#discussion_r76597119
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -105,7 +96,14 @@ case class CreateViewCommand(
 }
 val sessionState = sparkSession.sessionState
 
-if (isTemporary) {
+// 1) CREATE VIEW: create a temp view when users explicitly specify the keyword TEMPORARY;
+// otherwise, create a permanent view no matter whether the temporary view
+// with the same name exists or not.
+// 2) ALTER VIEW: alter the temporary view if the temp view exists; otherwise, try to alter
+//the permanent view. Here, it follows the same resolution like DROP VIEW,
+//since users are unable to specify the keyword TEMPORARY.
+if (viewType == ViewType.Temporary ||
+(viewType != ViewType.Permanent && sessionState.catalog.isTemporaryTable(name))) {
--- End diff --

This is not very readable. How about using three separate branches, one each
for temp view, permanent view, and any view?

Also, we don't need to mention CREATE VIEW or ALTER VIEW here: the semantics
are clearly defined by the SaveMode and ViewType, so we just need to document
how this code matches the semantics.



[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...

2016-08-29 Thread WeichenXu123

Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/14851
  
cc @jkbradley thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ...

2016-08-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14746#discussion_r76596767
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala ---
@@ -69,23 +66,17 @@ case class CreateViewCommand(
 
   override def output: Seq[Attribute] = Seq.empty[Attribute]
 
-  if (!isTemporary) {
--- End diff --

why remove this check?



[GitHub] spark pull request #14746: [SPARK-17180] [SQL] Fix View Resolution Order in ...

2016-08-29 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/14746#discussion_r76596601
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/ViewType.java ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql;
+
+/**
+ * ViewType is used to specify the type of views.
+ */
+public enum ViewType {
--- End diff --

This doesn't need to be public to end users; we can put it in `view.scala`
and use a `sealed trait` to implement it.



[GitHub] spark pull request #14833: fixed a typo

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14833



[GitHub] spark issue #14833: fixed a typo

2016-08-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14833
  
Merged to master



[GitHub] spark issue #14833: fixed a typo

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14833
  
**[Test build #3235 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3235/consoleFull)**
 for PR 14833 at commit 
[`a93ce34`](https://github.com/apache/spark/commit/a93ce34873b4fe55675feaee26f32251468dceeb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] autopep8 on documentation ...

2016-08-29 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14830#discussion_r76595360
  
--- Diff: examples/src/main/python/ml/binarizer_example.py ---
@@ -17,9 +17,10 @@
 
 from __future__ import print_function
 
-from pyspark.sql import SparkSession
 # $example on$
 from pyspark.ml.feature import Binarizer
+from pyspark.sql import SparkSession
--- End diff --

Yes, I see. That makes perfect sense!



[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-29 Thread steveloughran

Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/14731#discussion_r76593850
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -244,6 +244,31 @@ class SparkHadoopUtil extends Logging {
   }
 
   /**
+   * List directories/files matching the path and return the `FileStatus` results.
+   * If the pattern is not a regexp then a simple `getFileStatus(pattern)`
+   * is called to get the status of that path.
+   * If the path/pattern does not match anything in the filesystem,
+   * an empty sequence is returned.
+   * @param pattern pattern
+   * @return a possibly empty array of FileStatus entries
+   */
+  def globToFileStatus(pattern: Path): Array[FileStatus] = {
--- End diff --

Essentially, if anything that might be a wildcard is hit, the path gets handed 
off to the globber for the full interpretation. The same goes for ^ and ], which are 
only part of a pattern within the context of an opening [.

It's only those strings that can be verified to be regexp-free in a simple, 
context-free string scan that say "absolutely no patterns here".

Regarding the bigger change: most of it is isolation of the sensitive code 
*and the tests to verify behaviour*.
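
The scan described above can be sketched in a few lines of plain Python. This is only an illustration, not the code in the pull request: the character set `GLOB_CHARS` and the helpers `may_be_glob` and `resolve` are hypothetical names, and the exact set of characters the patch checks is an assumption.

```python
# Assumed set of characters that "might be a wildcard"; the real patch may differ.
GLOB_CHARS = frozenset("{}[]*?\\^")


def may_be_glob(path):
    """Context-free scan: True if any character could belong to a pattern."""
    return any(ch in GLOB_CHARS for ch in path)


def resolve(path, glob_fn, stat_fn):
    """Use the full globber unless the scan proves the path is pattern-free.

    glob_fn(path) returns a list of statuses; stat_fn(path) returns one
    status, or None when the path does not exist.
    """
    if may_be_glob(path):
        return glob_fn(path)
    status = stat_fn(path)
    return [] if status is None else [status]
```

Note that a literal `]` or `^` in a path is sent to the globber too, because a context-free scan cannot tell whether an opening `[` precedes it. That is the trade-off under discussion in this thread.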



[GitHub] spark pull request #14731: [SPARK-17159] [streaming]: optimise check for new...

2016-08-29 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14731#discussion_r76594233
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -244,6 +244,31 @@ class SparkHadoopUtil extends Logging {
   }
 
   /**
+   * List directories/files matching the path and return the `FileStatus` results.
+   * If the pattern is not a regexp then a simple `getFileStatus(pattern)`
+   * is called to get the status of that path.
+   * If the path/pattern does not match anything in the filesystem,
+   * an empty sequence is returned.
+   * @param pattern pattern
+   * @return a possibly empty array of FileStatus entries
+   */
+  def globToFileStatus(pattern: Path): Array[FileStatus] = {
--- End diff --

Yeah, but then that's wrong if, for example, my path actually has a ? or ^ or 
] in it. This behavior change doesn't seem essential, and it even seems 
problematic to add it to an otherwise clear fix.



[GitHub] spark issue #14833: fixed a typo

2016-08-29 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14833
  
**[Test build #3235 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3235/consoleFull)**
 for PR 14833 at commit 
[`a93ce34`](https://github.com/apache/spark/commit/a93ce34873b4fe55675feaee26f32251468dceeb).



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread lw-lin

Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Jenkins retest this please



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread lw-lin

Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
> What if I am writing explicitly an empty string out? Does it become just 
1,,2?

Yes. It becomes `1,,2` in 2.0, and the same `1,,2` with this patch -- no 
behavior changes.

> Can you also clarify whether this is behavior changing, or something else?

This patch behaves differently from 2.0 when reading `1,,2` back: (given 
`nullValue` the default value: empty string ""), `1,,2` would be read back as 
`1,,2` in 2.0, but would be read back as `1,[null],[null],2` with this patch.

@rxin ~






[GitHub] spark pull request #14861: [SPARK-17287] [PySpark] Add `recursive` kwarg to ...

2016-08-29 Thread jpiper

GitHub user jpiper opened a pull request:

https://github.com/apache/spark/pull/14861

[SPARK-17287] [PySpark] Add `recursive` kwarg to Java Python 
`SparkContext.addFile`

## What changes were proposed in this pull request?

Add the ability to add entire directories using the PySpark interface 
`SparkContext.addFile(dir, recursive=True)`


## How was this patch tested?

I've added a test file in nested folders in `python/test_support`. I use 
`addFile` to distribute this folder, and then read the file back using the 
directory structure.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jpiper/spark jpiper/pyspark_addfiles

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14861


commit cabcca30252a189def2b0357b3951fac0870a3db
Author: Jason Piper 
Date:   2016-08-29T09:53:51Z

Add `recursive` kwarg to Java Python `SparkContext.addFile`





[GitHub] spark pull request #14830: [SPARK-16992][PYSPARK] autopep8 on documentation ...

2016-08-29 Thread holdenk

Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/14830#discussion_r76582404
  
--- Diff: examples/src/main/python/ml/binarizer_example.py ---
@@ -17,9 +17,10 @@
 
 from __future__ import print_function
 
-from pyspark.sql import SparkSession
 # $example on$
 from pyspark.ml.feature import Binarizer
+from pyspark.sql import SparkSession
--- End diff --

Some of the example files are used in generating the website 
documentation, and the "example on" and "example off" tags are used to 
determine which parts get pulled in to the website (in this case this is done 
since we don't want to show the same boilerplate imports for each example, 
only the ones specific to it). You can take a look at 
`./docs/ml-features.md`, which includes this file, to see how it's used in 
markdown, and at the generated website documentation at 
http://spark.apache.org/docs/latest/ml-features.html#binarizer .
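
To make the effect of the tags concrete, here is a minimal Python sketch of that kind of extraction. It is not the plugin the docs build actually uses; `extract_example` and `SAMPLE` are hypothetical, and the marker spelling `$example off$` is assumed by analogy with the `# $example on$` line in the diff.

```python
def extract_example(source):
    """Keep only the lines between '$example on$' and '$example off$' markers."""
    kept = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.endswith("$example on$"):
            inside = True
        elif stripped.endswith("$example off$"):
            inside = False
        elif inside:
            kept.append(line)
    return "\n".join(kept)


SAMPLE = """from __future__ import print_function

# $example on$
from pyspark.ml.feature import Binarizer
# $example off$
from pyspark.sql import SparkSession
"""

print(extract_example(SAMPLE))
```

With this layout only the `Binarizer` import would reach the rendered page, which is why the `SparkSession` import has to stay outside the tagged region.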

The instructions for building the docs locally are located at 
`./docs/README.md`. Let me know if you need any help with that; the 
documentation build is sometimes a bit overlooked since many of the developers 
don't build it manually often.



[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-29 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14567#discussion_r76582188
  
--- Diff: python/pep8rc ---
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+[pep8]
--- End diff --

I don't know if they can be merged. Will try it



[GitHub] spark pull request #14860: [SPARK-17264] [SQL] DataStreamWriter should docum...

2016-08-29 Thread srowen

GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/14860

[SPARK-17264] [SQL] DataStreamWriter should document that it only supports 
Parquet for now

## What changes were proposed in this pull request?

Clarify that only parquet files are supported by DataStreamWriter now

## How was this patch tested?

(Doc build -- no functional changes to test)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-17264

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14860.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14860


commit b73074bbe67ddd69caf0b65fafe36016bf805422
Author: Sean Owen 
Date:   2016-08-29T09:51:30Z

Clarify that only parquet files are supported by DataStreamWriter now





[GitHub] spark pull request #14536: Merge pull request #1 from apache/master

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14536



[GitHub] spark pull request #14449: [SPARK-16843][MLLIB] add the percentage ChiSquare...

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14449



[GitHub] spark pull request #10572: SPARK-12619 Combine small files in a hadoop direc...

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10572



[GitHub] spark pull request #10995: [SPARK-13120] [test-maven] Shade protobuf-java

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10995



[GitHub] spark pull request #12695: [SPARK-14914] Normalize Paths/URIs for windows.

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12695



[GitHub] spark pull request #13658: [SPARK-15937] [yarn] Improving the logic to wait ...

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13658



[GitHub] spark pull request #14505: Branch 2.0

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14505



[GitHub] spark pull request #14810: Branch 1.6

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14810



[GitHub] spark pull request #12694: [SPARK-14914] Fix Command too long for windows. E...

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12694



[GitHub] spark pull request #12753: [SPARK-3767] [CORE] Support wildcard in Spark pro...

2016-08-29 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12753



[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...

2016-08-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14788
  
For `date_add`, `date_sub`, `add_months`, I think we should support both 
DateType and TimestampType, and the return type should depend on the input type.

For `last_day`, `first_day`, we should support both DateType and 
TimestampType, but the return type should always be DateType.

For `date_trunc`, we should support both DateType and TimestampType, but 
the return type should always be TimestampType.
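
The proposed typing rules can be illustrated outside Spark with plain Python `date` and `datetime` values. This is only a sketch of the semantics: the functions below are hypothetical stand-ins named after the SQL functions, `date_trunc_to_month` is a simplification of `date_trunc`, and none of this is Spark's implementation.

```python
import calendar
from datetime import date, datetime, timedelta


def date_add(value, days):
    """Return type follows the input: date in, date out; datetime in, datetime out."""
    return value + timedelta(days=days)


def last_day(value):
    """Always return a plain date, even when given a datetime."""
    days_in_month = calendar.monthrange(value.year, value.month)[1]
    return date(value.year, value.month, days_in_month)


def date_trunc_to_month(value):
    """Always return a datetime (timestamp), even when given a date."""
    return datetime(value.year, value.month, 1)
```

Under these rules `date_add` is the only one of the three whose result type varies with its argument; the other two normalise to a single type regardless of input.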

cc @rxin 



[GitHub] spark pull request #14849: [BUILD] Closes some stale PRs.

2016-08-29 Thread srowen

Github user srowen closed the pull request at:

https://github.com/apache/spark/pull/14849



[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-29 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14567#discussion_r76580634
  
--- Diff: python/pep8rc ---
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+[pep8]
--- End diff --

Maybe I'm missing something but should they be in the same file or are they 
separate?



[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...

2016-08-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14859
  
Good point, I suppose there is a weak promise there that it runs on 
Windows. 
Could anyone else who knows Windows weigh in? I assume @dongjoon-hyun is on 
board.



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-29 Thread lw-lin

Github user lw-lin commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76579917
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

@viirya maybe test against the following?

- with this patch's changes to ObjectExpressionsSuite.scala
- with this patch's changes to ExpressionEvalHelper.scala (this is also 
critical)
- without this patch's changes to objects.scala



[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...

2016-08-29 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14859
  
Ah, I thought Windows was already officially supported, judging from this 
documentation: 
https://github.com/apache/spark/blob/master/docs/index.md#downloading.

BTW, I do understand your concerns, but I believe this will make it easier to 
review Windows-specific tests. At least we can identify 
Windows-specific problems easily and, as you already know, I believe it is 
currently hard to review PRs for Windows-specific problems.

I wouldn't mind if this gets closed because, at least, I have proposed an 
automated build on Windows here, so reviewers can still use it after manually 
merging this.

My personal opinion, though, is that we should try using this, as it does not 
affect the code base or other builds.




[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-29 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14712
  
Looks like Jenkins hasn't been working for a while.




[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-29 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14567#discussion_r76577358
  
--- Diff: python/pep8rc ---
@@ -0,0 +1,21 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+[pep8]
--- End diff --

Actually, tox.ini looks more similar to this pep8rc than isort.cfg does, but 
GitHub doesn't show it that way.



[GitHub] spark pull request #14567: [SPARK-16992][PYSPARK] Python Pep8 formatting and...

2016-08-29 Thread Stibbons

Github user Stibbons commented on a diff in the pull request:

https://github.com/apache/spark/pull/14567#discussion_r76577137
  
--- Diff: dev/isort.cfg ---
@@ -1,9 +1,9 @@
 # Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
+# contributor license agreements. See the NOTICE file distributed with
--- End diff --

No, this is not pep8 related!

I deleted dev/tox.ini and created a new file, dev/isort.cfg, from scratch. 
But git "finds" file renames based on content similarity, so it sees this as 
a file rename. Actually, I deleted tox.ini in a different commit from the one 
that creates isort.cfg (see 
https://github.com/apache/spark/pull/14567/commits), so I guess this 
"visualisation" of a file rename comes from GitHub.
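
To illustrate the similarity-based rename detection described above, here is a 
minimal Python sketch. It is not git's actual algorithm: it uses 
`difflib.SequenceMatcher` as a stand-in similarity measure, and the function 
names and file contents are hypothetical. The 0.5 threshold mirrors git's 
default 50% rename similarity threshold.

```python
from difflib import SequenceMatcher


def similarity(old_text, new_text):
    """Return a 0..1 line-based similarity ratio between two file contents."""
    return SequenceMatcher(None, old_text.splitlines(), new_text.splitlines()).ratio()


def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with its most similar added path, if any.

    `deleted` and `added` map file paths to file contents. A pair is only
    reported when its similarity is at or above `threshold`.
    """
    renames = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            score = similarity(old_text, new_text)
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames


# Hypothetical contents: both files start with the same 14-line license header,
# which is enough shared content for the pair to look like a rename.
LICENSE = "\n".join("# license line %d" % i for i in range(14))
deleted = {"dev/tox.ini": LICENSE + "\n[pep8]\nmax-line-length=100\n"}
added = {"dev/isort.cfg": LICENSE + "\n[settings]\nline_length=100\n"}

print(detect_renames(deleted, added))
```

Because the shared license header dominates both small files, the deleted and 
added files score well above the threshold even though their actual settings 
differ, which is consistent with the diff being displayed as a rename.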



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14712
  
add to whitelist



[GitHub] spark issue #14712: [SPARK-17072] [SQL] support table-level statistics gener...

2016-08-29 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/14712
  
retest this please



[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...

2016-08-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14859
  
Hm, we also had Travis config that isn't used now, to try to add Java style 
checking. I can see the value in adding Windows testing, but here we have a 
third CI tool involved. I'm concerned that I for example wouldn't know how to 
maintain it.

I suppose we also need to decide if Windows is even supported? I don't 
think it is supported for development, certainly. For deployment -- best effort 
is my understanding, but may not work.

If this relies on a bunch of setup to run (including needing a sorta 
unofficial copy of Hadoop's winutils) then does testing it this way say much 
about how it works on Windows?



[GitHub] spark pull request #14698: [SPARK-17061][SPARK-17093][SQL] `MapObjects` shou...

2016-08-29 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/14698#discussion_r76576622
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionEvalHelper.scala ---
@@ -136,7 +136,7 @@ trait ExpressionEvalHelper extends GeneratorDrivenPropertyChecks {
 // some expression is reusing variable names across different instances.
 // This behavior is tested in ExpressionEvalHelperSuite.
 val plan = generateProject(
-  GenerateUnsafeProjection.generate(
+  UnsafeProjection.create(
--- End diff --

But it looks like this change isn't reflected in the test? Without this 
change, the added test passes too.



[GitHub] spark issue #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate bui...

2016-08-29 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14859
  
cc 
@rxin 
@srowen (for build)
@JoshRosen (for project infra)
@dongjoon-hyun (who suggested AppVeyor CI)
@steveloughran (who is the author of winutils)
@felixcheung and @shivaram (for SparkR)



[GitHub] spark issue #14833: fixed a typo

2016-08-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14833
  
Jenkins test this please



[GitHub] spark pull request #14859: [SPARK-17200][PROJECT INFRA][BUILD][SparkR] Autom...

2016-08-29 Thread HyukjinKwon

GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/14859

[SPARK-17200][PROJECT INFRA][BUILD][SparkR] Automate building and testing 
on Windows (currently SparkR only)

## What changes were proposed in this pull request?

This PR adds build automation on Windows using the 
[AppVeyor](https://www.appveyor.com/) CI tool.

Currently, this only runs the tests for SparkR, because we have had issues 
testing Windows-specific PRs (e.g. 
https://github.com/apache/spark/pull/14743 and 
https://github.com/apache/spark/pull/13165) and a hard time verifying them.

One concern is that this build depends on 
[steveloughran/winutils](https://github.com/steveloughran/winutils) for the 
pre-built Hadoop bin package (steveloughran is a Hadoop PMC member).

## How was this patch tested?

Manually, 
https://ci.appveyor.com/project/HyukjinKwon/spark/build/8-SPARK-17200-build

Some tests are already failing; this was found in 
https://github.com/apache/spark/pull/14743#issuecomment-241405287. The current 
results are as below:

```
Skipped 

1. create DataFrame from RDD (@test_sparkSQL.R#200) - Hive is not build 
with SparkSQL, skipped
2. test HiveContext (@test_sparkSQL.R#1041) - Hive is not build with 
SparkSQL, skipped
3. read/write ORC files (@test_sparkSQL.R#1748) - Hive is not build with 
SparkSQL, skipped
4. enableHiveSupport on SparkSession (@test_sparkSQL.R#2480) - Hive is not 
build with SparkSQL, skipped
Warnings 
---
1. infer types and check types (@test_sparkSQL.R#109) - unable to identify 
current timezone 'C':
please set environment variable 'TZ'
Failed 
-
1. Error: union on two RDDs (@test_binary_function.R#38) 
---
1: textFile(sc, fileName) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_binary_function.R:38
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))
2. Error: zipPartitions() on RDDs (@test_binary_function.R#84) 
-
1: textFile(sc, fileName, 1) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_binary_function.R:84
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))
3. Error: saveAsObjectFile()/objectFile() following textFile() works 
(@test_binaryFile.R#31) 
1: textFile(sc, fileName1, 1) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:31
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))
4. Error: saveAsObjectFile()/objectFile() works on a parallelized list 
(@test_binaryFile.R#46) 
1: objectFile(sc, fileName) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:46
2: callJMethod(sc, "objectFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))
5. Error: saveAsObjectFile()/objectFile() following RDD transformations 
works (@test_binaryFile.R#57) 
1: textFile(sc, fileName1) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:57
2: callJMethod(sc, "textFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))
6. Error: saveAsObjectFile()/objectFile() works with multiple paths 
(@test_binaryFile.R#85) 
1: objectFile(sc, c(fileName1, fileName2)) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_binaryFile.R:85
2: callJMethod(sc, "objectFile", path, getMinPartitions(sc, minPartitions))
3: invokeJava(isStatic = FALSE, objId$id, methodName, ...)
4: stop(readString(conn))
7. Error: spark.glm save/load (@test_mllib.R#162) 
--
1: read.ml(modelPath) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:162
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))
8. Error: glm save/load (@test_mllib.R#292) 

1: read.ml(modelPath) at 
C:/projects/spark/R/lib/SparkR/tests/testthat/test_mllib.R:292
2: callJStatic("org.apache.spark.ml.r.RWrappers", "load", path)
3: invokeJava(isStatic = TRUE, className, methodName, ...)
4: stop(readString(conn))
9. Error: spark.kmeans (@test_mllib.R#340) 
-
1: read.ml(modelPath) at 
C:/projects/spark/R/lib/S

[GitHub] spark issue #14836: [MINOR][MLlib][SQL] Clean up unused variables and unused...

2016-08-29 Thread srowen

Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14836
  
I think the other changes are trivial but not wrong. I'd generally not 
bother with these bitty changes. It's not that they're wrong but that it takes 
me some time to go think through whether they're valid, and it's probably not 
worth our time collectively.


