[GitHub] spark issue #15671: [SPARK-14567][ML]Add instrumentation logs to ML training...

2016-10-31 Thread zhengruifeng
Github user zhengruifeng commented on the issue:

https://github.com/apache/spark/pull/15671
  
@sethah Thanks for your review. I have made changes according to your comments, and I will create JIRAs for the meta algos.





[GitHub] spark issue #15671: [SPARK-14567][ML]Add instrumentation logs to ML training...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #67878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67878/consoleFull)**
 for PR 15671 at commit 
[`df8734e`](https://github.com/apache/spark/commit/df8734e3166e4ce4f21bcdb4fe33cecff2bc9ddb).





[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread tejasapatil
Github user tejasapatil commented on the issue:

https://github.com/apache/spark/pull/15703
  
This will surely improve performance for UDAFs where the data shrinks (e.g. `max`, as you pointed out). I am not sure whether it would be better for UDAFs like `GenericUDAFCollectSet` and `GenericUDAFCollectList`, where the aggregation does not shrink the data (it might even be worse because of the conversion cost?).

It would be good to see some perf numbers comparing (native Spark UDFs) vs. (before this change) vs. (after this change).
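For what it's worth, a minimal sketch of that kind of comparison in spark-shell, assuming a Hive-enabled `SparkSession` named `spark`; the registered function name `hive_collect_set` is just an illustration, not something from this PR:

```
// Rough timing sketch; a single run is not a rigorous benchmark.
val df = spark.range(0, 1000000).selectExpr("id % 1000 AS k", "id AS v")
df.createOrReplaceTempView("t")

// Register the Hive UDAF explicitly so the query goes through HiveUDAFFunction.
spark.sql("CREATE TEMPORARY FUNCTION hive_collect_set AS " +
  "'org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet'")

def time[T](label: String)(f: => T): T = {
  val start = System.nanoTime()
  val result = f
  println(s"$label: ${(System.nanoTime() - start) / 1e9} s")
  result
}

time("native collect_set") {
  spark.sql("SELECT k, collect_set(v) FROM t GROUP BY k").count()
}
time("Hive GenericUDAFCollectSet") {
  spark.sql("SELECT k, hive_collect_set(v) FROM t GROUP BY k").count()
}
```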





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15172
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67870/
Test PASSed.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15172
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15172
  
**[Test build #67870 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67870/consoleFull)**
 for PR 15172 at commit 
[`daed43c`](https://github.com/apache/spark/commit/daed43c6ee71270adaf57c404adcf41552d01036).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15711: [SPARK-18192] Support all file formats in structured str...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15711
  
**[Test build #67876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67876/consoleFull)**
 for PR 15711 at commit 
[`4a1707e`](https://github.com/apache/spark/commit/4a1707ec3af82afabaaa69f0863da11979e87367).





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #67877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67877/consoleFull)**
 for PR 15710 at commit 
[`1c3a645`](https://github.com/apache/spark/commit/1c3a645678647bb60e72c8f445e9a69ebf4bea65).





[GitHub] spark issue #15706: [SPARK-18189] [SQL] Fix serialization issue in KeyValueG...

2016-10-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15706
  
Thanks! LGTM pending Jenkins.






[GitHub] spark issue #15706: [SPARK-18189] [SQL] Fix serialization issue in KeyValueG...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15706
  
**[Test build #3388 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3388/consoleFull)**
 for PR 15706 at commit 
[`a9b089a`](https://github.com/apache/spark/commit/a9b089ac9e45e9454bc0b6fa592cc0d624be99a2).





[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15697
  
Build started: [SparkR] `ALL` 
[![PR-15697](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=846E764A-967B-4CB2-A78C-968D86EDFEF2&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/846E764A-967B-4CB2-A78C-968D86EDFEF2)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:846E764A-967B-4CB2-A78C-968D86EDFEF2

I am pretty sure it is fine now :) it seems 3.3.2 was just released.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15710
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67874/
Test FAILed.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #67874 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67874/consoleFull)**
 for PR 15710 at commit 
[`1c906c9`](https://github.com/apache/spark/commit/1c906c9d2132b78d8d566b14ef9a2dbcf5474b26).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15710
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15706: [SPARK-18189] [SQL] Fix serialization issue in KeyValueG...

2016-10-31 Thread seyfe
Github user seyfe commented on the issue:

https://github.com/apache/spark/pull/15706
  
Hi @rxin,

I added a unit test in [DatasetSuite](https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala), but I wasn't able to repro the issue there; the bug only reproduces in spark-shell. That's why I moved the unit test to [ReplSuite](https://github.com/apache/spark/blob/master/repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala).

I checked the history and it seems we are adding unit tests only to scala-2.11 now. If that's not the case, I can create the same one in [ReplSuite scala-2.10](https://github.com/apache/spark/blob/master/repl/scala-2.10/src/test/scala/org/apache/spark/repl/ReplSuite.scala).

Please let me know.

Thanks
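For reference, a ReplSuite-style regression test has roughly the following shape; this is an illustrative sketch that assumes ReplSuite's `runInterpreter` and `assertDoesNotContain` helpers, not the exact test added in this PR:

```
test("SPARK-18189: KeyValueGroupedDataset works in the REPL (illustrative)") {
  // Drive the snippet through the interpreter, the same way a spark-shell user would.
  val output = runInterpreter("local",
    """
      |case class Click(id: Int, clicks: Int)
      |val ds = Seq(Click(1, 1), Click(1, 2), Click(2, 3)).toDS()
      |ds.groupByKey(_.id).count().collect()
    """.stripMargin)
  // The original bug surfaced as a serialization error, so the REPL output must be clean.
  assertDoesNotContain("error:", output)
  assertDoesNotContain("Exception", output)
}
```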







[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67871/
Test FAILed.





[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15688
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15688
  
**[Test build #67871 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67871/consoleFull)**
 for PR 15688 at commit 
[`9045ccd`](https://github.com/apache/spark/commit/9045ccd7c1d68f5690efdbbf2bcc21a0bc6646be).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15710
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67873/
Test FAILed.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #67873 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67873/consoleFull)**
 for PR 15710 at commit 
[`e9823e7`](https://github.com/apache/spark/commit/e9823e7fc65ab908456b93f5df1e3d54fa8a14dd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15710
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #3387 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3387/consoleFull)**
 for PR 15710 at commit 
[`e9823e7`](https://github.com/apache/spark/commit/e9823e7fc65ab908456b93f5df1e3d54fa8a14dd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
Oh, it seems 3.3.1 has just now been released for Windows as well. It seems fine now - https://ci.appveyor.com/project/spark-test/spark/build/45-test1122
I will close this if the tests pass.

Maybe we should be careful about relying on the latest version in the future. Thank you for the quick responses @shivaram!





[GitHub] spark issue #15711: [SPARK-18192] Support all file formats in structured str...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15711
  
**[Test build #67875 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67875/consoleFull)**
 for PR 15711 at commit 
[`3d6e424`](https://github.com/apache/spark/commit/3d6e424e99909f86176062297c3fadbf3db27454).





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
Oh wait. Please do not merge this.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15709
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67867/
Test PASSed.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15709
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15709
  
Yeah, I'm fine with fixing a version - but as you said, it's sometimes helpful for finding out whether we have problems with a newly released R version. Either way, we should have a stable download URL whichever approach we choose.





[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15702
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67866/
Test PASSed.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15709
  
**[Test build #67867 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67867/consoleFull)**
 for PR 15709 at commit 
[`90fe001`](https://github.com/apache/spark/commit/90fe001145da62391c5a2a9efbdebc201e621e95).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15702
  
Merged build finished. Test PASSed.





[GitHub] spark pull request #15711: [SPARK-18192] Support all file formats in structu...

2016-10-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/15711

[SPARK-18192] Support all file formats in structured streaming - WIP

## What changes were proposed in this pull request?
I just want to run Jenkins for now.

## How was this patch tested?
WIP

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-18192

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15711.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15711


commit 416ad5fd745c75eccc990363592be7ea7e3e34d3
Author: Reynold Xin 
Date:   2016-11-01T05:24:48Z

[SPARK-18025] Use commit protocol API in structured streaming

commit ed5e5bce6a620b169335413cc198a8c7a7e4a1ad
Author: Reynold Xin 
Date:   2016-11-01T05:26:08Z

Slightly shorter line

commit 70b13e056846178ab24d9a249871ad107e1eb2f0
Author: Reynold Xin 
Date:   2016-11-01T05:30:46Z

Delete more code

commit e9823e7fc65ab908456b93f5df1e3d54fa8a14dd
Author: Reynold Xin 
Date:   2016-11-01T05:32:39Z

Updated documentation

commit 1c906c9d2132b78d8d566b14ef9a2dbcf5474b26
Author: Reynold Xin 
Date:   2016-11-01T05:37:26Z

Configurable commit protocol

commit 3d6e424e99909f86176062297c3fadbf3db27454
Author: Reynold Xin 
Date:   2016-11-01T06:14:57Z

[SPARK-18192] Support all file formats in structured streaming







[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15702
  
**[Test build #67866 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67866/consoleFull)**
 for PR 15702 at commit 
[`14a728e`](https://github.com/apache/spark/commit/14a728e76bfe69b114b44193dbcca01479b80423).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-10-31 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13575
  
No problem here. SPARK-17924 is super great (I've been watching it all along), and @rxin thank you for bringing that up! :-D





[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-10-31 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/13575
  
No problem here. SPARK-17924 is super great (I've been watching it all along), and thank you for bringing that up!





[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] spark.files should not be pass...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15669
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67868/
Test FAILed.





[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] spark.files should not be pass...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15669
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] spark.files should not be pass...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15669
  
**[Test build #67868 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67868/consoleFull)**
 for PR 15669 at commit 
[`230d56c`](https://github.com/apache/spark/commit/230d56c90ce7ff30251a03afe6c677fe9df8faca).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15707
  
**[Test build #3386 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3386/consoleFull)**
 for PR 15707 at commit 
[`65ba5c1`](https://github.com/apache/spark/commit/65ba5c14ec976d79fe9ee118807663496d0b7845).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13575: [SPARK-15472][SQL] Add support for writing in `csv`, `js...

2016-10-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/13575
  
@lw-lin sorry we haven't revisited this PR in a while, but the goal of this PR will be accomplished by https://issues.apache.org/jira/browse/SPARK-17924 (as a side effect). It reduces a lot of duplicated code as well.






[GitHub] spark issue #15706: [SPARK-18189] [SQL] Fix serialization issue in KeyValueG...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15706
  
**[Test build #3385 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3385/consoleFull)**
 for PR 15706 at commit 
[`07160b9`](https://github.com/apache/spark/commit/07160b93130ba58c29e516368354fea86e5e8865).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14444
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67869/
Test FAILed.





[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14444
  
**[Test build #67869 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67869/consoleFull)**
 for PR 14444 at commit 
[`9b89e31`](https://github.com/apache/spark/commit/9b89e315f83a792d62d02d56f46448d339a705e8).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14444
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15708: [SPARK-18167] [SQL] Retry when the SQLQuerySuite test fl...

2016-10-31 Thread ericl
Github user ericl commented on the issue:

https://github.com/apache/spark/pull/15708
  
Ok interesting, it seems the test repeatedly fails. This probably means it 
is a suite ordering issue.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
I see. That seems nice too, but I would prefer the version to be fixed if this approach is also fine.

I am worried about the version suddenly being bumped up and some tests failing (at least, I was a bit upset when all the other AppVeyor builds suddenly failed). Maybe I can make the script accept the version `latest`, which would use that link.

I remember I hit an incompatibility between 3.1.x and 3.3.x when using `env` (it seems it is immutable in 3.3.x but not in 3.1.x), which showed a failure mark. It is good to find such cases ahead of time, but it might be better if it does not affect other PRs.

(Not a strong opinion, I just wanted to share what I thought.)





[GitHub] spark pull request #15710: [SPARK-18025] Use commit protocol API in structur...

2016-10-31 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15710#discussion_r85878038
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
@@ -84,17 +84,22 @@ case class InsertIntoHadoopFsRelationCommand(
     val isAppend = pathExists && (mode == SaveMode.Append)
 
     if (doInsertion) {
-      WriteOutput.write(
-        sparkSession,
-        query,
-        fileFormat,
-        qualifiedOutputPath,
-        hadoopConf,
-        partitionColumns,
-        bucketSpec,
-        refreshFunction,
-        options,
+      val committer = FileCommitProtocol.instantiate(
+        sparkSession.sessionState.conf.fileCommitProtocolClass,
+        outputPath.toString,
         isAppend)
+
+      WriteOutput.write(
--- End diff --

I'm thinking I should just rename WriteOutput to FileFormatOutput





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15710
  
cc @ericl, @marmbrus, @zsxwing and @lw-lin (I guess this would supersede 
your old PR).






[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85877935
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+    logDebug("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: ${virtualEnvType} is not supported" )
+    virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    // use the absolute path when it is local mode otherwise just use filename as it would be
+    // fetched from FileServer
+    val pyspark_requirements =
+      if (Utils.isLocalMaster(conf)) {
+        conf.get("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+      }
+
+    val createEnvCommand =
+      if (virtualEnvType == "native") {
+        Arrays.asList(virtualEnvPath,
+          "-p", pythonExec,
+          "--no-site-packages", virtualEnvName)
+      } else {
+        Arrays.asList(virtualEnvPath,
+          "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName,
+          "--file", pyspark_requirements, "-y")
+      }
+    execCommand(createEnvCommand)
+    // virtualenv will be created in the working directory of Executor.
+    virtualPythonExec = virtualEnvName + "/bin/python"
--- End diff --

Not tested yet on Windows.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #67874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67874/consoleFull)**
 for PR 15710 at commit 
[`1c906c9`](https://github.com/apache/spark/commit/1c906c9d2132b78d8d566b14ef9a2dbcf5474b26).





[GitHub] spark issue #15703: [SPARK-18186] Migrate HiveUDAFFunction to TypedImperativ...

2016-10-31 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/15703
  
I can't reproduce those test failures when executing failed test cases 
individually. Seems that it's related to execution order. Still investigating.





[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85877793
  
--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+    logDebug("Start to setup virtualenv...")
+    logDebug("user.dir=" + System.getProperty("user.dir"))
+    logDebug("user.home=" + System.getProperty("user.home"))
+
+    require(virtualEnvType == "native" || virtualEnvType == "conda",
+      s"VirtualEnvType: ${virtualEnvType} is not supported" )
+    virtualEnvName = "virtualenv_" + conf.getAppId + "_" + VIRTUALENV_ID.getAndIncrement()
+    // use the absolute path when it is local mode otherwise just use filename as it would be
+    // fetched from FileServer
+    val pyspark_requirements =
+      if (Utils.isLocalMaster(conf)) {
+        conf.get("spark.pyspark.virtualenv.requirements")
+      } else {
+        conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+      }
+
+    val createEnvCommand =
+      if (virtualEnvType == "native") {
+        Arrays.asList(virtualEnvPath,
+          "-p", pythonExec,
+          "--no-site-packages", virtualEnvName)
+      } else {
+        Arrays.asList(virtualEnvPath,
+          "create", "--prefix", System.getProperty("user.dir") + "/" + virtualEnvName,
+          "--file", pyspark_requirements, "-y")
+      }
+    execCommand(createEnvCommand)
+    // virtualenv will be created in the working directory of Executor.
+    virtualPythonExec = virtualEnvName + "/bin/python"
+    if (virtualEnvType == "native") {
+      execCommand(Arrays.asList(virtualPythonExec, "-m", "pip",
+        "--cache-dir", System.getProperty("user.home"),
+        "install", "-r", pyspark_requirements))
+    }
+  }
+
+  def execCommand(commands: java.util.List[String]): Unit = {
+    logDebug("Running command:" + commands.asScala.mkString(" "))
+    val pb = new ProcessBuilder(commands).inheritIO()
+    // pip internally use environment variable `HOME`
+    pb.environment().put("HOME", System.getProperty("user.home"))
--- End diff --

For YARN mode, HOME is "/home/", which is not correct, so here I get it from the system property user.home instead. launch_container.sh sets:
```
export HOME="/home/"
```





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #67873 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67873/consoleFull)**
 for PR 15710 at commit 
[`e9823e7`](https://github.com/apache/spark/commit/e9823e7fc65ab908456b93f5df1e3d54fa8a14dd).





[GitHub] spark issue #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pyspark

2016-10-31 Thread zjffdu
Github user zjffdu commented on the issue:

https://github.com/apache/spark/pull/13599
  
Thanks for the review @mridulm. This approach tries to move the overhead from the user to the cluster: the user just needs to specify the requirements file and Spark will set up the virtualenv automatically, which is consistent with how virtualenv is used on a single machine. `SPARK-16367` uses another approach that distributes the dependencies via spark-submit, but it requires the cluster to be homogeneous.





[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15707
  
**[Test build #3384 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3384/consoleFull)**
 for PR 15707 at commit 
[`0177ded`](https://github.com/apache/spark/commit/0177ded3357a195f48e8e23923b763937ff60cac).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class HadoopCommitProtocolWrapper(path: String, isAppend: Boolean)`





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15710
  
**[Test build #3387 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3387/consoleFull)**
 for PR 15710 at commit 
[`e9823e7`](https://github.com/apache/spark/commit/e9823e7fc65ab908456b93f5df1e3d54fa8a14dd).





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15710
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15710: [SPARK-18025] Use commit protocol API in structured stre...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15710
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67872/
Test FAILed.





[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/15709
  
This approach is fine. The other thing we could do is just use the latest 
stable version as described at https://cloud.r-project.org/bin/windows/base/ - 
if you see the link at the bottom, it says ```
Note to webmasters: A stable link which will redirect to the current 
Windows binary release is
https://cloud.r-project.org/bin/windows/base/release.htm
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15710: [SPARK-18025] Use commit protocol API in structur...

2016-10-31 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/15710

[SPARK-18025] Use commit protocol API in structured streaming

## What changes were proposed in this pull request?
This patch adds a new commit protocol implementation 
ManifestFileCommitProtocol that follows the existing streaming flow, and uses 
it in FileStreamSink to consolidate the write path in structured streaming with 
the batch mode write path.

Note that this would make it trivial to support other functionalities that 
are currently available in batch but not in streaming, including all file 
formats and bucketing.
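
As a rough illustration of the consolidation described above, here is a minimal 
sketch of how a sink could drive a commit-protocol-style API. The trait and 
method names below are illustrative assumptions, not the exact interface 
introduced by this patch:

```
// Minimal sketch of a commit-protocol-driven write path (illustrative names only).
trait SketchCommitProtocol {
  def setupJob(): Unit
  def newTaskTempFile(taskId: Int): String            // path the task should write to
  def commitTask(taskId: Int): Seq[String]            // files produced by the task
  def commitJob(taskCommits: Seq[Seq[String]]): Unit  // e.g. record the files in a manifest atomically
  def abortJob(): Unit
}

def writeJob(protocol: SketchCommitProtocol, numTasks: Int): Unit = {
  protocol.setupJob()
  try {
    val taskCommits = (0 until numTasks).map { taskId =>
      val tempPath = protocol.newTaskTempFile(taskId)
      // ... each task writes its partition of the data to tempPath ...
      protocol.commitTask(taskId)
    }
    protocol.commitJob(taskCommits)
  } catch {
    case e: Exception =>
      protocol.abortJob()
      throw e
  }
}
```

In a manifest-based implementation, commitJob would be the single point where 
the set of output files becomes visible, which is what lets the streaming and 
batch write paths share the same flow.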

## How was this patch tested?
Should be covered by existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-18025

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15710.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15710


commit 416ad5fd745c75eccc990363592be7ea7e3e34d3
Author: Reynold Xin 
Date:   2016-11-01T05:24:48Z

[SPARK-18025] Use commit protocol API in structured streaming

commit ed5e5bce6a620b169335413cc198a8c7a7e4a1ad
Author: Reynold Xin 
Date:   2016-11-01T05:26:08Z

Slightly shorter line




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15707: [SPARK-18024][SQL] Introduce an internal commit p...

2016-10-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15707


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15707
  
Looks like the test run failed due to a flaky test, but everything else was 
fine. I'm going to merge this optimistically.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/15673
  
@rxin I believe https://issues.apache.org/jira/browse/SPARK-18168 will need 
to be resolved before I can rebase this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15707
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15707
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67865/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15705
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15705
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67864/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15707: [SPARK-18024][SQL] Introduce an internal commit protocol...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15707
  
**[Test build #67865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67865/consoleFull)**
 for PR 15707 at commit 
[`65ba5c1`](https://github.com/apache/spark/commit/65ba5c14ec976d79fe9ee118807663496d0b7845).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVERWRITE] ...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15705
  
**[Test build #67864 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67864/consoleFull)**
 for PR 15705 at commit 
[`0daff74`](https://github.com/apache/spark/commit/0daff7475e456754538e65b9f324773218f4f943).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15705: [SPARK-18183] [SPARK-18184] Fix INSERT [INTO|OVER...

2016-10-31 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15705#discussion_r85875944
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CatalogFileIndex.scala
 ---
@@ -67,7 +67,10 @@ class CatalogFileIndex(
   val selectedPartitions = 
sparkSession.sessionState.catalog.listPartitionsByFilter(
 table.identifier, filters)
   val partitions = selectedPartitions.map { p =>
-PartitionPath(p.toRow(partitionSchema), p.storage.locationUri.get)
+val path = new Path(p.storage.locationUri.get)
+val fs = path.getFileSystem(hadoopConf)
+PartitionPath(
+  p.toRow(partitionSchema), path.makeQualified(fs.getUri, 
fs.getWorkingDirectory))
--- End diff --

Apparently not. The unit test actually fails if you do that, since the path 
seems to be missing the `file:` prefix.
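
For reference, a minimal standalone sketch of the qualification step discussed 
here, using the local filesystem; this is an illustrative example, not the PR's 
code:

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// makeQualified fills in the missing scheme, e.g. /tmp/t/p=1 -> file:/tmp/t/p=1
// when the default filesystem is the local one.
val hadoopConf = new Configuration()
val path = new Path("/tmp/t/p=1")
val fs = path.getFileSystem(hadoopConf)
val qualified = path.makeQualified(fs.getUri, fs.getWorkingDirectory)
println(qualified)
```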


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15172: [SPARK-13331] AES support for over-the-wire encry...

2016-10-31 Thread cjjnjust
Github user cjjnjust commented on a diff in the pull request:

https://github.com/apache/spark/pull/15172#discussion_r85875690
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/sasl/SaslRpcHandler.java
 ---
@@ -80,46 +84,71 @@ public void receive(TransportClient client, ByteBuffer 
message, RpcResponseCallb
   delegate.receive(client, message, callback);
   return;
 }
+if (saslServer == null || !saslServer.isComplete()) {
+  ByteBuf nettyBuf = Unpooled.wrappedBuffer(message);
+  SaslMessage saslMessage;
+  try {
+saslMessage = SaslMessage.decode(nettyBuf);
+  } finally {
+nettyBuf.release();
+  }
 
-ByteBuf nettyBuf = Unpooled.wrappedBuffer(message);
-SaslMessage saslMessage;
-try {
-  saslMessage = SaslMessage.decode(nettyBuf);
-} finally {
-  nettyBuf.release();
-}
-
-if (saslServer == null) {
-  // First message in the handshake, setup the necessary state.
-  client.setClientId(saslMessage.appId);
-  saslServer = new SparkSaslServer(saslMessage.appId, secretKeyHolder,
-conf.saslServerAlwaysEncrypt());
-}
+  if (saslServer == null) {
+// First message in the handshake, setup the necessary state.
+client.setClientId(saslMessage.appId);
+saslServer = new SparkSaslServer(saslMessage.appId, 
secretKeyHolder,
+  conf.saslServerAlwaysEncrypt());
+  }
 
-byte[] response;
-try {
-  response = saslServer.response(JavaUtils.bufferToArray(
-saslMessage.body().nioByteBuffer()));
-} catch (IOException ioe) {
-  throw new RuntimeException(ioe);
+  byte[] response;
+  try {
+response = saslServer.response(JavaUtils.bufferToArray(
+  saslMessage.body().nioByteBuffer()));
+  } catch (IOException ioe) {
+throw new RuntimeException(ioe);
+  }
+  callback.onSuccess(ByteBuffer.wrap(response));
 }
-callback.onSuccess(ByteBuffer.wrap(response));
 
 // Setup encryption after the SASL response is sent, otherwise the 
client can't parse the
 // response. It's ok to change the channel pipeline here since we are 
processing an incoming
 // message, so the pipeline is busy and no new incoming messages will 
be fed to it before this
 // method returns. This assumes that the code ensures, through other 
means, that no outbound
 // messages are being written to the channel while negotiation is 
still going on.
 if (saslServer.isComplete()) {
-  logger.debug("SASL authentication successful for channel {}", 
client);
-  isComplete = true;
-  if 
(SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP)))
 {
+  if 
(!SparkSaslServer.QOP_AUTH_CONF.equals(saslServer.getNegotiatedProperty(Sasl.QOP)))
 {
+logger.debug("SASL authentication successful for channel {}", 
client);
+complete(true);
+return ;
+  }
+
+  if (!conf.AesEncryptionEnabled()) {
 logger.debug("Enabling encryption for channel {}", client);
 SaslEncryption.addToChannel(channel, saslServer, 
conf.maxSaslEncryptedBlockSize());
-saslServer = null;
-  } else {
-saslServer.dispose();
-saslServer = null;
+complete(false);
+return;
+  }
+
+  // Extra negotiation should happen after authentication, so return 
directly while
+  // processing authenticate.
+  if (!isAuthenticated) {
+logger.debug("SASL authentication successful for channel {}", 
client);
+isAuthenticated = true;
+return;
+  }
+
+  // Create AES cipher when it is authenticated
+  try {
+AesConfigMessage configMessage = 
AesConfigMessage.decodeMessage(message);
+AesCipher cipher = new AesCipher(configMessage);
+
+// Send response back to client to confirm that server accept 
config.
+callback.onSuccess(JavaUtils.stringToBytes(AesCipher.TRANSFORM));
--- End diff --

This should be called after addToChannel; otherwise it will be sent as 
encrypted data while the client is not yet ready for decryption.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15688
  
@gatorsmile that's a good point. I checked with Hive; the behaviour is:

1. if the given partition spec specifies all partition columns, throw an 
exception if the partition does not exist
2. if the given partition spec is partial, do nothing if the partition does not 
exist

I have updated this PR accordingly.
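
A minimal sketch of that rule; `partitionColumns` and `partitionExists` are 
hypothetical helpers used only for illustration, not helpers from the PR:

```
// Full spec + missing partition -> error; partial spec + no match -> no-op (matching Hive).
def checkPartitionSpec(
    spec: Map[String, String],
    partitionColumns: Seq[String],
    partitionExists: Map[String, String] => Boolean): Unit = {
  val isFullSpec = partitionColumns.forall(spec.contains)
  if (!partitionExists(spec)) {
    if (isFullSpec) {
      throw new NoSuchElementException(s"Partition not found: $spec")
    }
    // partial spec that matches nothing: silently do nothing
  }
}
```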


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15688
  
**[Test build #67871 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67871/consoleFull)**
 for PR 15688 at commit 
[`9045ccd`](https://github.com/apache/spark/commit/9045ccd7c1d68f5690efdbbf2bcc21a0bc6646be).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15541
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67863/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15541
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15541: [SPARK-17637][Scheduler]Packed scheduling for Spark task...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15541
  
**[Test build #67863 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67863/consoleFull)**
 for PR 15541 at commit 
[`a820e96`](https://github.com/apache/spark/commit/a820e96284f1d9108ef62cd3ef55171ebd47e08f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15673
  
@mallman can you bring this up-to-date?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15673
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67859/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15673
  
Build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15673: [SPARK-17992][SQL] Return all partitions from HiveShim w...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15673
  
**[Test build #67859 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67859/consoleFull)**
 for PR 15673 at commit 
[`1ed3301`](https://github.com/apache/spark/commit/1ed3301ec4dcbcccde4cacd21909de4f97902e20).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15172: [SPARK-13331] AES support for over-the-wire encryption

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15172
  
**[Test build #67870 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67870/consoleFull)**
 for PR 15172 at commit 
[`daed43c`](https://github.com/apache/spark/commit/daed43c6ee71270adaf57c404adcf41552d01036).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15666
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67860/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15666
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15666: [SPARK-11421] [Core][Python][R] Added ability for addJar...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15666
  
**[Test build #67860 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67860/consoleFull)**
 for PR 15666 at commit 
[`26b39de`](https://github.com/apache/spark/commit/26b39de51f9a76b121ebcb70079072dfcc9972bd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859191
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -55,6 +65,11 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 envVars.getOrElse("PYTHONPATH", ""),
 sys.env.getOrElse("PYTHONPATH", ""))
 
+
+  if (conf.getBoolean("spark.pyspark.virtualenv.enabled", false)) {
--- End diff --

virtualEnvEnabled instead ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85860115
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages 
virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r 
requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file 
requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+logDebug("Start to setup virtualenv...")
+logDebug("user.dir=" + System.getProperty("user.dir"))
+logDebug("user.home=" + System.getProperty("user.home"))
+
+require(virtualEnvType == "native" || virtualEnvType == "conda",
+  s"VirtualEnvType: ${virtualEnvType} is not supported" )
+virtualEnvName = "virtualenv_" + conf.getAppId + "_" + 
VIRTUALENV_ID.getAndIncrement()
+// use the absolute path when it is local mode otherwise just use 
filename as it would be
+// fetched from FileServer
+val pyspark_requirements =
+  if (Utils.isLocalMaster(conf)) {
+conf.get("spark.pyspark.virtualenv.requirements")
+  } else {
+conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+  }
+
+val createEnvCommand =
+  if (virtualEnvType == "native") {
+Arrays.asList(virtualEnvPath,
+  "-p", pythonExec,
+  "--no-site-packages", virtualEnvName)
+  } else {
+Arrays.asList(virtualEnvPath,
+  "create", "--prefix", System.getProperty("user.dir") + "/" + 
virtualEnvName,
+  "--file", pyspark_requirements, "-y")
+  }
+execCommand(createEnvCommand)
+// virtualenv will be created in the working directory of Executor.
+virtualPythonExec = virtualEnvName + "/bin/python"
+if (virtualEnvType == "native") {
+  execCommand(Arrays.asList(virtualPythonExec, "-m", "pip",
+"--cache-dir", System.getProperty("user.home"),
+"install", "-r", pyspark_requirements))
+}
+  }
+
+  def execCommand(commands: java.util.List[String]): Unit = {
+logDebug("Running command:" + commands.asScala.mkString(" "))
+val pb = new ProcessBuilder(commands).inheritIO()
+// pip internally use environment variable `HOME`
+pb.environment().put("HOME", System.getProperty("user.home"))
+val proc = pb.start()
+val exitCode = proc.waitFor()
--- End diff --

Perhaps use a timed wait and abort with a failure on timeout, instead of 
waiting indefinitely.
Though PROCESS_WAIT_TIMEOUT_MS might be too low?
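
A minimal sketch of the timed-wait alternative; the 10-minute timeout is an 
arbitrary illustrative value, not a proposal for the actual constant:

```
import java.util.concurrent.TimeUnit

// Bounded wait instead of an indefinite waitFor().
def waitForOrFail(proc: Process, timeoutMinutes: Long = 10): Int = {
  if (!proc.waitFor(timeoutMinutes, TimeUnit.MINUTES)) {
    proc.destroyForcibly()
    throw new RuntimeException("virtualenv setup command timed out")
  }
  proc.exitValue()
}
```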


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85872283
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -307,6 +387,7 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
 }
 
 private object PythonWorkerFactory {
+  val VIRTUALENV_ID = new AtomicInteger()
--- End diff --

A more restrictive access level would be good


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859768
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,67 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages 
virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r 
requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file 
requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+logDebug("Start to setup virtualenv...")
+logDebug("user.dir=" + System.getProperty("user.dir"))
+logDebug("user.home=" + System.getProperty("user.home"))
+
+require(virtualEnvType == "native" || virtualEnvType == "conda",
+  s"VirtualEnvType: ${virtualEnvType} is not supported" )
+virtualEnvName = "virtualenv_" + conf.getAppId + "_" + 
VIRTUALENV_ID.getAndIncrement()
+// use the absolute path when it is local mode otherwise just use 
filename as it would be
+// fetched from FileServer
+val pyspark_requirements =
+  if (Utils.isLocalMaster(conf)) {
+conf.get("spark.pyspark.virtualenv.requirements")
+  } else {
+conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+  }
+
+val createEnvCommand =
+  if (virtualEnvType == "native") {
+Arrays.asList(virtualEnvPath,
+  "-p", pythonExec,
+  "--no-site-packages", virtualEnvName)
+  } else {
+Arrays.asList(virtualEnvPath,
+  "create", "--prefix", System.getProperty("user.dir") + "/" + 
virtualEnvName,
+  "--file", pyspark_requirements, "-y")
+  }
+execCommand(createEnvCommand)
+// virtualenv will be created in the working directory of Executor.
+virtualPythonExec = virtualEnvName + "/bin/python"
+if (virtualEnvType == "native") {
+  execCommand(Arrays.asList(virtualPythonExec, "-m", "pip",
+"--cache-dir", System.getProperty("user.home"),
--- End diff --

In general, creating this in the home directory might not work (home vs. cwd 
disk constraints).
Why not simply create a temporary directory and put it there? Usually 
java.io.tmpdir is mapped to something with adequate space (on YARN and Mesos, 
for example).

Or did I misread the intent here?
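
A minimal sketch of the temp-directory alternative; the "pip-cache-" prefix is 
an illustrative assumption:

```
import java.nio.file.{Files, Paths}

// Put the pip cache under java.io.tmpdir instead of the user's home directory.
val tmpRoot = Paths.get(System.getProperty("java.io.tmpdir"))
val pipCacheDir = Files.createTempDirectory(tmpRoot, "pip-cache-")
// pipCacheDir.toString would then be passed as pip's --cache-dir argument
```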


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859164
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -46,6 +50,12 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
   val daemonWorkers = new mutable.WeakHashMap[Socket, Int]()
   val idleWorkers = new mutable.Queue[Socket]()
   var lastActivity = 0L
+  val virtualEnvEnabled = 
conf.getBoolean("spark.pyspark.virtualenv.enabled", false)
+  val virtualEnvType = conf.get("spark.pyspark.virtualenv.type", "native")
+  val virtualEnvPath = conf.get("spark.pyspark.virtualenv.bin.path", "")
+  var virtualEnvName: String = _
+  var virtualPythonExec: String = _
+
--- End diff --

Make these private if not required outside of the class


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859942
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages 
virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r 
requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file 
requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+logDebug("Start to setup virtualenv...")
+logDebug("user.dir=" + System.getProperty("user.dir"))
+logDebug("user.home=" + System.getProperty("user.home"))
+
+require(virtualEnvType == "native" || virtualEnvType == "conda",
+  s"VirtualEnvType: ${virtualEnvType} is not supported" )
+virtualEnvName = "virtualenv_" + conf.getAppId + "_" + 
VIRTUALENV_ID.getAndIncrement()
+// use the absolute path when it is local mode otherwise just use 
filename as it would be
+// fetched from FileServer
+val pyspark_requirements =
+  if (Utils.isLocalMaster(conf)) {
+conf.get("spark.pyspark.virtualenv.requirements")
+  } else {
+conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+  }
+
+val createEnvCommand =
+  if (virtualEnvType == "native") {
+Arrays.asList(virtualEnvPath,
+  "-p", pythonExec,
+  "--no-site-packages", virtualEnvName)
+  } else {
+Arrays.asList(virtualEnvPath,
+  "create", "--prefix", System.getProperty("user.dir") + "/" + 
virtualEnvName,
+  "--file", pyspark_requirements, "-y")
+  }
+execCommand(createEnvCommand)
+// virtualenv will be created in the working directory of Executor.
+virtualPythonExec = virtualEnvName + "/bin/python"
+if (virtualEnvType == "native") {
+  execCommand(Arrays.asList(virtualPythonExec, "-m", "pip",
+"--cache-dir", System.getProperty("user.home"),
+"install", "-r", pyspark_requirements))
+}
+  }
+
+  def execCommand(commands: java.util.List[String]): Unit = {
+logDebug("Running command:" + commands.asScala.mkString(" "))
+val pb = new ProcessBuilder(commands).inheritIO()
+// pip internally use environment variable `HOME`
+pb.environment().put("HOME", System.getProperty("user.home"))
--- End diff --

This should be propagated implicitly; or is it needed for Windows support?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13599: [SPARK-13587] [PYSPARK] Support virtualenv in pys...

2016-10-31 Thread mridulm
Github user mridulm commented on a diff in the pull request:

https://github.com/apache/spark/pull/13599#discussion_r85859906
  
--- Diff: 
core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala ---
@@ -69,6 +84,66 @@ private[spark] class PythonWorkerFactory(pythonExec: 
String, envVars: Map[String
   }
 
   /**
+   * Create virtualenv using native virtualenv or conda
+   *
+   * Native Virtualenv:
+   *   -  Execute command: virtualenv -p pythonExec --no-site-packages 
virtualenvName
+   *   -  Execute command: python -m pip --cache-dir cache-dir install -r 
requirement_file
+   *
+   * Conda
+   *   -  Execute command: conda create --prefix prefix --file 
requirement_file -y
+   *
+   */
+  def setupVirtualEnv(): Unit = {
+logDebug("Start to setup virtualenv...")
+logDebug("user.dir=" + System.getProperty("user.dir"))
+logDebug("user.home=" + System.getProperty("user.home"))
+
+require(virtualEnvType == "native" || virtualEnvType == "conda",
+  s"VirtualEnvType: ${virtualEnvType} is not supported" )
+virtualEnvName = "virtualenv_" + conf.getAppId + "_" + 
VIRTUALENV_ID.getAndIncrement()
+// use the absolute path when it is local mode otherwise just use 
filename as it would be
+// fetched from FileServer
+val pyspark_requirements =
+  if (Utils.isLocalMaster(conf)) {
+conf.get("spark.pyspark.virtualenv.requirements")
+  } else {
+conf.get("spark.pyspark.virtualenv.requirements").split("/").last
+  }
+
+val createEnvCommand =
+  if (virtualEnvType == "native") {
+Arrays.asList(virtualEnvPath,
+  "-p", pythonExec,
+  "--no-site-packages", virtualEnvName)
+  } else {
+Arrays.asList(virtualEnvPath,
+  "create", "--prefix", System.getProperty("user.dir") + "/" + 
virtualEnvName,
+  "--file", pyspark_requirements, "-y")
+  }
+execCommand(createEnvCommand)
+// virtualenv will be created in the working directory of Executor.
+virtualPythonExec = virtualEnvName + "/bin/python"
--- End diff --

Curious how this works under Windows ... not supported?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION sho...

2016-10-31 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15704#discussion_r85873381
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -631,6 +631,42 @@ private[spark] class HiveExternalCatalog(conf: 
SparkConf, hadoopConf: Configurat
 client.dropPartitions(db, table, parts.map(lowerCasePartitionSpec), 
ignoreIfNotExists, purge)
   }
 
+  override def dropPartitionsByFilter(
--- End diff --

As we already have `listPartitionsByFilter`, do we need another 
`dropPartitionsByFilter`? It looks like there is a lot of duplicated code 
between them.

We could combine `listPartitionsByFilter` and `dropPartitions` to do the same 
thing, instead of adding a new API like this.
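
A minimal sketch of that combination; the catalog method signatures below are 
simplified assumptions for illustration, not the actual ExternalCatalog API:

```
// Reuse list-by-filter plus the existing drop, rather than adding dropPartitionsByFilter.
trait SketchCatalog {
  def listPartitionsByFilter(
      db: String, table: String, predicates: Seq[String]): Seq[Map[String, String]]
  def dropPartitions(
      db: String, table: String, specs: Seq[Map[String, String]],
      ignoreIfNotExists: Boolean, purge: Boolean): Unit
}

def dropByFilter(catalog: SketchCatalog, db: String, table: String,
    predicates: Seq[String], purge: Boolean): Unit = {
  val matched = catalog.listPartitionsByFilter(db, table, predicates)
  if (matched.nonEmpty) {
    catalog.dropPartitions(db, table, matched, ignoreIfNotExists = true, purge = purge)
  }
}
```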


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/14444
  
@hvanhovell please have a look.
BTW, for some reason Jenkins shows all test cases as 'sql', see 
[here](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67837/testReport/org.apache.spark.sql/SQLQueryTestSuite/)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14444
  
**[Test build #67869 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67869/consoleFull)**
 for PR 14444 at commit 
[`9b89e31`](https://github.com/apache/spark/commit/9b89e315f83a792d62d02d56f46448d339a705e8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
(cc @nchammas too, who I guess hit this issue first. I would like to let him 
know.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15697: [SparkR][Test]:remove unnecessary suppressWarnings

2016-10-31 Thread wangmiao1981
Github user wangmiao1981 commented on the issue:

https://github.com/apache/spark/pull/15697
  
@HyukjinKwon Thanks for your time!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15709: [SPARK-18190][Build][SparkR] Fix R version to not the la...

2016-10-31 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/15709
  
Maybe I should create a JIRA to describe the known issues with AppVeyor. 
(Does that make sense?)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15669: [SPARK-18160][CORE][YARN] spark.files should not be pass...

2016-10-31 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15669
  
**[Test build #67868 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67868/consoleFull)**
 for PR 15669 at commit 
[`230d56c`](https://github.com/apache/spark/commit/230d56c90ce7ff30251a03afe6c677fe9df8faca).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15669: [SPARK-18160][CORE][YARN] spark.files should not ...

2016-10-31 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/15669#discussion_r85872343
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1716,29 +1716,12 @@ class SparkContext(config: SparkConf) extends 
Logging {
 key = uri.getScheme match {
   // A JAR file which exists only on the driver node
   case null | "file" =>
-if (master == "yarn" && deployMode == "cluster") {
-  // In order for this to work in yarn cluster mode the user 
must specify the
-  // --addJars option to the client to upload the file into 
the distributed cache
-  // of the AM to make it show up in the current working 
directory.
-  val fileName = new Path(uri.getPath).getName()
-  try {
-env.rpcEnv.fileServer.addJar(new File(fileName))
-  } catch {
-case e: Exception =>
-  // For now just log an error but allow to go through so 
spark examples work.
-  // The spark examples don't really need the jar 
distributed since its also
-  // the app jar.
-  logError("Error adding jar (" + e + "), was the 
--addJars option used?")
-  null
-  }
-} else {
-  try {
-env.rpcEnv.fileServer.addJar(new File(uri.getPath))
-  } catch {
-case exc: FileNotFoundException =>
-  logError(s"Jar not found at $path")
-  null
-  }
--- End diff --

This is obsolete code IMO, so I removed it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15704: [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should sup...

2016-10-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15704
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


