[GitHub] spark issue #17732: Branch 2.0

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17732
  
ping @tangchun 





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17737
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76095/
Test PASSed.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17737
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76095 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76095/testReport)** for PR 17737 at commit [`2815ff1`](https://github.com/apache/spark/commit/2815ff167b0ce9f6e0d2d6ae9f3d4fb0f3ce94d2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17736: [SPARK-20399][SQL] Can't use same regex pattern between ...

2017-04-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/17736
  
cc @hvanhovell for review ...






[GitHub] spark issue #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17739
  
**[Test build #76096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76096/testReport)** for PR 17739 at commit [`78e060e`](https://github.com/apache/spark/commit/78e060e3455ecdc95fdedb6adccc0a375188e2d5).





[GitHub] spark issue #17222: [SPARK-19439][PYSPARK][SQL] PySpark's registerJavaFuncti...

2017-04-23 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/17222
  
@zjffdu - how would you feel about putting the return value back, and just
plumbing it through as required? It seems like it would be useful to let users
do this programmatically (I find myself effectively doing this in some of my
own personal notebooks).
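
As a rough sketch of what that could look like (the Java class name is a 
placeholder and the returned handle is the proposed behavior, not the 
current API):

```python
# Hypothetical sketch: registerJavaFunction currently registers the UDF
# for use in SQL strings only; returning the wrapped function (as proposed)
# would also let callers use it programmatically with the DataFrame API.
from pyspark.sql import SparkSession, SQLContext
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
sqlContext = SQLContext(spark.sparkContext)

# 'com.example.StringLengthUDF' stands in for a compiled Java UDF on the
# classpath; the assignment models the proposed return value.
str_len = sqlContext.registerJavaFunction(
    "strLen", "com.example.StringLengthUDF", IntegerType())

spark.sql("SELECT strLen('Alice')").show()  # works today via SQL
# df.select(str_len(df.name))               # what the return would enable
```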





[GitHub] spark pull request #17739: [SPARK-20443][MLLIB][ML] set ALS blockify size

2017-04-23 Thread mpjlu
GitHub user mpjlu opened a pull request:

https://github.com/apache/spark/pull/17739

[SPARK-20443][MLLIB][ML] set ALS blockify size

## What changes were proposed in this pull request?


The blockSize of MLlib ALS is very important for ALS performance.
In our test, with a blockSize of 128 the performance is about 4X better than
with the default blockSize of 4096.
Our test results, as BlockSize(recommendationForAll time):
128(124s), 256(160s), 512(184s), 1024(244s), 2048(332s), 4096(488s), 8192(OOM)

The test environment:
3 workers; each worker has 10 cores, 30 GB of memory, and 1 executor.
The data: 480K users and 17K items.
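
As a rough sketch of the code path being tuned, the mllib 
`recommendProductsForUsers` path is where the blockify size applies; the 
`setBlockSize` call below is a hypothetical name for the knob this PR 
proposes, not an existing API:

```python
# Minimal sketch, assuming this PR exposes the blockify size used by the
# recommend-for-all path. 'setBlockSize' is a hypothetical name; today the
# block size is hard-coded to 4096.
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext.getOrCreate()
ratings = sc.parallelize([
    Rating(0, 0, 4.0), Rating(0, 1, 2.0),
    Rating(1, 1, 3.0), Rating(1, 2, 5.0)])

model = ALS.train(ratings, rank=4, iterations=5)
# model.setBlockSize(128)  # hypothetical knob; 128 performed best above
recs = model.recommendProductsForUsers(2)  # the path whose time is reported
print(recs.collect())
```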


## How was this patch tested?
The existing UT


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mpjlu/spark setAlsBlockSize

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17739.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17739


commit 78e060e3455ecdc95fdedb6adccc0a375188e2d5
Author: Peng 
Date:   2017-04-24T05:01:13Z

set ALS blockify size







[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17736
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76094/
Test PASSed.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17736
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17736
  
**[Test build #76094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76094/testReport)** for PR 17736 at commit [`f295782`](https://github.com/apache/spark/commit/f29578219d6eebc9913c359a360ff9eafcb513fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17737
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17737
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76093/
Test PASSed.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76093/testReport)** for PR 17737 at commit [`af8ac74`](https://github.com/apache/spark/commit/af8ac74b624d54b16339083319e33e8af098655e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17737
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76092/
Test PASSed.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17737
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76092 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76092/testReport)** for PR 17737 at commit [`bb5de1f`](https://github.com/apache/spark/commit/bb5de1f2ef66a4775c8d8bc4f632535d45b3f0b4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/17736
  
LGTM. Thanks. @cloud-fan @rxin this fixes our production jobs when we port
our applications from 1.6 to 2.0. I think it's an important bug fix. Thanks.
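
For context, a toy repro of the 1.6 -> 2.0 gap (behavior as described in 
SPARK-20399; the fix itself is not shown here):

```python
# Spark 2.0's SQL parser unescapes string literals, so a pattern that
# reached the regex engine as \d in 1.6 arrives as a plain 'd' in 2.0.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.createDataFrame([("a1b2",)], ["s"]).createOrReplaceTempView("t")

# SQL literal '\d': matched digits in 1.6; matches the letter 'd' in 2.0.
spark.sql(r"SELECT regexp_replace(s, '\d', '#') FROM t").show()
# SQL literal '\\d': the 2.0 spelling that reaches the regex engine as \d.
spark.sql(r"SELECT regexp_replace(s, '\\d', '#') FROM t").show()
```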





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17736
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76091/
Test PASSed.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17736
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17736
  
**[Test build #76091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76091/testReport)** for PR 17736 at commit [`a0f4a13`](https://github.com/apache/spark/commit/a0f4a13763c077e57c2dcb5fff12d81f3bb2ceb9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-23 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
I scanned the split criteria of sklearn and xgboost.

1. sklearn
Considers all continuous values as candidates and splits at the midpoint of adjacent values.

commit 5147fd09c6a063188efde444f47bd006fa5f95f0
sklearn/tree/_splitter.pyx: 484:
```python
current.threshold = (Xf[p - 1] + Xf[p]) / 2.0
```

2. xgboost:
commit 49bdb5c97fccd81b1fdf032eab4599a065c6c4f6

+ If all continuous values are used as candidates, it uses the midpoint as well.

   src/tree/updater_colmaker.cc: 555:
   ```c++
   e.best.Update(loss_chg, fid, (fvalue + e.last_fvalue) * 0.5f, d_step == -1);
   ```
+ If continuous features are quantized, it uses `cut`. I'm not familiar with
C++ and updater_histmaker.cc is a little complicated, so I don't know what
`cut` really is. However, I guess it is the same as Spark's current split
criterion.

   src/tree/updater_histmaker.cc: 194:
   ```c++
   if (best->Update(static_cast<bst_float>(loss_chg), fid, hist.cut[i], false)) {
   ```

Anyway, in my opinion a weighted mean is more reasonable than the plain mean
or the cut value. This PR is a trivial enhancement to the tree module, and it
is not worth spending much time on given the obvious conclusion.

However, we would be more confident if more feedback from experts were
collected.
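
To make the comparison concrete, a toy sketch (not Spark code; the
count-based weighting is one plausible reading of "weighted midpoint"):

```python
# Plain midpoint (sklearn / xgboost exact mode) vs. a weighted midpoint
# that pulls the threshold toward the candidate value carrying more
# samples (weighting scheme assumed for illustration).
def plain_midpoint(x_left, x_right):
    return (x_left + x_right) / 2.0

def weighted_midpoint(x_left, n_left, x_right, n_right):
    return (x_left * n_left + x_right * n_right) / float(n_left + n_right)

print(plain_midpoint(1.0, 3.0))           # 2.0
print(weighted_midpoint(1.0, 9, 3.0, 1))  # 1.2 -> closer to the heavier side
```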






[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76095 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76095/testReport)** for PR 17737 at commit [`2815ff1`](https://github.com/apache/spark/commit/2815ff167b0ce9f6e0d2d6ae9f3d4fb0f3ce94d2).





[GitHub] spark issue #17738: [SPARK-20422][Spark Core] Worker registration retries sh...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17738
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17738: [SPARK-20422][Spark Core] Worker registration ret...

2017-04-23 Thread unsleepy22
GitHub user unsleepy22 opened a pull request:

https://github.com/apache/spark/pull/17738

[SPARK-20422][Spark Core] Worker registration retries should be configurable

## What changes were proposed in this pull request?
make prolonged registration retries configurable
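
A sketch of how the knob could be set once configurable; the property names
are hypothetical placeholders, not the final keys from this patch:

```python
# Hypothetical key names, for illustration only; Worker.scala currently
# hard-codes roughly 6 short-interval and 16 total registration retries.
from pyspark import SparkConf

conf = (SparkConf()
        .set("spark.worker.initialRegistrationRetries", "10")     # hypothetical
        .set("spark.worker.prolongedRegistrationRetries", "32"))  # hypothetical
# In practice these would be worker-side settings (spark-defaults.conf or
# SPARK_WORKER_OPTS on each worker), not driver conf.
```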

## How was this patch tested?
unit tests, integration tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/unsleepy22/spark SPARK-20422

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17738.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17738


commit 8bb2d4a37d4db8d8e9c78c41de3328ada30ea693
Author: Cody 
Date:   2017-04-24T04:02:43Z

make prolonged registration retries configurable







[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17737
  
cc @srowen, @holdenk, @felixcheung, @map222 and @zero323, who were involved
in related PRs.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861995
  
--- Diff: python/pyspark/sql/column.py ---
@@ -337,26 +381,39 @@ def isin(self, *cols):
 return Column(jc)
 
 # order
-asc = _unary_op("asc", "Returns a sort expression based on the"
-   " ascending order of the given column name.")
-desc = _unary_op("desc", "Returns a sort expression based on the"
- " descending order of the given column name.")
+_asc_doc = """
+Returns a sort expression based on the ascending order of the given 
column name
+
+>>> from pyspark.sql import Row
+>>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df2.select(df2.name).orderBy(df2.name.asc()).collect()
+[Row(name=u'Alice'), Row(name=u'Tom')]
+"""
+_desc_doc = """
+Returns a sort expression based on the descending order of the given 
column name.
+
+>>> from pyspark.sql import Row
+>>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df2.select(df2.name).orderBy(df2.name.desc()).collect()
+[Row(name=u'Tom'), Row(name=u'Alice')]
+"""
+
+asc = ignore_unicode_prefix(_unary_op("asc", _asc_doc))
+desc = ignore_unicode_prefix(_unary_op("desc", _desc_doc))
 
 _isNull_doc = """
-True if the current expression is null. Often combined with
-:func:`DataFrame.filter` to select rows with null values.
--- End diff --

`Often combined with :func:`DataFrame.filter` to select rows with null
values.` was removed because it seems to apply to many other APIs as well and
looks like too much here. It just follows the Scala one now.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861905
  
--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
  "in a string column or 'array_contains' function 
for an array column.")
 
 # bitwise operators
-bitwiseOR = _bin_op("bitwiseOR")
-bitwiseAND = _bin_op("bitwiseAND")
-bitwiseXOR = _bin_op("bitwiseXOR")
+_bitwiseOR_doc = """
+Compute bitwise OR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise or(|) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+[Row((a | b)=235)]
+"""
+
+_bitwiseAND_doc = """
+Compute bitwise AND of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise and(&) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+[Row((a & b)=10)]
+"""
+
+_bitwiseXOR_doc = """
+Compute bitwise XOR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise xor(^) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseXOR(df3.b)).collect()
+[Row((a ^ b)=225)]
+"""
--- End diff --

This matches the Scala one.

> Compute bitwise XOR of this expression with another expression





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861876
  
--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
  "in a string column or 'array_contains' function 
for an array column.")
 
 # bitwise operators
-bitwiseOR = _bin_op("bitwiseOR")
-bitwiseAND = _bin_op("bitwiseAND")
-bitwiseXOR = _bin_op("bitwiseXOR")
+_bitwiseOR_doc = """
+Compute bitwise OR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise or(|) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+[Row((a | b)=235)]
+"""
--- End diff --

This matches the Scala one.

> Compute bitwise OR of this expression with another expression





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17736
  
**[Test build #76094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76094/testReport)** for PR 17736 at commit [`f295782`](https://github.com/apache/spark/commit/f29578219d6eebc9913c359a360ff9eafcb513fc).





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861887
  
--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
  "in a string column or 'array_contains' function 
for an array column.")
 
 # bitwise operators
-bitwiseOR = _bin_op("bitwiseOR")
-bitwiseAND = _bin_op("bitwiseAND")
-bitwiseXOR = _bin_op("bitwiseXOR")
+_bitwiseOR_doc = """
+Compute bitwise OR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise or(|) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+[Row((a | b)=235)]
+"""
+
+_bitwiseAND_doc = """
+Compute bitwise AND of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise and(&) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+[Row((a & b)=10)]
+"""
--- End diff --

This matches the Scala one.

> Compute bitwise AND of this expression with another expression





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112860925
  
--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
  "in a string column or 'array_contains' function 
for an array column.")
 
 # bitwise operators
-bitwiseOR = _bin_op("bitwiseOR")
-bitwiseAND = _bin_op("bitwiseAND")
-bitwiseXOR = _bin_op("bitwiseXOR")
+_bitwiseOR_doc = """
+Compute bitwise OR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise or(|) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+[Row((a | b)=235)]
+"""
+
+_bitwiseAND_doc = """
+Compute bitwise AND of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise and(&) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+[Row((a & b)=10)]
+"""
--- End diff --

![2017-04-24 12 43 26](https://cloud.githubusercontent.com/assets/6477701/25321715/b64d798a-28eb-11e7-9e0f-96563c9717b4.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861613
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/bitwiseExpressions.scala
 ---
@@ -86,7 +86,7 @@ case class BitwiseOr(left: Expression, right: Expression) 
extends BinaryArithmet
 }
 
 /**
- * A function that calculates bitwise xor of two numbers.
+ * A function that calculates bitwise xor({@literal ^}) of two numbers.
--- End diff --

Matching it up with `BitwiseAnd` and `BitwiseOr` where

> A function that calculates bitwise and(&) of two numbers.

> A function that calculates bitwise or(|) of two numbers.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112860980
  
--- Diff: python/pyspark/sql/column.py ---
@@ -251,15 +286,16 @@ def __iter__(self):
 
 # string methods
 _rlike_doc = """
-Return a Boolean :class:`Column` based on a regex match.
+SQL RLIKE expression (LIKE with Regex). Returns a boolean 
:class:`Column` based on a regex
+match.
 
 :param other: an extended regex expression
 
 >>> df.filter(df.name.rlike('ice$')).collect()
 [Row(age=2, name=u'Alice')]
 """
--- End diff --

![2017-04-24 12 44 44](https://cloud.githubusercontent.com/assets/6477701/25321726/ce6f8f62-28eb-11e7-9fbe-dc6321e00e77.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861744
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -1008,7 +1009,7 @@ class Column(val expr: Expression) extends Logging {
   def cast(to: String): Column = cast(CatalystSqlParser.parseDataType(to))
 
   /**
-   * Returns an ordering used in sorting.
+   * Returns a sort expression based on the descending order of the column.
--- End diff --

This and the similar instances below are matched with `functions.scala`.
They appear to call the same underlying functions.

> Returns a sort expression based on the descending order of the column.

> Returns a sort expression based on the descending order of the column,
> and null values appear before non-null values.

> Returns a sort expression based on the descending order of the column,
> and null values appear after non-null values.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861531
  
--- Diff: python/pyspark/sql/column.py ---
@@ -337,26 +381,39 @@ def isin(self, *cols):
 return Column(jc)
 
 # order
-asc = _unary_op("asc", "Returns a sort expression based on the"
-   " ascending order of the given column name.")
-desc = _unary_op("desc", "Returns a sort expression based on the"
- " descending order of the given column name.")
+_asc_doc = """
+Returns a sort expression based on the ascending order of the given 
column name
+
+>>> from pyspark.sql import Row
+>>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df2.select(df2.name).orderBy(df2.name.asc()).collect()
+[Row(name=u'Alice'), Row(name=u'Tom')]
+"""
--- End diff --

![2017-04-24 12 54 55](https://cloud.githubusercontent.com/assets/6477701/25321941/5903bdbe-28ed-11e7-8d08-5fbab1411f02.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861035
  
--- Diff: python/pyspark/sql/column.py ---
@@ -288,8 +324,16 @@ def __iter__(self):
 >>> df.filter(df.name.endswith('ice$')).collect()
 []
 """
+_contains_doc = """
+Contains the other element. Returns a boolean :class:`Column` based on 
a string match.
+
+:param other: string in line
+
+>>> df.filter(df.name.contains('o')).collect()
+[Row(age=5, name=u'Bob')]
+"""
--- End diff --

![2017-04-24 12 45 57](https://cloud.githubusercontent.com/assets/6477701/25321748/fba744ca-28eb-11e7-9e40-534cda541f90.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861532
  
--- Diff: python/pyspark/sql/column.py ---
@@ -337,26 +381,39 @@ def isin(self, *cols):
 return Column(jc)
 
 # order
-asc = _unary_op("asc", "Returns a sort expression based on the"
-   " ascending order of the given column name.")
-desc = _unary_op("desc", "Returns a sort expression based on the"
- " descending order of the given column name.")
+_asc_doc = """
+Returns a sort expression based on the ascending order of the given 
column name
+
+>>> from pyspark.sql import Row
+>>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df2.select(df2.name).orderBy(df2.name.asc()).collect()
+[Row(name=u'Alice'), Row(name=u'Tom')]
+"""
+_desc_doc = """
+Returns a sort expression based on the descending order of the given 
column name.
+
+>>> from pyspark.sql import Row
+>>> df2 = spark.createDataFrame([Row(name=u'Tom', height=80), 
Row(name=u'Alice', height=None)])
+>>> df2.select(df2.name).orderBy(df2.name.desc()).collect()
+[Row(name=u'Tom'), Row(name=u'Alice')]
+"""
--- End diff --

![2017-04-24 12 55 17](https://cloud.githubusercontent.com/assets/6477701/25321944/5d1dfa4a-28ed-11e7-9fa3-e8741e492b36.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861399
  
--- Diff: python/pyspark/sql/column.py ---
@@ -269,17 +305,17 @@ def __iter__(self):
 [Row(age=2, name=u'Alice')]
 """
 _startswith_doc = """
-Return a Boolean :class:`Column` based on a string match.
+String starts with. Returns a boolean :class:`Column` based on a 
string match.
 
-:param other: string at end of line (do not use a regex `^`)
+:param other: string at start of line (do not use a regex `^`)
 
 >>> df.filter(df.name.startswith('Al')).collect()
 [Row(age=2, name=u'Alice')]
 >>> df.filter(df.name.startswith('^Al')).collect()
 []
 """
 _endswith_doc = """
-Return a Boolean :class:`Column` based on matching end of string.
+String ends with. Returns a boolean :class:`Column` based on a string 
match.
--- End diff --

![2017-04-24 12 45 36](https://cloud.githubusercontent.com/assets/6477701/25321740/edfb521c-28eb-11e7-833d-975bf59091bf.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112861566
  
--- Diff: python/pyspark/sql/column.py ---
@@ -527,7 +584,7 @@ def _test():
 .appName("sql.column tests")\
 .getOrCreate()
 sc = spark.sparkContext
-globs['sc'] = sc
+globs['spark'] = spark
--- End diff --

I removed `sc` and replaced it with `spark`, as that is the way we promote
it, to my knowledge.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112860981
  
--- Diff: python/pyspark/sql/column.py ---
@@ -269,17 +305,17 @@ def __iter__(self):
 [Row(age=2, name=u'Alice')]
 """
--- End diff --

![2017-04-24 12 45 10](https://cloud.githubusercontent.com/assets/6477701/25321732/de5784ca-28eb-11e7-8084-ac5a26b6a5a6.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112860906
  
--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
  "in a string column or 'array_contains' function 
for an array column.")
 
 # bitwise operators
-bitwiseOR = _bin_op("bitwiseOR")
-bitwiseAND = _bin_op("bitwiseAND")
-bitwiseXOR = _bin_op("bitwiseXOR")
+_bitwiseOR_doc = """
+Compute bitwise OR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise or(|) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+[Row((a | b)=235)]
+"""
--- End diff --

![2017-04-24 12 43 22](https://cloud.githubusercontent.com/assets/6477701/25321711/abaf659c-28eb-11e7-9289-e548489e0b27.png)






[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r112860927
  
--- Diff: python/pyspark/sql/column.py ---
@@ -185,17 +185,52 @@ def __contains__(self, item):
  "in a string column or 'array_contains' function 
for an array column.")
 
 # bitwise operators
-bitwiseOR = _bin_op("bitwiseOR")
-bitwiseAND = _bin_op("bitwiseAND")
-bitwiseXOR = _bin_op("bitwiseXOR")
+_bitwiseOR_doc = """
+Compute bitwise OR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise or(|) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseOR(df3.b)).collect()
+[Row((a | b)=235)]
+"""
+
+_bitwiseAND_doc = """
+Compute bitwise AND of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise and(&) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseAND(df3.b)).collect()
+[Row((a & b)=10)]
+"""
+
+_bitwiseXOR_doc = """
+Compute bitwise XOR of this expression with another expression.
+
+:param other: a value or :class:`Column` to calculate bitwise xor(^) 
against
+  this :class:`Column`.
+
+>>> from pyspark.sql import Row
+>>> df3 = spark.createDataFrame([Row(a=170, b=75)])
+>>> df3.select(df3.a.bitwiseXOR(df3.b)).collect()
+[Row((a ^ b)=225)]
+"""
--- End diff --

![2017-04-24 12 43 31](https://cloud.githubusercontent.com/assets/6477701/25321719/bac73726-28eb-11e7-829a-1675f51dd6b6.png)






[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76093/testReport)** for PR 17737 at commit [`af8ac74`](https://github.com/apache/spark/commit/af8ac74b624d54b16339083319e33e8af098655e).





[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-23 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17649
  
@gatorsmile Hive treats the comment simply as a key in its string-string
parameter map, while Spark extracts the comment from the map as a field in
`CatalogTable`. So the question is: should Spark consider both `comment` and
`COMMENT` as the table comment?

Here are the results from Hive:
```
0: jdbc:hive2://.../> create table src (key int, value string) comment "initial comment";
No rows affected (1.055 seconds)
0: jdbc:hive2://.../> desc formatted src;
+-------------------------------+-------------------------------------------------------+------------------+
| col_name                      | data_type                                             | comment          |
+-------------------------------+-------------------------------------------------------+------------------+
| # col_name                    | data_type                                             | comment          |
|                               | NULL                                                  | NULL             |
| key                           | int                                                   |                  |
| value                         | string                                                |                  |
|                               | NULL                                                  | NULL             |
| # Detailed Table Information  | NULL                                                  | NULL             |
| Database:                     | wzh                                                   | NULL             |
| Owner:                        | spark                                                 | NULL             |
| CreateTime:                   | Mon Apr 24 11:43:40 CST 2017                          | NULL             |
| LastAccessTime:               | UNKNOWN                                               | NULL             |
| Protect Mode:                 | None                                                  | NULL             |
| Retention:                    | 0                                                     | NULL             |
| Location:                     | hdfs://hacluster/user/hive/warehouse/wzh.db/src       | NULL             |
| Table Type:                   | MANAGED_TABLE                                         | NULL             |
| Table Parameters:             | NULL                                                  | NULL             |
|                               | comment                                               | initial comment  |
|                               | transient_lastDdlTime                                 | 1493005420       |
|                               | NULL                                                  | NULL             |
| # Storage Information         | NULL                                                  | NULL             |
| SerDe Library:                | org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe  | NULL             |
| InputFormat:                  | org.apache.hadoop.hive.ql.io.RCFileInputFormat        | NULL             |
| OutputFormat:                 | org.apache.hadoop.hive.ql.io.RCFileOutputFormat       | NULL             |
| Compressed:                   | No                                                    | NULL             |
| Num Buckets:                  | -1                                                    | NULL             |
| Bucket Columns:               | []                                                    | NULL             |
| Sort Columns:                 | []                                                    | NULL             |
| Storage Desc Params:          | NULL                                                  | NULL             |
|                               | serialization.format                                  | 1                |
+-------------------------------+-------------------------------------------------------+------------------+
28 rows selected (0.525 seconds)
0: jdbc:hive2://.../> alter table src set tblproperties("comment"="new comment", "COMMENT"="NEW COMMENT");
No rows affected (0.62 seconds)
0: jdbc:hive2://.../> desc formatted src;
+-------------------------------+-------------------------------------------------------+------------------+
| # col_name
```

[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17737
  
**[Test build #76092 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76092/testReport)** for PR 17737 at commit [`bb5de1f`](https://github.com/apache/spark/commit/bb5de1f2ef66a4775c8d8bc4f632535d45b3f0b4).





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-23 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/17737

[SPARK-20442][PYTHON][DOCS] Fill up documentations for functions in Column 
API in PySpark

## What changes were proposed in this pull request?

This PR proposes to fill up the documentation with examples for `bitwiseOR`,
`bitwiseAND`, `bitwiseXOR`, `contains`, `asc` and `desc` in the `Column` API.

Also, this PR fixes minor typos in the documentation and matches some of
the contents between the Scala doc and the Python doc.

Lastly, this PR suggests using `spark` rather than `sc` in doctests.

## How was this patch tested?

Doc tests were added and manually tested with the commands below:

`./python/run-tests.py --module pyspark-sql`
`./dev/lint-python`

Output was checked via `make html` under `./python/docs`. Snapshots are
left as inline review comments on the code.
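
For reference, the added examples can be run as-is; the expected values
below are the ones shown in the diff:

```python
# The doctest examples this PR adds (170 = 0b10101010, 75 = 0b01001011).
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()
df3 = spark.createDataFrame([Row(a=170, b=75)])

print(df3.select(df3.a.bitwiseOR(df3.b)).collect())   # [Row((a | b)=235)]
print(df3.select(df3.a.bitwiseAND(df3.b)).collect())  # [Row((a & b)=10)]
print(df3.select(df3.a.bitwiseXOR(df3.b)).collect())  # [Row((a ^ b)=225)]
```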

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark SPARK-20442

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17737.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17737


commit bb5de1f2ef66a4775c8d8bc4f632535d45b3f0b4
Author: hyukjinkwon 
Date:   2017-04-24T01:48:06Z

Fill up documentations for functions in Column API in PySpark







[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17480
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76089/
Test PASSed.





[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17480
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17480
  
**[Test build #76089 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76089/testReport)**
 for PR 17480 at commit 
[`d3e69cf`](https://github.com/apache/spark/commit/d3e69cf66d77ba02cfa13e8e27273e59248885f1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76088/
Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15125
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #76088 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76088/testReport)**
 for PR 15125 at commit 
[`ec62659`](https://github.com/apache/spark/commit/ec6265986cb91585c0a6fdbc0c9675ec9fbba613).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17736
  
**[Test build #76091 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76091/testReport)**
 for PR 17736 at commit 
[`a0f4a13`](https://github.com/apache/spark/commit/a0f4a13763c077e57c2dcb5fff12d81f3bb2ceb9).





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17736
  
Let's see if it breaks any existing tests.





[GitHub] spark issue #17736: [SPARK-20399][SQL][WIP] Can't use same regex pattern bet...

2017-04-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17736
  
cc @dbtsai @hvanhovell 





[GitHub] spark pull request #17736: [SPARK-20399][SQL][WIP] Can't use same regex patt...

2017-04-23 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/17736

[SPARK-20399][SQL][WIP] Can't use same regex pattern between 1.6 and 2.x 
due to unescaped sql string in parser

## What changes were proposed in this pull request?

The new SQL parser was introduced in Spark 2.0. It seems to have brought an 
issue regarding regex pattern strings.

The following code reproduces it:

val data = Seq("\u0020\u0021\u0023", "abc")
val df = data.toDF()

// 1st usage: let the parser parse the pattern string: works in 1.6
val rlike1 = df.filter("value rlike '^\\x20[\\x20-\\x23]+$'")
// 2nd usage: call Column.rlike so the pattern string is a literal
// that doesn't go through the parser: works in 1.6 and 2.x
val rlike2 = df.filter($"value".rlike("^\\x20[\\x20-\\x23]+$"))

// To make the 1st usage work in 2.x, the backslashes must be doubled:
val rlike3 = df.filter("value rlike '^\\\\x20[\\\\x20-\\\\x23]+$'")

Because the parser unescapes SQL strings, the first usage, which works in 
1.6, no longer works in 2.0. To make it work, we need to add extra backslashes.

It is quite weird that we can't use the same regex pattern string in the two 
usages. We should not unescape regex pattern strings.

## How was this patch tested?

Jenkins tests.

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 rlike-regex

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17736.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17736


commit a0f4a13763c077e57c2dcb5fff12d81f3bb2ceb9
Author: Liang-Chi Hsieh 
Date:   2017-04-19T01:49:47Z

Don't unescape regex pattern string.







[GitHub] spark issue #17708: [SPARK-20413] Add new query hint NO_COLLAPSE.

2017-04-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/17708
  
I have the same question as Reynold asked on the mailing list. Doesn't 
common subexpression elimination already address this issue?





[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-23 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17540
  
This PR will change the Spark UI.
For a simple query `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/a")`, 
previously the SQL tab of the Spark UI showed:
https://cloud.githubusercontent.com/assets/3182036/25320581/fd74467e-28da-11e7-80ec-efb4af8a2cdb.png
After this PR it shows:
https://cloud.githubusercontent.com/assets/3182036/25320591/116864a8-28db-11e7-9115-cf0bac552fdf.png

I'm not sure which one is better; it depends on what users expect the Spark 
SQL UI to show for a write operation. cc @zsxwing 







[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17733
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76087/
Test PASSed.





[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17733
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17733
  
**[Test build #76087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76087/testReport)**
 for PR 17733 at commit 
[`ca8bfbd`](https://github.com/apache/spark/commit/ca8bfbd4f55962773b037c804f827d4f06d95cdd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17623: [SPARK-20292][SQL] Clean up string representation...

2017-04-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/17623#discussion_r112856025
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
 ---
@@ -111,6 +111,11 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String]
   override def dataType: DataType = childSchema(ordinal).dataType
   override def nullable: Boolean = child.nullable || childSchema(ordinal).nullable
 
+  override def verboseString: String = {
--- End diff --

We rarely call `Expression.verboseString` directly. It is mostly called by 
`treeString` to show the individual nodes in the tree.
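
As a minimal, hedged sketch of where that shows up (the data is made up; `getField` resolves to a `GetStructField` expression), the extended explain output is rendered via `treeString`:

```scala
import org.apache.spark.sql.SparkSession

object VerboseStringSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("verbose-string").getOrCreate()
    import spark.implicits._

    // Extracting field 'a' from the struct column `s` resolves to GetStructField.
    val df = spark.range(3)
      .selectExpr("id", "named_struct('a', id, 'b', id * 2) as s")
      .select($"s".getField("a"))

    // explain(extended = true) prints the plans via treeString, which asks
    // each node for its string representation.
    df.explain(true)

    spark.stop()
  }
}
```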







[GitHub] spark pull request #17623: [SPARK-20292][SQL] Clean up string representation...

2017-04-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17623#discussion_r112854488
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
 ---
@@ -111,6 +111,11 @@ case class GetStructField(child: Expression, ordinal: Int, name: Option[String]
   override def dataType: DataType = childSchema(ordinal).dataType
   override def nullable: Boolean = child.nullable || childSchema(ordinal).nullable
 
+  override def verboseString: String = {
--- End diff --

I don't think the `verboseString` here provides a better string 
representation than `toString`. When will we call `verboseString`?





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17728
  
**[Test build #76090 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76090/testReport)**
 for PR 17728 at commit 
[`a320327`](https://github.com/apache/spark/commit/a3203272c5ce9dc1a9f923180dcfe00e6665d102).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76090/
Test PASSed.





[GitHub] spark pull request #17730: [SPARK-20439] [SQL] Fix Catalog API listTables an...

2017-04-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17730#discussion_r112854341
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala ---
@@ -197,7 +211,11 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
 * `AnalysisException` when no `Table` can be found.
 */
   override def getTable(dbName: String, tableName: String): Table = {
-    makeTable(TableIdentifier(tableName, Option(dbName)))
+    if (tableExists(dbName, tableName)) {
+      makeTable(TableIdentifier(tableName, Option(dbName)))
+    } else {
+      throw new AnalysisException(s"Table or view '$tableName' not found in database '$dbName'")
--- End diff --

The doc says `This throws an AnalysisException when no Table can be 
found.`, so I think we should not change this behavior.





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112853834
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -1546,6 +1546,40 @@ test_that("string operators", {
   expect_equal(collect(select(df3, substring_index(df3$a, ".", 2)))[1, 1], 
"a.b")
   expect_equal(collect(select(df3, substring_index(df3$a, ".", -3)))[1, 
1], "b.c.d")
   expect_equal(collect(select(df3, translate(df3$a, "bc", "12")))[1, 1], 
"a.1.2.d")
+
+  l4 <- list(list(a = "a.b@c.d   1\\b"))
+  df4 <- createDataFrame(l4)
+  expect_equal(
+collect(select(df4, split_string(df4$a, "\\s+")))[1, 1],
+list(list("a.b@c.d", "1\\b"))
+  )
+  expect_equal(
+collect(select(df4, split_string(df4$a, "\\.")))[1, 1],
+list(list("a", "b@c", "d   1\\b"))
+  )
+  expect_equal(
+collect(select(df4, split_string(df4$a, "@")))[1, 1],
+list(list("a.b", "c.d   1\\b"))
+  )
+  expect_equal(
+collect(select(df4, split_string(df4$a, "")))[1, 1],
+list(list("a.b@c.d   1", "b"))
+  )
+
+  l5 <- list(list(a = "abc"))
+  df5 <- createDataFrame(l5)
+  expect_equal(
+collect(select(df5, repeat_string(df5$a, 1L)))[1, 1],
+"abc"
+  )
+  expect_equal(
+collect(select(df5, repeat_string(df5$a, 3)))[1, 1],
+"abcabcabc"
+  )
+  expect_equal(
+collect(select(df5, repeat_string(df5$a, -1)))[1, 1],
--- End diff --

Right? I think we should keep it this way to avoid any confusion when users 
switch between SQL and the DSL. If anything changes, it will cause a test 
failure and then we can add R-side checks.





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112853719
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
 jc <- callJStatic("org.apache.spark.sql.functions", 
"collect_set", x@jc)
 column(jc)
   })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
+setMethod("split_string",
+  signature(x = "Column", pattern = "character"),
+  function(x, pattern) {
+jc <- callJStatic("org.apache.spark.sql.functions", "split", 
x@jc, pattern)
+column(jc)
+  })
+
+#' repeat_string
+#'
+#' Repeats string n times.
+#'
+#' @param x Column to compute on
+#' @param n Number of repetitions
+#'
+#' @rdname repeat_string
+#' @family string_funcs
+#' @aliases repeat_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- createDataFrame(data.frame(
+#'   text = c("foo", "bar")
+#' ))
+#'
+#' head(select(df, repeat_string(df$text, 3)))
+#' }
+#' @note repeat_string 2.3.0
+#' @note equivalent to \code{repeat} SQL function
+setMethod("repeat_string",
+  signature(x = "Column", n = "numeric"),
+  function(x, n) {
+jc <- callJStatic("org.apache.spark.sql.functions", "repeat", 
x@jc, as.integer(n))
--- End diff --

That's useful.





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112853686
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
 jc <- callJStatic("org.apache.spark.sql.functions", 
"collect_set", x@jc)
 column(jc)
   })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
--- End diff --

That's cool :) I am not convinced about the linking though. Scala docs are 
not very useful. 

I considered adding `expr` or `selectExpr` version to examples:

```r
selectExpr(df, "split(value, '@')")
```
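
For comparison, a minimal Scala sketch of the `functions.split` and `functions.repeat` calls these wrappers delegate to (the sample string is made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{repeat, split}

object SplitRepeatSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-repeat").getOrCreate()
    import spark.implicits._

    val df = Seq("a.b@c.d   1").toDF("value")

    df.select(split($"value", "\\s+")).show(false) // split on runs of whitespace
    df.select(split($"value", "@")).show(false)    // split on a literal '@'
    df.select(repeat($"value", 3)).show(false)     // concatenate three copies

    spark.stop()
  }
}
```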





[GitHub] spark pull request #17729: [SPARK-20438][R] SparkR wrappers for split and re...

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17729#discussion_r112853256
  
--- Diff: R/pkg/R/functions.R ---
@@ -3745,3 +3745,55 @@ setMethod("collect_set",
 jc <- callJStatic("org.apache.spark.sql.functions", 
"collect_set", x@jc)
 column(jc)
   })
+
+#' split_string
+#'
+#' Splits string on regular expression.
+#'
+#' @param x Column to compute on
+#' @param pattern Java regular expression
+#'
+#' @rdname split_string
+#' @family string_funcs
+#' @aliases split_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- read.text("README.md")
+#'
+#' head(select(df, split_string(df$value, "\\s+")))
+#' }
+#' @note split_string 2.3.0
+#' @note equivalent to \code{split} SQL function
+setMethod("split_string",
+  signature(x = "Column", pattern = "character"),
+  function(x, pattern) {
+jc <- callJStatic("org.apache.spark.sql.functions", "split", 
x@jc, pattern)
+column(jc)
+  })
+
+#' repeat_string
+#'
+#' Repeats string n times.
+#'
+#' @param x Column to compute on
+#' @param n Number of repetitions
+#'
+#' @rdname repeat_string
+#' @family string_funcs
+#' @aliases repeat_string,Column-method
+#' @export
+#' @examples \dontrun{
+#' df <- createDataFrame(data.frame(
+#'   text = c("foo", "bar")
+#' ))
--- End diff --

I thought about this, but it is hard to find a good source at hand. We could 
use `data/streaming/AFINN-111.txt`, which has nice and short lines, or 
`README.md` and just take `head(., 1)` (the rest is empty or longish).





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17728
  
**[Test build #76090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76090/testReport)**
 for PR 17728 at commit 
[`a320327`](https://github.com/apache/spark/commit/a3203272c5ce9dc1a9f923180dcfe00e6665d102).





[GitHub] spark pull request #17463: [SPARK-20131][DStream][Test] Flaky Test: org.apac...

2017-04-23 Thread uncleGen
Github user uncleGen closed the pull request at:

https://github.com/apache/spark/pull/17463





[GitHub] spark issue #17480: [SPARK-20079][Core][yarn] Re registration of AM hangs sp...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17480
  
**[Test build #76089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76089/testReport)**
 for PR 17480 at commit 
[`d3e69cf`](https://github.com/apache/spark/commit/d3e69cf66d77ba02cfa13e8e27273e59248885f1).





[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15125
  
**[Test build #76088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76088/testReport)**
 for PR 15125 at commit 
[`ec62659`](https://github.com/apache/spark/commit/ec6265986cb91585c0a6fdbc0c9675ec9fbba613).





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112849184
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator(
     endpointUrl: String,
     regionId: String,
     range: SequenceNumberRange,
-    retryTimeoutMs: Int) extends NextIterator[Record] with Logging {
+    retryTimeoutMs: Int,
+    sparkConf: SparkConf) extends NextIterator[Record] with Logging {
--- End diff --

I prefer the latter. Create it in `KinesisInputDStream` and pass it down to 
`KinesisBackedBlockRDD`





[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17733
  
**[Test build #76087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76087/testReport)**
 for PR 17733 at commit 
[`ca8bfbd`](https://github.com/apache/spark/commit/ca8bfbd4f55962773b037c804f827d4f06d95cdd).





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112848855
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator(
     endpointUrl: String,
     regionId: String,
     range: SequenceNumberRange,
-    retryTimeoutMs: Int) extends NextIterator[Record] with Logging {
+    retryTimeoutMs: Int,
+    sparkConf: SparkConf) extends NextIterator[Record] with Logging {
--- End diff --

Or we can pass them via the SparkConf and construct the 
KinesisReadConfigurations object in `KinesisInputDStream`, then pass it down to 
`KinesisBackedBlockRDD`.





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112848762
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator(
     endpointUrl: String,
     regionId: String,
     range: SequenceNumberRange,
-    retryTimeoutMs: Int) extends NextIterator[Record] with Logging {
+    retryTimeoutMs: Int,
+    sparkConf: SparkConf) extends NextIterator[Record] with Logging {
--- End diff --

And would you expect it to be passed directly to the `KinesisInputDStream`?





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112848595
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator(
 endpointUrl: String,
 regionId: String,
 range: SequenceNumberRange,
-retryTimeoutMs: Int) extends NextIterator[Record] with Logging {
+retryTimeoutMs: Int,
+sparkConf: SparkConf) extends NextIterator[Record] with Logging {
--- End diff --

I would prefer a specialized case class,
something like:
```scala
case class KinesisReadConfigurations(
  maxRetries: Int,
  retryWaitTimeMs: Long,
  retryTimeoutMs: Long)
```
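
A hedged sketch of option 2, evaluating the configurations once before handing them to the RDD; the `retry.maxAttempts` key and the numeric defaults below are assumptions for illustration, while `spark.streaming.kinesis.retry.waitTime` and its `100ms` default are the ones proposed in this PR:

```scala
import org.apache.spark.SparkConf

case class KinesisReadConfigurations(
    maxRetries: Int,
    retryWaitTimeMs: Long,
    retryTimeoutMs: Long)

object KinesisReadConfigurations {
  // Built once (e.g. in KinesisInputDStream) and passed down to
  // KinesisBackedBlockRDD instead of threading SparkConf all the way through.
  def fromSparkConf(conf: SparkConf, retryTimeoutMs: Long): KinesisReadConfigurations =
    KinesisReadConfigurations(
      maxRetries = conf.getInt("spark.streaming.kinesis.retry.maxAttempts", 3), // assumed key/default
      retryWaitTimeMs = conf.getTimeAsMs("spark.streaming.kinesis.retry.waitTime", "100ms"),
      retryTimeoutMs = retryTimeoutMs)
}
```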





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112848401
  
--- Diff: 
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala
 ---
@@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean)
  }
    }
 
+  testIfEnabled("Basic reading from Kinesis with modified configurations") {
--- End diff --

I wasn't able to test the actual Kinesis wait behavior. I haven't looked at 
`PrivateMethodTester` yet to check how it can help us test that the vars are 
picked up.
I used this test case to debug and verify that all the values are passed 
correctly.





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112848373
  
--- Diff: 
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala
 ---
@@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean)
  }
    }
 
+  testIfEnabled("Basic reading from Kinesis with modified configurations") {
+    // Add Kinesis retry configurations
+    sc.conf.set(RETRY_WAIT_TIME_KEY, "1000ms")
--- End diff --

+1





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112848363
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator(
     endpointUrl: String,
     regionId: String,
     range: SequenceNumberRange,
-    retryTimeoutMs: Int) extends NextIterator[Record] with Logging {
+    retryTimeoutMs: Int,
+    sparkConf: SparkConf) extends NextIterator[Record] with Logging {
--- End diff --

@brkyvz - I was thinking not to pass individual configs to the constructor 
because that would just cause the list to grow. Using a SparkConf or a Map would 
enable us to add new configs without any code changes. I was using a Map 
earlier for this so that it's easy to pass more configs. 
What are your thoughts on Map vs. case class?






[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17695
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17695
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76086/
Test PASSed.





[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17695
  
**[Test build #76086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76086/testReport)**
 for PR 17695 at commit 
[`e74c2d6`](https://github.com/apache/spark/commit/e74c2d6bcb2f8a2dc841b8b79d9200710f0dbd4c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17695
  
**[Test build #76086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76086/testReport)**
 for PR 17695 at commit 
[`e74c2d6`](https://github.com/apache/spark/commit/e74c2d6bcb2f8a2dc841b8b79d9200710f0dbd4c).





[GitHub] spark issue #17695: [SPARK-20400][DOCS] Remove References to 3rd Party Vendo...

2017-04-23 Thread anabranch
Github user anabranch commented on the issue:

https://github.com/apache/spark/pull/17695
  
Thanks for the info @srowen - this should be better now.





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112845286
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
--- End diff --

hmm, it's a bit odd to call rollup or cube that way, but OK if other 
languages leave that open too. But I'd say we should add a line to explain 
that "rollup or cube without columns is the same as group_by" (or something better).





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112844527
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -308,6 +308,21 @@ numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count 
= n(carsDF$cyl))
 head(numCyl)
 ```
 
+`groupBy` can be replaced with `cube` or `rollup` to compute subtotals 
across multiple dimensions.
--- End diff --

I keep forgetting there is one. I think we can add a few lines. This is 
actually a pretty neat feature.





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112844471
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
+jcol <- lapply(cols, function(x) if (is.character(x)) 
column(x)@jc else x@jc)
+sgd <- callJMethod(x@sdf, "cube", jcol)
+groupedData(sgd)
+  })
+
+#' rollup
+#'
+#' Create a multi-dimensional rollup for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
--- End diff --

Sounds good.





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-23 Thread zero323
Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17728#discussion_r112844452
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -3642,3 +3642,58 @@ setMethod("checkpoint",
 df <- callJMethod(x@sdf, "checkpoint", as.logical(eager))
 dataFrame(df)
   })
+
+
+#' cube
+#'
+#' Create a multi-dimensional cube for the SparkDataFrame using the 
specified columns.
+#'
+#' @param x a SparkDataFrame.
+#' @param ... variable(s) (character names(s) or Column(s)) to group on.
+#' @return A GroupedData.
+#' @family SparkDataFrame functions
+#' @aliases cube,SparkDataFrame-method
+#' @rdname cube
+#' @name cube
+#' @export
+#' @examples
+#' \dontrun{
+#' df <- createDataFrame(mtcars)
+#' mean(cube(df, "cyl", "gear", "am"), "mpg")
+#' }
+#' @note cube since 2.3.0
+setMethod("cube",
+  signature(x = "SparkDataFrame"),
+  function(x, ...) {
+cols <- list(...)
--- End diff --

I think we can skip that. `rollup(df)` and `cube(df)` are valid function 
calls, equivalent to `group_by(df)`, and arguably can be useful in some cases 
(like aggregations based on user input).
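
For context, a minimal Scala sketch of the `Dataset` methods these R wrappers delegate to (the data is made up); with no grouping columns, `cube()` and `rollup()` degenerate to a global `groupBy()`:

```scala
import org.apache.spark.sql.SparkSession

object CubeRollupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cube-rollup").getOrCreate()
    import spark.implicits._

    val df = Seq((6, 4, 21.0), (6, 3, 18.1), (8, 3, 14.3)).toDF("cyl", "gear", "mpg")

    df.cube("cyl", "gear").avg("mpg").show()   // every combination, incl. grand total
    df.rollup("cyl", "gear").avg("mpg").show() // hierarchical subtotals
    df.cube().avg("mpg").show()                // no columns: same as groupBy()

    spark.stop()
  }
}
```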





[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17649
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76085/
Test PASSed.





[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17649
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17649
  
**[Test build #76085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76085/testReport)**
 for PR 17649 at commit 
[`50deed9`](https://github.com/apache/spark/commit/50deed9959da1ae5d4f7ce647248e2f8c813e125).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112841794
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
 ---
@@ -135,7 +139,8 @@ class KinesisSequenceRangeIterator(
     endpointUrl: String,
     regionId: String,
     range: SequenceNumberRange,
-    retryTimeoutMs: Int) extends NextIterator[Record] with Logging {
+    retryTimeoutMs: Int,
+    sparkConf: SparkConf) extends NextIterator[Record] with Logging {
--- End diff --

I wouldn't pass in the `SparkConf` all the way in here. See how 
`retryTimeoutMs` has been passed in specifically above. You can do two things:
 1. Pass each of them one by one
 2. Evaluate all the configurations in `KinesisBackedBlockRDD` or one level 
higher and use a `case class` such as `KinesisReadConfigurations`





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112841862
  
--- Diff: 
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala
 ---
@@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean)
  }
    }
 
+  testIfEnabled("Basic reading from Kinesis with modified configurations") {
--- End diff --

I don't see how this test actually tests the configuration setting. It just 
tests if things work, not that the configurations are actually picked up.







[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112841727
  
--- Diff: docs/streaming-kinesis-integration.md ---
@@ -216,3 +216,7 @@ de-aggregate records during consumption.
 - If no Kinesis checkpoint info exists when the input DStream starts, it 
will start either from the oldest record available 
(`InitialPositionInStream.TRIM_HORIZON`) or from the latest tip 
(`InitialPositionInStream.LATEST`).  This is configurable.
   - `InitialPositionInStream.LATEST` could lead to missed records if data 
is added to the stream while no input DStreams are running (and no checkpoint 
info is being stored).
   - `InitialPositionInStream.TRIM_HORIZON` may lead to duplicate 
processing of records where the impact is dependent on checkpoint frequency and 
processing idempotency.
+
+#### Kinesis retry configurations
+ - `spark.streaming.kinesis.retry.waitTime` : SparkConf for wait time between Kinesis retries (in milliseconds). Default is "100ms".
--- End diff --

Example: `Wait time between Kinesis retries as a duration string. When 
reading from Amazon Kinesis, users may hit 'ThroughputExceededExceptions' when 
consuming faster than 2 MB/s. This configuration can be tweaked to increase the 
sleep between fetches when a fetch fails, to reduce these exceptions.`





[GitHub] spark pull request #17467: [SPARK-20140][DStream] Remove hardcoded kinesis r...

2017-04-23 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/17467#discussion_r112841869
  
--- Diff: 
external/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDDSuite.scala
 ---
@@ -101,6 +103,36 @@ abstract class KinesisBackedBlockRDDTests(aggregateTestData: Boolean)
  }
    }
 
+  testIfEnabled("Basic reading from Kinesis with modified configurations") {
+    // Add Kinesis retry configurations
+    sc.conf.set(RETRY_WAIT_TIME_KEY, "1000ms")
--- End diff --

we need to clean these up after the test




