[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In this PR: 1. Instead of `chunkIndex`, chunks are fetched by `String chunkId`; the server doesn't cache the blocks list. 2. In `OpenBlocks`, only the metadata (e.g. appId, executorId) of the stream is sent, so the client doesn't need to resend it in the following fetches. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18211 **[Test build #77767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77767/testReport)** for PR 18211 at commit [`883089a`](https://github.com/apache/spark/commit/883089aa824dabfb9b82a17546a953f1f0a22be4).
[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In my cluster, we are suffering from OOMs in the shuffle service. We found that a lot of executors were fetching blocks from a single shuffle service. Analyzing the memory, we found that the blockIds (shuffle_shuffleId_mapId_reduceId) take about 1.5 GB.
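A back-of-envelope check makes a footprint of that magnitude plausible. The sketch below is illustrative only: the executor/block counts, the average id length, and the ~40-byte per-String overhead are assumptions, not measurements from this thread.

```java
// Rough back-of-envelope for blockId memory held on the shuffle service.
// All figures are illustrative assumptions, not measurements from this thread.
public class BlockIdMemoryEstimate {
    // Each Java String costs roughly 2 bytes per char plus ~40 bytes of
    // object/array overhead (assumed figure for a pre-compact-strings JVM).
    public static long estimateBytes(int numExecutors, int blockIdsPerFetch, int avgIdLength) {
        long perIdBytes = 2L * avgIdLength + 40L;
        return (long) numExecutors * blockIdsPerFetch * perIdBytes;
    }

    public static void main(String[] args) {
        // e.g. 1000 executors each opening 20000 blocks with ids like "shuffle_4_31337_42"
        long total = estimateBytes(1000, 20000, 25);
        System.out.printf("~%.1f GB held by blockId strings%n", total / 1e9);
    }
}
```

With these assumed numbers the estimate lands in the same ballpark as the ~1.5 GB reported above.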
[GitHub] spark issue #18210: [SPARK-20993][CORE]The configuration item about 'Spark.b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18210 Can one of the admins verify this patch?
[GitHub] spark pull request #18210: [SPARK-20993][CORE]The configuration item about '...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/18210 [SPARK-20993][CORE] The configuration item 'spark.blacklist.enabled' needs to set the default value 'false' ## What changes were proposed in this pull request? The default value of the configuration item 'spark.blacklist.enabled' is 'false'. ![1](https://cloud.githubusercontent.com/assets/26266482/26817014/40469250-4ac7-11e7-96e6-617bfb93dd26.png) So, when the Spark code reads the value of 'spark.blacklist.enabled', it should specify the default value 'false'. ## How was this patch tested? Manual tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guoxiaolongzte/spark SPARK-20993 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18210.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18210
commit d383efba12c66addb17006dea107bb0421d50bc3 Author: guoxiaolong 10207633 Date: 2017-03-31T13:57:09Z [SPARK-20177] Document about compression way has some little detail changes.
commit 3059013e9d2aec76def14eb314b6761bea0e7ca0 Author: guoxiaolong 10207633 Date: 2017-04-01T01:38:02Z [SPARK-20177] event log add a space
commit 555cef88fe09134ac98fd0ad056121c7df2539aa Author: guoxiaolongzte Date: 2017-04-02T00:16:08Z '/applications/[app-id]/jobs' in REST API, status should be [running|succeeded|failed|unknown]
commit 46bb1ad3ddd9fb55b5607ac4f20213a90186cfe9 Author: guoxiaolong 10207633 Date: 2017-04-05T03:16:50Z Merge branch 'master' of https://github.com/apache/spark into SPARK-20177
commit 0efb0dd9e404229cce638fe3fb0c966276784df7 Author: guoxiaolong 10207633 Date: 2017-04-05T03:47:53Z [SPARK-20218] '/applications/[app-id]/stages' in REST API, add description.
commit 0e37fdeee28e31fc97436dabd001d3c85c5a7794 Author: guoxiaolong 10207633 Date: 2017-04-05T05:22:54Z [SPARK-20218] '/applications/[app-id]/stages/[stage-id]' in REST API, remove redundant description.
commit 52641bb01e55b48bd9e8579fea217439d14c7dc7 Author: guoxiaolong 10207633 Date: 2017-04-07T06:24:58Z Merge branch 'SPARK-20218'
commit d3977c9cab0722d279e3fae7aacbd4eb944c22f6 Author: guoxiaolong 10207633 Date: 2017-04-08T07:13:02Z Merge branch 'master' of https://github.com/apache/spark
commit 137b90e5a85cde7e9b904b3e5ea0bb52518c4716 Author: guoxiaolong 10207633 Date: 2017-04-10T05:13:40Z Merge branch 'master' of https://github.com/apache/spark
commit 0fe5865b8022aeacdb2d194699b990d8467f7a0a Author: guoxiaolong 10207633 Date: 2017-04-10T10:25:22Z Merge branch 'SPARK-20190' of https://github.com/guoxiaolongzte/spark
commit cf6f42ac84466960f2232c025b8faeb5d7378fe1 Author: guoxiaolong 10207633 Date: 2017-04-10T10:26:27Z Merge branch 'master' of https://github.com/apache/spark
commit 685cd6b6e3799c7be65674b2670159ba725f0b8f Author: guoxiaolong 10207633 Date: 2017-04-14T01:12:41Z Merge branch 'master' of https://github.com/apache/spark
commit c716a9231e9ab117d2b03ba67a1c8903d8d9da93 Author: guoxiaolong Date: 2017-04-17T06:57:21Z Merge branch 'master' of https://github.com/apache/spark
commit 679cec36a968fbf995b567ca5f6f8cbd8e32673f Author: guoxiaolong Date: 2017-04-19T07:20:08Z Merge branch 'master' of https://github.com/apache/spark
commit 3c9387af84a8f39cf8c1ce19e15de99dfcaf0ca5 Author: guoxiaolong Date: 2017-04-19T08:15:26Z Merge branch 'master' of https://github.com/apache/spark
commit cb71f4462a0889cbb0843875b1e4cf14bcb0d020 Author: guoxiaolong Date: 2017-04-20T05:52:06Z Merge branch 'master' of https://github.com/apache/spark
commit ce92a7415a2026f5bf909820110a13750a0949e1 Author: guoxiaolong Date: 2017-04-21T05:21:48Z Merge branch 'master' of https://github.com/apache/spark
commit dd64342206041a8c3a282459e5f2b898dc558d89 Author: guoxiaolong Date: 2017-04-21T08:44:25Z Merge branch 'master' of https://github.com/apache/spark
commit bffd2bd00c6b0e20313756e133adca4c97707c67 Author: guoxiaolong Date: 2017-04-28T01:36:29Z Merge branch 'master' of https://github.com/apache/spark
commit 588d42a382345a071532ace1eab5457911f6aa46 Author: guoxiaolong Date: 2017-04-28T05:02:36Z Merge branch 'master' of https://github.com/apache/spark
commit 4bbeee1231275d1afa0775dbb61fcc5817f6e57c Author: guoxiaolong Date: 2017-05-02T02:30:52Z Merge branch 'master' of https://github.com/apache/spark
commit 362e5ad12bfe013a7780d81b5067c2ff644efa05 Author: guoxiaolong Date: 2017-05-03T06:47:54Z Merge branch 'master' of https://github.com/apache/spark
commit 4ed5e00e784ab3c31e1ba69f06fd64520c9d32e4 Author: guoxiaolong
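The fix amounts to always passing an explicit default when the flag is read. The sketch below illustrates the pattern with a minimal stand-in `Conf` class so it is self-contained; in Spark the real call is `conf.getBoolean("spark.blacklist.enabled", false)` on a `SparkConf`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for SparkConf to keep the example self-contained; the real
// Spark API call is conf.getBoolean("spark.blacklist.enabled", false).
public class BlacklistConfExample {
    static class Conf {
        private final Map<String, String> settings = new HashMap<>();
        Conf set(String key, String value) { settings.put(key, value); return this; }
        boolean getBoolean(String key, boolean defaultValue) {
            String v = settings.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // The pattern the PR argues for: always pass an explicit 'false' default,
    // so the blacklist feature stays off unless the user opts in.
    static boolean blacklistEnabled(Conf conf) {
        return conf.getBoolean("spark.blacklist.enabled", false);
    }
}
```

Without the explicit default, every call site would have to agree on what an unset key means; passing `false` at the read site makes the documented default unambiguous.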
[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18211 [WIP][SPARK-20994] Alleviate memory pressure in StreamManager ## What changes were proposed in this pull request? In the current code, chunks are fetched from the shuffle service in two steps: Step 1: send `OpenBlocks`, which contains the list of blocks to fetch; Step 2: fetch the consecutive chunks from the shuffle service by `streamId` and `chunkIndex`. Conceptually, there is no need to send the blocks list in Step 1. The client can send the blockId in Step 2. On receiving a `ChunkFetchRequest`, the server can check whether the chunkId is in the local block manager and send back the response. Thus memory cost can be reduced. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinxing64/spark SPARK-20994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18211.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18211 commit 883089aa824dabfb9b82a17546a953f1f0a22be4 Author: jinxing Date: 2017-06-05T09:19:18Z [SPARK-20994] Alleviate memory pressure in StreamManager
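The before/after shape of the protocol described above can be sketched as follows. The class and field names here are hypothetical illustrations; the actual Spark classes (`OpenBlocks`, `ChunkFetchRequest`, `StreamChunkId`) differ in detail.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical message shapes sketching the idea in the PR description; the
// real Spark network-protocol classes look different.
public class FetchProtocolSketch {
    // Before: OpenBlocks carries the whole block list, which the server must
    // cache per stream so later fetches by (streamId, chunkIndex) resolve.
    static class OpenBlocksBefore {
        final String appId, execId; final List<String> blockIds;
        OpenBlocksBefore(String appId, String execId, List<String> blockIds) {
            this.appId = appId; this.execId = execId; this.blockIds = blockIds;
        }
    }

    // After: OpenBlocks carries only stream metadata, and each fetch names its
    // block directly, so the server needs no cached list at all.
    static class OpenBlocksAfter {
        final String appId, execId;
        OpenBlocksAfter(String appId, String execId) { this.appId = appId; this.execId = execId; }
    }
    static class ChunkFetchAfter {
        final long streamId; final String chunkId;
        ChunkFetchAfter(long streamId, String chunkId) { this.streamId = streamId; this.chunkId = chunkId; }
    }

    // Server-side resolution under the new scheme: look the block up directly
    // in the local block manager (modeled here as a Map).
    static Optional<byte[]> resolve(Map<String, byte[]> localBlocks, ChunkFetchAfter req) {
        return Optional.ofNullable(localBlocks.get(req.chunkId));
    }
}
```

The trade-off is that each fetch request now carries a blockId string instead of a small integer index, in exchange for the server holding no per-stream block list in memory.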
[GitHub] spark pull request #18148: [SPARK-20926][SQL] Removing exposures to guava li...
GitHub user rezasafi reopened a pull request: https://github.com/apache/spark/pull/18148 [SPARK-20926][SQL] Removing exposures to guava library caused by directly accessing SessionCatalog's tableRelationCache There could be test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to the guava library by directly accessing SessionCatalog's tableRelationCache. These failures occur when guava shading is in place. ## What changes were proposed in this pull request? This change removes those guava exposures by introducing new methods in SessionCatalog and also changing DataStorageStrategy, HiveMetastoreCatalog and HiveSchemaInferenceSuite so that they use those proxy methods. ## How was this patch tested? Unit tests passed after applying these changes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rezasafi/spark branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18148 commit 8253bbe36d551f11d8e48ab92444977ac5b0776a Author: Reza Safi Date: 2017-05-30T21:58:39Z [SPARK-20926][SQL] Removing exposures to guava library through directly accessing SessionCatalog's tableRelationCache There were test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to the shaded Guava library. This change removes those exposures by introducing new methods in SessionCatalog. commit 9821ea191d63b327663f29adb04b48c856c550ff Author: Reza Safi Date: 2017-06-02T01:36:05Z Making tableRelationCache private and updating the comments.
commit 942137299dc03de53ce3e7120ac052f5764c14dc Author: Reza Safi Date: 2017-06-02T03:44:57Z Fixing scalastyle check errors commit 2832253afe2a48daae3f78568315b19a5aeb045f Author: Reza Safi Date: 2017-06-02T23:49:49Z Changing the names for two of the methods.
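The encapsulation pattern the PR describes, a private cache behind narrow proxy methods so callers never touch the Guava type, can be sketched as below. A `ConcurrentHashMap` stands in for Guava's `Cache` to keep the example self-contained, and the method names are illustrative, not necessarily the exact ones the PR added.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the encapsulation pattern: the cache stays private, and callers go
// through proxy methods, so the concrete (Guava) cache type never leaks into
// their signatures. A ConcurrentHashMap stands in for Guava's Cache here.
public class SessionCatalogSketch {
    private final ConcurrentHashMap<String, String> tableRelationCache = new ConcurrentHashMap<>();

    // Proxy methods: the only surface callers see.
    public Optional<String> getCachedTable(String name) {
        return Optional.ofNullable(tableRelationCache.get(name));
    }
    public void cacheTable(String name, String plan) {
        tableRelationCache.put(name, plan);
    }
    public void invalidateCachedTable(String name) {
        tableRelationCache.remove(name);
    }
}
```

Because no caller imports the cache's type, shading or swapping the underlying cache implementation no longer ripples through dependent classes such as the ones named in the PR.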
[GitHub] spark pull request #18148: [SPARK-20926][SQL] Removing exposures to guava li...
Github user rezasafi closed the pull request at: https://github.com/apache/spark/pull/18148
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 I have made the changes suggested by @mpjlu. Please find some time to review the pull request.
[GitHub] spark issue #7379: [SPARK-8682][SQL][WIP] Range Join
Github user IceMan81 commented on the issue: https://github.com/apache/spark/pull/7379 @zzeekk Would you mind explaining how your workaround works? > A Workaround is to build blocks and add them as equi-join condition Not sure I understand what you are suggesting here. @marmbrus The inability to do a range join efficiently results in very poor performance. Are there plans to address this directly in an upcoming release? I have scenarios where the optimizer sorts the results into a single partition for the join (all other partitions are empty) because the sort does not include the columns in the range condition. That task will run for more than a day, whereas a forced broadcast version of it runs in 3 hours. And here I'm only able to do the broadcast because I'm using a smaller data set on one side of the join.
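For readers puzzling over the same question: one common reading of the "build blocks and add them as equi-join condition" workaround (an interpretation, not code from this thread) is to discretize the range column into fixed-width blocks, equi-join on the block id first, and only then apply the exact range predicate. The sketch below assumes non-negative values and a hand-picked block width.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a "blocking" range-join workaround: intervals are
// expanded to every fixed-width block they overlap, points join on their own
// block id (the added equi-join key), and the exact range predicate filters
// the candidates. Assumes non-negative values.
public class RangeJoinBlocking {
    static final long BLOCK_WIDTH = 10L;   // tuning knob, chosen for the data

    static class Interval {
        final String id; final long lo, hi;   // closed interval [lo, hi]
        Interval(String id, long lo, long hi) { this.id = id; this.lo = lo; this.hi = hi; }
    }

    // Returns (pointId, intervalId) pairs where lo <= value <= hi.
    static List<String[]> join(Map<String, Long> points, List<Interval> intervals) {
        // Expand each interval to every block it covers.
        Map<Long, List<Interval>> byBlock = new HashMap<>();
        for (Interval iv : intervals) {
            for (long b = iv.lo / BLOCK_WIDTH; b <= iv.hi / BLOCK_WIDTH; b++) {
                byBlock.computeIfAbsent(b, k -> new ArrayList<>()).add(iv);
            }
        }
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, Long> p : points.entrySet()) {
            long v = p.getValue();
            // Equi-match on block id, then the exact range predicate.
            for (Interval iv : byBlock.getOrDefault(v / BLOCK_WIDTH, List.of())) {
                if (iv.lo <= v && v <= iv.hi) {
                    out.add(new String[]{p.getKey(), iv.id});
                }
            }
        }
        return out;
    }
}
```

The point of the trick is that an engine which only partitions on equality keys can now hash-partition both sides on the block id, instead of collapsing the range join into a single partition.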
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17953 Merged build finished. Test PASSed.
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77766/
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17953 **[Test build #77766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77766/testReport)** for PR 17953 at commit [`1e86674`](https://github.com/apache/spark/commit/1e866745b3639248a237c285479aa5fb72b3c8df). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18108: [SPARK-20884] Spark' masters will be both standby due to...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/18108 @HyukjinKwon Yes, I didn't find any problems when I compiled and used it locally.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120269932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -578,39 +583,29 @@ case class StringTrim(children: Seq[Expression]) val getTrimFunction = if (children.size == 1) { s"UTF8String ${ev.value} = ${inputs(0)}.trim();" } else { - s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});".stripMargin + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" --- End diff -- OK, I can try to change it in AstBuilder.scala -> visitFunctionCall (SQL path) and functions.scala (DataFrame path).
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18159 Merged build finished. Test PASSed.
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77765/
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18159 **[Test build #77765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77765/testReport)** for PR 18159 at commit [`0af718d`](https://github.com/apache/spark/commit/0af718d15ed9c6bcf4e8de19528affdc492d1257). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18159 @cloud-fan The screenshot looks like: https://cloud.githubusercontent.com/assets/68855/26815029/614f13ba-4abc-11e7-9fbf-2248f0b7211d.png
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120265375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -578,39 +583,29 @@ case class StringTrim(children: Seq[Expression]) val getTrimFunction = if (children.size == 1) { s"UTF8String ${ev.value} = ${inputs(0)}.trim();" } else { - s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});".stripMargin + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" --- End diff -- Can't we just change the input order of `StringTrim`?
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120261787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1105,19 +1105,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a name LTRIM for TRIM(Leading), RTRIM for TRIM(Trailing), TRIM for TRIM(BOTH) + * Create a function name LTRIM for TRIM(Leading), RTRIM for TRIM(Trailing), TRIM for TRIM(BOTH), + * otherwise, returnthe original funcID. --- End diff -- will change
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/17953 @cloud-fan Do you think it should be done in this PR? And where should I add the filter, `CatalogImpl.createTable()` or `ExternalCatalog.createTable()`?
[GitHub] spark issue #18108: [SPARK-20884] Spark' masters will be both standby due to...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18108 @liu-zhaokun, I would request another test only after the test failure has at least been checked manually. Does this succeed locally without a problem?
[GitHub] spark issue #18108: [SPARK-20884] Spark' masters will be both standby due to...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/18108 @srowen First, I think the Hive-related test failures are not caused by my change, right? Second, the "org.apache.spark.deploy.master.PersistenceEngineSuite" test fails with "java.lang.NoSuchMethodError: org.apache.curator.utils.ZKPaths.fixForNamespace", but having compared the code again I found almost no difference between the two versions, so I don't think there is an API compatibility problem here. I suspect there is a problem with Jenkins; could you test this PR again?
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18207 Yep. Right. Then, could you officially resolve [SPARK-12661](https://issues.apache.org/jira/browse/SPARK-12661), too?
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17953 **[Test build #77766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77766/testReport)** for PR 17953 at commit [`1e86674`](https://github.com/apache/spark/commit/1e866745b3639248a237c285479aa5fb72b3c8df).
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/17953 Ahh, found it. Re-generated the golden files.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 OK great then we have officially deprecated it, haven't we?
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18207 @rxin, as of #17355 Jenkins is using Python 2.7 sourced from a virtualenv instead of Python 2.6. That patch was merged into master before branch-2.2 was cut.
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18205 Merged build finished. Test PASSed.
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18205 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77763/ Test PASSed.
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18205 **[Test build #77763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77763/testReport)** for PR 18205 at commit [`c53a0c7`](https://github.com/apache/spark/commit/c53a0c7a304e6a12548a047fd08786a174ed1479). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18159 @adrian-ionescu Thanks for the review. I've addressed the above comments.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77762/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77762/testReport)** for PR 18199 at commit [`240c27b`](https://github.com/apache/spark/commit/240c27b8386ed929625f5817c32f04b5c100e4b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18209 Can one of the admins verify this patch?
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18159 **[Test build #77765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77765/testReport)** for PR 18159 at commit [`0af718d`](https://github.com/apache/spark/commit/0af718d15ed9c6bcf4e8de19528affdc492d1257).
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18207 Jenkins runs with `['python2.7', 'python3.4', 'pypy']` only, doesn't it? Also, this is a major release cycle with other big changes. In my view, removing Python 2.6 support would not be appropriate in subsequent minor releases like 2.2.1 and 2.2.2.
[GitHub] spark pull request #18209: [SPARK-20992][Scheduler] Add support for Nomad as...
GitHub user barnardb opened a pull request: https://github.com/apache/spark/pull/18209 [SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend ## What changes were proposed in this pull request? Adds support for [Nomad](https://github.com/hashicorp/nomad) as a scheduler backend. Nomad is a cluster manager designed for both long-lived services and short-lived batch processing workloads. The integration supports client and cluster mode, supports dynamic allocation (increasing only), has basic support for Python and R applications, and works with applications packaged either as JARs or as Docker images. Documentation is in [docs/running-on-nomad.md](https://github.com/barnardb/spark/blob/nomad/docs/running-on-nomad.md). This will be [presented at Spark Summit 2017](https://spark-summit.org/2017/events/homologous-apache-spark-clusters-using-nomad/). A build of the pull request with Nomad support is available [here](https://www.dropbox.com/s/llcv388yl5hweje/spark-2.3.0-SNAPSHOT-bin-nomad.tgz?dl=0). Feedback would be much appreciated. ## How was this patch tested? This patch was tested with integration and manual tests, and a load test was performed to ensure it doesn't perform worse than the YARN integration. The feature was developed and tested against Nomad 0.5.6 (the current stable version) on Spark 2.1.0, rebased to 2.1.1 and retested, and finally rebased to master and retested.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/barnardb/spark nomad Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18209.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18209 commit c762194188e64cccff8a9758885b45f9d395cced Author: Ben Barnard Date: 2017-06-06T01:19:35Z Add support for Nomad as a scheduler backend
[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...
Github user yangw1234 commented on the issue: https://github.com/apache/spark/pull/16820 Sorry, I could not find time to finish this PR recently, so I'm closing it for now. If you need this fix, please feel free to base your work on it and finish it.
[GitHub] spark pull request #16820: [SPARK-19471] AggregationIterator does not initia...
Github user yangw1234 closed the pull request at: https://github.com/apache/spark/pull/16820
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18193 Merged build finished. Test PASSed.
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77764/ Test PASSed.
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18193 **[Test build #77764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77764/testReport)** for PR 18193 at commit [`171a9e6`](https://github.com/apache/spark/commit/171a9e66d2ceaeae87ced754be49554ce602930b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 I believe we still support Python 2.6, given Jenkins runs 2.6... There seems to be no point in removing that support this late in the release cycle.
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18192 I understand that moving code alone is discouraged, as explained, but wouldn't it be better to merge this rather than close it, if it improves the code in any way and the change is safe?
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user zhengcanbin commented on the issue: https://github.com/apache/spark/pull/18192 @jerryshao Should I close this issue?
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77761/ Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Merged build finished. Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #77761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77761/testReport)** for PR 17723 at commit [`1479c60`](https://github.com/apache/spark/commit/1479c60b3059e17a29e23a309f1b38e364bb2451). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18192 The change should be safe, but usually we don't do such code structure refactoring alone without a strong reason, so I'm neutral on this change.
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user zhengcanbin commented on the issue: https://github.com/apache/spark/pull/18192 @jerryshao It's a tiny change for a more reasonable code structure. There are three `ShuffleWriter` implementations. We first use the helper method `SortShuffleWriter#shouldBypassMergeSort` to determine whether a shuffle should take the `BypassMergeSort` path, and then use another helper method, `SortShuffleManager#canUseSerializedShuffle`, to decide on the `UnsafeShuffleWriter` path. For consistency of code structure, the method `shouldBypassMergeSort` should not belong to `SortShuffleWriter`; it should live in `BypassMergeSortShuffleWriter` or `SortShuffleManager`, and the latter is the better home because it would keep the two helper methods together.
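For readers following along, the two selection predicates described in the comment above can be sketched in plain Java. This is a simplified paraphrase, not Spark's exact API: the method signatures are invented stand-ins, and the constants reflect Spark's documented defaults (`spark.shuffle.sort.bypassMergeThreshold` defaults to 200; the serialized shuffle packs partition ids into 24 bits).

```java
// Hedged sketch of the writer-selection predicates discussed above.
// Signatures are simplified stand-ins, not Spark's real method signatures.
public class ShuffleWriterSelection {
    static final int BYPASS_MERGE_THRESHOLD = 200;        // spark.shuffle.sort.bypassMergeThreshold default
    static final int MAX_SERIALIZED_PARTITIONS = 1 << 24; // partition id must fit in 24 bits

    // Mirrors SortShuffleWriter#shouldBypassMergeSort: skip sorting entirely
    // when there is no map-side aggregation and the partition count is small.
    static boolean shouldBypassMergeSort(boolean mapSideCombine, int numPartitions) {
        return !mapSideCombine && numPartitions <= BYPASS_MERGE_THRESHOLD;
    }

    // Mirrors SortShuffleManager#canUseSerializedShuffle: the serializer must
    // support record relocation, there must be no aggregator, and the number
    // of partitions must fit in the packed record-pointer encoding.
    static boolean canUseSerializedShuffle(boolean serializerRelocatable,
                                           boolean hasAggregator,
                                           int numPartitions) {
        return serializerRelocatable && !hasAggregator
            && numPartitions <= MAX_SERIALIZED_PARTITIONS;
    }

    public static void main(String[] args) {
        // 100 partitions, no map-side combine -> BypassMergeSortShuffleWriter
        System.out.println(shouldBypassMergeSort(false, 100));
        // Relocatable serializer, no aggregator -> UnsafeShuffleWriter
        System.out.println(canUseSerializedShuffle(true, false, 1000));
    }
}
```

Keeping both predicates on `SortShuffleManager`, as the comment suggests, would make this one-stop decision logic visible in a single class.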
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18193 **[Test build #77764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77764/testReport)** for PR 18193 at commit [`171a9e6`](https://github.com/apache/spark/commit/171a9e66d2ceaeae87ced754be49554ce602930b).
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18205 **[Test build #77763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77763/testReport)** for PR 18205 at commit [`c53a0c7`](https://github.com/apache/spark/commit/c53a0c7a304e6a12548a047fd08786a174ed1479).
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120244321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- @@ -199,13 +199,52 @@ class RateStreamSource( } val localStartTimeMs = startTimeMs + TimeUnit.SECONDS.toMillis(startSeconds) -val relativeMsPerValue = - TimeUnit.SECONDS.toMillis(endSeconds - startSeconds) / (rangeEnd - rangeStart) --- End diff -- I thought that you would only change `TimeUnit.SECONDS.toMillis(endSeconds - startSeconds).toDouble`. Wasn't expecting all this change!
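The `.toDouble` change the reviewer refers to guards against a plain integer-division pitfall: once the source generates more than 1000 values per second, `millis / count` in long arithmetic truncates to 0, collapsing all the per-row timestamp spacing. This can be reproduced outside Spark (the helper names below are mine, not the PR's code):

```java
import java.util.concurrent.TimeUnit;

public class RateDivision {
    // Long division, as on the removed line of the diff: truncates toward zero.
    static long truncatedMsPerValue(long seconds, long valuesInRange) {
        return TimeUnit.SECONDS.toMillis(seconds) / valuesInRange;
    }

    // Promoting one operand to double preserves the fractional spacing.
    static double exactMsPerValue(long seconds, long valuesInRange) {
        return TimeUnit.SECONDS.toMillis(seconds) / (double) valuesInRange;
    }

    public static void main(String[] args) {
        // 5000 values generated over one second are really 0.2 ms apart.
        System.out.println(truncatedMsPerValue(1, 5000)); // 0 -> timestamps collapse
        System.out.println(exactMsPerValue(1, 5000));     // 0.2
    }
}
```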
[GitHub] spark pull request #18205: [SPARK-20986] [SQL] Reset table's statistics afte...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/18205#discussion_r120244237 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala --- @@ -66,4 +67,33 @@ class PruneFileSourcePartitionsSuite extends QueryTest with SQLTestUtils with Te } } } + + test("SPARK-20986 Reset table's statistics after PruneFileSourcePartitions rule") { +withTempView("tempTbl", "partTbl") { + spark.range(1000).selectExpr("id").createOrReplaceTempView("tempTbl") + sql("CREATE TABLE partTbl (id INT) PARTITIONED BY (part INT) STORED AS parquet") --- End diff -- Yes, thanks.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243896 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java --- @@ -730,4 +730,58 @@ public void testToLong() throws IOException { assertFalse(negativeInput, UTF8String.fromString(negativeInput).toLong(wrapper)); } } + + @Test + public void trim() { --- End diff -- sure
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243912 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -502,69 +503,232 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi override def prettyName: String = "find_in_set" } +trait String2TrimExpression extends ImplicitCastInputTypes { + self: Expression => + + override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) + + override def nullable: Boolean = children.exists(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + override def sql: String = { +if (children.size == 1) { + val childrenSQL = children.map(_.sql).mkString(", ") + s"$prettyName($childrenSQL)" +} else { + val trimSQL = children(0).map(_.sql).mkString(", ") + val tarSQL = children(1).map(_.sql).mkString(", ") + s"$prettyName($trimSQL, $tarSQL)" +} + } +} + /** - * A function that trim the spaces from both ends for the specified string. - */ + * A function that takes a character string, removes the leading and/or trailing characters matching with the characters + * in the trim string, returns the new string. If BOTH and trimStr keywords are not specified, it defaults to remove + * space character from both ends. + * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function + * searches for each character in the source string, removes the characters from the source string until it + * encounters the first non-match character. + * BOTH: removes any characters from both ends of the source string that matches characters in the trim string. 
+ */ @ExpressionDescription( - usage = "_FUNC_(str) - Removes the leading and trailing space characters from `str`.", + usage = """ +_FUNC_(str) - Removes the leading and trailing space characters from `str`. +_FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str` + """, extended = """ +Arguments: + str - a string expression + trimString - the trim string + BOTH, FROM - these are keyword to specify for trim string from both ends of the string Examples: > SELECT _FUNC_('SparkSQL '); SparkSQL + > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS'); + parkSQ """) -case class StringTrim(child: Expression) - extends UnaryExpression with String2StringExpression { +case class StringTrim(children: Seq[Expression]) + extends Expression with String2TrimExpression { - def convert(v: UTF8String): UTF8String = v.trim() + require(children.size <= 2 && children.nonEmpty, +s"$prettyName requires at least one argument and no more than two.") override def prettyName: String = "trim" + // trim function can take one or two arguments. + // Specify one child, it is for the trim space function. + // Specify the two children, it is for the trim function with BOTH option. + override def eval(input: InternalRow): Any = { +val inputs = children.map(_.eval(input).asInstanceOf[UTF8String]) +if (inputs(0) != null) { + if (children.size == 1) { +return inputs(0).trim() + } else if (inputs(1) != null) { +return inputs(1).trim(inputs(0)) + } +} +null + } + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -defineCodeGen(ctx, ev, c => s"($c).trim()") +if (children.size == 2 && !children(0).isInstanceOf[Literal]) { + throw new AnalysisException(s"The trimming parameter should be Literal.")} + +val evals = children.map(_.genCode(ctx)) +val inputs = evals.map { eval => + s"${eval.isNull} ? 
null : ${eval.value}" +} +val getTrimFunction = if (children.size == 1) { + s"UTF8String ${ev.value} = ${inputs(0)}.trim();" +} else { + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" +} +ev.copy(evals.map(_.code).mkString("\n") + s""" + boolean ${ev.isNull} = false; + $getTrimFunction + if (${ev.value} == null) { +${ev.isNull} = true; + } +""") } } /** - * A function that trim the spaces from left end for given string. + * A function that trims the characters from left end for a given string, If LEADING and trimStr keywords are not + * specified, it defaults to remove space character from the left end.
[GitHub] spark pull request #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18208
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243671 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -578,39 +583,29 @@ case class StringTrim(children: Seq[Expression]) val getTrimFunction = if (children.size == 1) { s"UTF8String ${ev.value} = ${inputs(0)}.trim();" } else { - s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});".stripMargin + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" --- End diff -- I can add something like this: `val inputs = evals.map { eval => s"${eval.isNull} ? null : ${eval.value}" }.reverse` There are a couple of places where I would add the reverse in each trim function; what do you think?
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18208 Thanks. Merging to master.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77762/testReport)** for PR 18199 at commit [`240c27b`](https://github.com/apache/spark/commit/240c27b8386ed929625f5817c32f04b5c100e4b8).
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120243433 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import java.io._ +import java.nio.charset.StandardCharsets +import java.util.concurrent.TimeUnit + +import org.apache.commons.io.IOUtils + +import org.apache.spark.internal.Logging +import org.apache.spark.network.util.JavaUtils +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils} +import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider} +import org.apache.spark.sql.types._ +import org.apache.spark.util.{ManualClock, SystemClock} + +/** + * A source that generates increment long values with timestamps. Each generated row has two + * columns: a timestamp column for the generated time and an auto increment long column starting + * with 0L. + * + * This source supports the following options: + * - `tuplesPerSecond` (e.g. 100, default: 1): How many tuples should be generated per second. + * - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed + * becomes `tuplesPerSecond`. Using finer granularities than seconds will be truncated to integer + * seconds. + * - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the + * generated tuples. The source will try its best to reach `tuplesPerSecond`, but the query may + * be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed. + */ +class RateSourceProvider extends StreamSourceProvider with DataSourceRegister { + + override def sourceSchema( + sqlContext: SQLContext, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): (String, StructType) = + (shortName(), RateSourceProvider.SCHEMA) + + override def createSource( + sqlContext: SQLContext, + metadataPath: String, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): Source = { + val params = CaseInsensitiveMap(parameters) + + val tuplesPerSecond = params.get("tuplesPerSecond").map(_.toLong).getOrElse(1L) + if (tuplesPerSecond <= 0) { + throw new IllegalArgumentException( + s"Invalid value '${params("tuplesPerSecond")}'. The option 'tuplesPerSecond' " + + "must be positive") + } + + val rampUpTimeSeconds = + params.get("rampUpTime").map(JavaUtils.timeStringAsSec(_)).getOrElse(0L) + if (rampUpTimeSeconds < 0) { + throw new IllegalArgumentException( + s"Invalid value '${params("rampUpTime")}'. The option 'rampUpTime' " + + "must not be negative") + } + + val numPartitions = params.get("numPartitions").map(_.toInt).getOrElse( + sqlContext.sparkContext.defaultParallelism) + if (numPartitions <= 0) { + throw new IllegalArgumentException( + s"Invalid value '${params("numPartitions")}'. The option 'numPartitions' " + + "must be positive") + } + + new RateStreamSource( + sqlContext, + metadataPath, + tuplesPerSecond, + rampUpTimeSeconds, + numPartitions, + params.get("useManualClock").map(_.toBoolean).getOrElse(false) // Only for testing + ) + } + override def shortName(): String = "rate" +} + +object RateSourceProvider { + val SCHEMA = + StructType(StructField("timestamp", TimestampType) :: StructField("value", LongType) :: Nil) + + val VERSION
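The option validation in the diff above can be exercised outside Spark with a small standalone sketch. Names mirror the PR (`tuplesPerSecond`, `rampUpTime`, `numPartitions`) but this is not the actual source class, and the simplified time parser only accepts a bare `<n>` or `<n>s` form, unlike Spark's `JavaUtils.timeStringAsSec`:

```scala
// Standalone sketch of the validation rules in RateSourceProvider.createSource.
// Returns (tuplesPerSecond, rampUpTimeSeconds, numPartitions).
def parseRateOptions(params: Map[String, String], defaultParallelism: Int): (Long, Long, Int) = {
  val tuplesPerSecond = params.get("tuplesPerSecond").map(_.toLong).getOrElse(1L)
  require(tuplesPerSecond > 0,
    s"Invalid value '$tuplesPerSecond'. The option 'tuplesPerSecond' must be positive")

  // Simplified parser: accepts "5" or "5s"; Spark uses JavaUtils.timeStringAsSec.
  val rampUpTimeSeconds = params.get("rampUpTime").map(_.stripSuffix("s").toLong).getOrElse(0L)
  require(rampUpTimeSeconds >= 0,
    s"Invalid value '$rampUpTimeSeconds'. The option 'rampUpTime' must not be negative")

  val numPartitions = params.get("numPartitions").map(_.toInt).getOrElse(defaultParallelism)
  require(numPartitions > 0,
    s"Invalid value '$numPartitions'. The option 'numPartitions' must be positive")

  (tuplesPerSecond, rampUpTimeSeconds, numPartitions)
}
```

For example, `parseRateOptions(Map("tuplesPerSecond" -> "100", "rampUpTime" -> "5s"), 8)` yields `(100, 5, 8)`, while an empty map falls back to the defaults described in the scaladoc.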
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243013 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -503,58 +503,63 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi override def prettyName: String = "find_in_set" } +trait String2TrimExpression extends ImplicitCastInputTypes { + self: Expression => + + override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) + + override def nullable: Boolean = children.exists(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + override def sql: String = { + if (children.size == 1) { + val childrenSQL = children.map(_.sql).mkString(", ") + s"$prettyName($childrenSQL)" + } else { + val trimSQL = children(0).map(_.sql).mkString(", ") + val tarSQL = children(1).map(_.sql).mkString(", ") + s"$prettyName($trimSQL, $tarSQL)" + } + } +} + /** * A function that takes a character string, removes the leading and/or trailing characters matching with the characters - * in the trim string, returns the new string. If LEADING/TRAILING/BOTH and trimStr keywords are not specified, it - * defaults to remove space character from both ends. + * in the trim string, returns the new string. If BOTH and trimStr keywords are not specified, it defaults to remove + * space character from both ends. * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function * searches for each character in the source string, removes the characters from the source string until it * encounters the first non-match character. - * LEADING: removes any characters from the left end of the source string that matches characters in the trim string. - * TRAILING: removes any characters from the right end of the source string that matches characters in the trim string. * BOTH: removes any characters from both ends of the source string that matches characters in the trim string. */ @ExpressionDescription( usage = """ _FUNC_(str) - Removes the leading and trailing space characters from `str`. _FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str` -_FUNC_(LEADING trimStr FROM str) - Remove the leading trimString from `str` -_FUNC_(TRAILING trimStr FROM str) - Remove the trailing trimString from `str` """, extended = """ Arguments: str - a string expression trimString - the trim string BOTH, FROM - these are keyword to specify for trim string from both ends of the string - LEADING, FROM - these are keyword to specify for trim string from left end of the string - TRAILING, FROM - these are keyword to specify for trim string from right end of the string Examples: > SELECT _FUNC_('SparkSQL '); SparkSQL > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS'); parkSQ - > SELECT _FUNC_(LEADING 'paS' FROM 'SSparkSQLS'); - rkSQLS - > SELECT _FUNC_(TRAILING 'SLQ' FROM 'SSparkSQLS'); - SSparkS """) case class StringTrim(children: Seq[Expression]) - extends Expression with ImplicitCastInputTypes { + extends Expression with String2TrimExpression { require(children.size <= 2 && children.nonEmpty, s"$prettyName requires at least one argument and no more than two.") - override def dataType: DataType = StringType - override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) - - override def nullable: Boolean = children.exists(_.nullable) - override def foldable: Boolean = children.forall(_.foldable) - override def prettyName: String = "trim" // trim function can take one or two arguments. - // For one argument(children size is 1), it is the trim space function. - // For two arguments(children size is 2), it is the trim function with one of these options: BOTH/LEADING/TRAILING. + // Specify one child, it is for the trim space function. + // Specify the two children, it is for the trim function with BOTH option. --- End diff -- np, I made the changes.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18207 Thank you for confirming, @JoshRosen.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18207 As of #17355 we no longer test against Python 2.6. That doesn't mean that 2.6 won't work today, but there's nothing stopping 2.6 support from breaking in a future 2.2.x release, because we are no longer testing against that version. #17355 replaced our Python 2.6 testing environment with a Python 2.7 one, so we can now begin to use language features and libraries which are only available from 2.7 onwards (such as set and dictionary comprehensions). Therefore, this documentation change looks correct to me.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18207 /cc @joshrosen
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120236128 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -193,6 +197,21 @@ object KinesisInputDStream { } /** + * Sets the Kinesis initial position data to the provided timestamp. + * Sets InitialPositionInStream to [[InitialPositionInStream.AT_TIMESTAMP]] + * and the timestamp to the provided value. + * + * @param timestamp Timestamp to resume the Kinesis stream from a provided + * timestamp. + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def withTimestampAtInitialPositionInStream(timestamp: Date) : Builder = { --- End diff -- Got it now. Read your new comment.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120236059 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -100,6 +103,7 @@ object KinesisInputDStream { private var endpointUrl: Option[String] = None private var regionName: Option[String] = None private var initialPositionInStream: Option[InitialPositionInStream] = None +private var initialPositionInStreamTimestamp: Option[Date] = None --- End diff -- Ah, alright, so you're asking to keep a single `initialPositionInStreamTimestamp`. That's similar to the `withInitialPositionAtTimestamp`; I can rename that to suit this purpose. Another question: the InitialPosition gets passed to the KinesisReceiver, and I was passing a timestamp along with the initial position at the moment. Are we planning to pass the `KinesisClientLibConfiguration` to the `KinesisReceiver` now?
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120235938 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -193,6 +197,21 @@ object KinesisInputDStream { } /** + * Sets the Kinesis initial position data to the provided timestamp. + * Sets InitialPositionInStream to [[InitialPositionInStream.AT_TIMESTAMP]] + * and the timestamp to the provided value. + * + * @param timestamp Timestamp to resume the Kinesis stream from a provided + * timestamp. + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def withTimestampAtInitialPositionInStream(timestamp: Date) : Builder = { --- End diff -- I just suggested renaming it. Sorry for the confusion.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120235619 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -193,6 +197,21 @@ object KinesisInputDStream { } /** + * Sets the Kinesis initial position data to the provided timestamp. + * Sets InitialPositionInStream to [[InitialPositionInStream.AT_TIMESTAMP]] + * and the timestamp to the provided value. + * + * @param timestamp Timestamp to resume the Kinesis stream from a provided + * timestamp. + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def withTimestampAtInitialPositionInStream(timestamp: Date) : Builder = { --- End diff -- @brkyvz `withInitialPositionAtTimestamp` is an enhancer method for the InitialPositionAtTimestamp. If provided, it will set the timestamp value along with the InitialPosition.AT_TIMESTAMP. It's optional, so `initialPositionInStream` can still be used. This will not introduce any incompatibilities in usage. Thoughts?
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17955 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77759/ Test PASSed.
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17955 Merged build finished. Test PASSed.
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17955 **[Test build #77759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77759/testReport)** for PR 17955 at commit [`4550f61`](https://github.com/apache/spark/commit/4550f616a4f9c144a2da49a31ef3eaa19a0eeea8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120234538 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -100,6 +103,7 @@ object KinesisInputDStream { private var endpointUrl: Option[String] = None private var regionName: Option[String] = None private var initialPositionInStream: Option[InitialPositionInStream] = None +private var initialPositionInStreamTimestamp: Option[Date] = None --- End diff -- I'm hoping we won't have to take both `initialPositionInStream` and `initialPositionInStreamTimestamp`. The builder is an internal API, so we can definitely change it.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120234200 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -100,6 +103,7 @@ object KinesisInputDStream { private var endpointUrl: Option[String] = None private var regionName: Option[String] = None private var initialPositionInStream: Option[InitialPositionInStream] = None +private var initialPositionInStreamTimestamp: Option[Date] = None --- End diff -- @brkyvz Where exactly are we planning to add these changes? Are you proposing to change the type of `private var initialPositionInStreamTimestamp: Option[Date] = None`? That would introduce a backward incompatibility in the current builder.
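The direction brkyvz suggests — one initial-position field instead of separate `initialPositionInStream` and `initialPositionInStreamTimestamp` vars — can be sketched as an ADT-backed builder. The types below are hypothetical illustrations, not the Spark or Kinesis Client Library API:

```scala
import java.util.Date

// A single field models both cases, so AT_TIMESTAMP can never be set
// without its timestamp, and no second var is needed in the builder.
sealed trait InitialPosition
case object TrimHorizon extends InitialPosition
case object Latest extends InitialPosition
final case class AtTimestamp(timestamp: Date) extends InitialPosition

class Builder {
  private var initialPosition: Option[InitialPosition] = None
  def withInitialPosition(pos: InitialPosition): Builder = { initialPosition = Some(pos); this }
  // Convenience method mirroring the withInitialPositionAtTimestamp idea in the thread.
  def withInitialPositionAtTimestamp(ts: Date): Builder = withInitialPosition(AtTimestamp(ts))
  def build(): InitialPosition = initialPosition.getOrElse(Latest)
}
```

Because the timestamp travels inside the `AtTimestamp` case, callers that only ever use `TrimHorizon` or `Latest` are unaffected, which is the compatibility concern raised above.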
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77760/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77760/testReport)** for PR 18199 at commit [`ad32a7f`](https://github.com/apache/spark/commit/ad32a7ffc68266f08ad95f37874159fadc906a9e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77756/ Test PASSed.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Merged build finished. Test PASSed.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18083 **[Test build #77756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77756/testReport)** for PR 18083 at commit [`4a083de`](https://github.com/apache/spark/commit/4a083decb7e817fab49f25f4f0fe119352525aa7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77758/ Test PASSed.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Merged build finished. Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #77761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77761/testReport)** for PR 17723 at commit [`1479c60`](https://github.com/apache/spark/commit/1479c60b3059e17a29e23a309f1b38e364bb2451).
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18083 **[Test build #77758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77758/testReport)** for PR 18083 at commit [`d1a5e99`](https://github.com/apache/spark/commit/d1a5e991fb7fc3e7f93090c23d8088be8b650f61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18098: [SPARK-16944][Mesos] Improve data locality when l...
Github user gpang commented on a diff in the pull request: https://github.com/apache/spark/pull/18098#discussion_r120231509 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -502,6 +521,25 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( ) } + private def satisfiesLocality(offerHostname: String): Boolean = { +if (hostToLocalTaskCount.nonEmpty) { --- End diff -- @mgummelt Thanks for the thoughtful response. Sorry for the delay. I am not entirely sure how multi-stage jobs would work, but in the current PR, after all the executors are started for a stage, the delay timeout resets for the next "stage". So, if Spark needs 3 executors, and 3 executors eventually start, the next time Spark needs more executors, the delay timeout would start fresh. However, if the next stage is requested before the previous stage is fully allocated, then the scenario you described happens. I had made the assumption that stages would be fully allocated before requesting additional executors for the next stage. Do you have any insights into how executors in stages are allocated? I will also look into per-host delay timeouts.
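The delay-timeout behavior described above — accept an offer from a non-preferred host only after a wait has elapsed, and reset the wait when a new batch of executors is requested — can be illustrated with a tiny standalone sketch. `LocalityWaiter` and its injectable clock are hypothetical, not the PR's actual `satisfiesLocality` code:

```scala
// Minimal delay-scheduling sketch: a non-preferred host is accepted only after
// `delayMs` has elapsed since the last reset; reset() models starting a fresh
// "stage" of executor requests, as described in the comment.
final class LocalityWaiter(delayMs: Long, clock: () => Long) {
  private var start: Long = clock()
  def reset(): Unit = { start = clock() }
  def accepts(offerHost: String, preferredHosts: Set[String]): Boolean =
    preferredHosts.isEmpty || preferredHosts.contains(offerHost) || clock() - start >= delayMs
}
```

This also shows why a single global timer causes the scenario mgummelt raised: a `reset()` triggered by one stage's request makes an unrelated pending request wait again, which per-host delay timeouts would avoid.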
[GitHub] spark issue #18203: [SPARK-20954][SQL] Simple `DESCRIBE` result should be co...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18203 Hi, @gatorsmile and @cloud-fan . Could you review this PR when you have some time? This will fix the incompatible changes introduced in Spark 2.2.0.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120228866 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -503,58 +503,63 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi override def prettyName: String = "find_in_set" } +trait String2TrimExpression extends ImplicitCastInputTypes { + self: Expression => + + override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) + + override def nullable: Boolean = children.exists(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + override def sql: String = { +if (children.size == 1) { + val childrenSQL = children.map(_.sql).mkString(", ") + s"$prettyName($childrenSQL)" +} else { + val trimSQL = children(0).map(_.sql).mkString(", ") + val tarSQL = children(1).map(_.sql).mkString(", ") + s"$prettyName($trimSQL, $tarSQL)" +} + } +} + /** * A function that takes a character string, removes the leading and/or trailing characters matching with the characters - * in the trim string, returns the new string. If LEADING/TRAILING/BOTH and trimStr keywords are not specified, it - * defaults to remove space character from both ends. + * in the trim string, returns the new string. If BOTH and trimStr keywords are not specified, it defaults to remove + * space character from both ends. * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function * searches for each character in the source string, removes the characters from the source string until it * encounters the first non-match character. - * LEADING: removes any characters from the left end of the source string that matches characters in the trim string. - * TRAILING: removes any characters from the right end of the source string that matches characters in the trim string. 
* BOTH: removes any characters from both ends of the source string that matches characters in the trim string. */ @ExpressionDescription( usage = """ _FUNC_(str) - Removes the leading and trailing space characters from `str`. _FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str` -_FUNC_(LEADING trimStr FROM str) - Remove the leading trimString from `str` -_FUNC_(TRAILING trimStr FROM str) - Remove the trailing trimString from `str` """, extended = """ Arguments: str - a string expression trimString - the trim string BOTH, FROM - these are keyword to specify for trim string from both ends of the string - LEADING, FROM - these are keyword to specify for trim string from left end of the string - TRAILING, FROM - these are keyword to specify for trim string from right end of the string Examples: > SELECT _FUNC_('SparkSQL '); SparkSQL > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS'); parkSQ - > SELECT _FUNC_(LEADING 'paS' FROM 'SSparkSQLS'); - rkSQLS - > SELECT _FUNC_(TRAILING 'SLQ' FROM 'SSparkSQLS'); - SSparkS """) case class StringTrim(children: Seq[Expression]) - extends Expression with ImplicitCastInputTypes { + extends Expression with String2TrimExpression { --- End diff -- sure, will change.
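The trim semantics described in the doc comment (remove characters from each end while they appear in the trim string, stopping at the first non-matching character) can be illustrated with a minimal standalone sketch. This is not the PR's codegen path, just the BOTH behavior in plain Scala; `trimBoth` is a hypothetical helper name:

```scala
// Removes from both ends of `src` any characters contained in `trimStr`,
// stopping at the first character (from each end) that is not in the set.
def trimBoth(src: String, trimStr: String): String = {
  val trimChars = trimStr.toSet
  val from = src.indexWhere(c => !trimChars.contains(c))
  if (from == -1) ""  // every character matched the trim set
  else {
    val until = src.lastIndexWhere(c => !trimChars.contains(c))
    src.substring(from, until + 1)
  }
}
```

This reproduces the doc-comment example: `trimBoth("SSparkSQLS", "SL")` yields `"parkSQ"`, matching `SELECT trim(BOTH 'SL' FROM 'SSparkSQLS')`.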
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77755/ Test PASSed.
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18208 Merged build finished. Test PASSed.
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18208 **[Test build #77755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77755/testReport)** for PR 18208 at commit [`8a2d37a`](https://github.com/apache/spark/commit/8a2d37a10cd6eb36403006b99a33a7d057905e6e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77757/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77757/testReport)** for PR 18199 at commit [`3a95b55`](https://github.com/apache/spark/commit/3a95b550fdea231790c11df5324d5f965d6a4552). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120225704 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import java.io._ +import java.nio.charset.StandardCharsets +import java.util.concurrent.TimeUnit + +import org.apache.commons.io.IOUtils + +import org.apache.spark.internal.Logging +import org.apache.spark.network.util.JavaUtils +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils} +import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider} +import org.apache.spark.sql.types._ +import org.apache.spark.util.{ManualClock, SystemClock} + +/** + * A source that generates increment long values with timestamps. Each generated row has two + * columns: a timestamp column for the generated time and an auto increment long column starting + * with 0L. + * + * This source supports the following options: + * - `tuplesPerSecond` (e.g. 
100, default: 1): How many tuples should be generated per second. + * - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed + *becomes `tuplesPerSecond`. Using finer granularities than seconds will be truncated to integer + *seconds. + * - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the + *generated tuples. The source will try its best to reach `tuplesPerSecond`, but the query may + *be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed. + */ +class RateSourceProvider extends StreamSourceProvider with DataSourceRegister { + + override def sourceSchema( + sqlContext: SQLContext, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): (String, StructType) = +(shortName(), RateSourceProvider.SCHEMA) + + override def createSource( + sqlContext: SQLContext, + metadataPath: String, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): Source = { +val params = CaseInsensitiveMap(parameters) + +val tuplesPerSecond = params.get("tuplesPerSecond").map(_.toLong).getOrElse(1L) +if (tuplesPerSecond <= 0) { + throw new IllegalArgumentException( +s"Invalid value '${params("tuplesPerSecond")}'. The option 'tuplesPerSecond' " + + "must be positive") +} + +val rampUpTimeSeconds = + params.get("rampUpTime").map(JavaUtils.timeStringAsSec(_)).getOrElse(0L) +if (rampUpTimeSeconds < 0) { + throw new IllegalArgumentException( +s"Invalid value '${params("rampUpTime")}'. The option 'rampUpTime' " + + "must not be negative") +} + +val numPartitions = params.get("numPartitions").map(_.toInt).getOrElse( + sqlContext.sparkContext.defaultParallelism) +if (numPartitions <= 0) { + throw new IllegalArgumentException( +s"Invalid value '${params("numPartitions")}'. 
The option 'numPartitions' " + + "must be positive") +} + +new RateStreamSource( + sqlContext, + metadataPath, + tuplesPerSecond, + rampUpTimeSeconds, + numPartitions, + params.get("useManualClock").map(_.toBoolean).getOrElse(false) // Only for testing +) + } + override def shortName(): String = "rate" +} + +object RateSourceProvider { + val SCHEMA = +StructType(StructField("timestamp", TimestampType) :: StructField("value", LongType) :: Nil) + + val VERSION =
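The `rampUpTime` option described in the quoted doc comment can be modeled with a small standalone sketch. The linear ramp below is an assumption for illustration, since the PR's exact ramp-up formula is not visible in the quoted diff; `valueCountAt` is a hypothetical helper name:

```scala
// Assumed model: how many values a rate source would have emitted by time
// `seconds`, ramping the per-second rate roughly linearly from near zero up
// to `tuplesPerSecond` over `rampUpSeconds`, then holding it steady.
def valueCountAt(seconds: Long, tuplesPerSecond: Long, rampUpSeconds: Long): Long =
  if (seconds <= rampUpSeconds) {
    // During ramp-up, sum the (linearly growing) per-second rates so far.
    (0L until seconds).map(s => tuplesPerSecond * (s + 1) / (rampUpSeconds + 1)).sum
  } else {
    // After ramp-up, emit at the full rate.
    valueCountAt(rampUpSeconds, tuplesPerSecond, rampUpSeconds) +
      (seconds - rampUpSeconds) * tuplesPerSecond
  }
```

With `rampUpTime` of 0s this reduces to `seconds * tuplesPerSecond`, matching the default behavior documented above.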
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user ash211 commented on the issue: https://github.com/apache/spark/pull/17935 @JoshRosen what was the other type of database you were using?
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120222903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- (same diff context as quoted above)
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120222158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- (same diff context as quoted above)