[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In this PR: 1. Instead of `chunkIndex`, chunks are fetched by `String chunkId`; the server doesn't cache the blocks list. 2. In `OpenBlocks`, only the metadata (e.g. appId, executorId) of the stream is sent, so the client doesn't need to resend it in the following fetches. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18211 **[Test build #77767 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77767/testReport)** for PR 18211 at commit [`883089a`](https://github.com/apache/spark/commit/883089aa824dabfb9b82a17546a953f1f0a22be4).
[GitHub] spark issue #18211: [WIP][SPARK-20994] Alleviate memory pressure in StreamMa...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18211 In my cluster, we are suffering from OOMs in the shuffle service. We found that a lot of executors were fetching blocks from a single shuffle service. Analyzing the memory, we found that the blockIds (shuffle_shuffleId_mapId_reduceId) take about 1.5 GB.
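A back-of-envelope check makes a footprint of that magnitude plausible. The sketch below is illustrative only: the executor/block counts, the average id length, and the ~40-byte per-String overhead are assumptions, not measurements from this thread.

```java
// Rough back-of-envelope for blockId memory held on the shuffle service.
// All figures are illustrative assumptions, not measurements from this thread.
public class BlockIdMemoryEstimate {
    // Each Java String costs roughly 2 bytes per char plus ~40 bytes of
    // object/array overhead (assumed figure for a pre-compact-strings JVM).
    public static long estimateBytes(int numExecutors, int blockIdsPerFetch, int avgIdLength) {
        long perIdBytes = 2L * avgIdLength + 40L;
        return (long) numExecutors * blockIdsPerFetch * perIdBytes;
    }

    public static void main(String[] args) {
        // e.g. 1000 executors each opening 20000 blocks with ids like "shuffle_4_31337_42"
        long total = estimateBytes(1000, 20000, 25);
        System.out.printf("~%.1f GB held by blockId strings%n", total / 1e9);
    }
}
```

With these assumed numbers the estimate lands in the same ballpark as the ~1.5 GB reported above.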
[GitHub] spark issue #18210: [SPARK-20993][CORE]The configuration item about 'Spark.b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18210 Can one of the admins verify this patch?
[GitHub] spark pull request #18210: [SPARK-20993][CORE]The configuration item about '...
GitHub user guoxiaolongzte opened a pull request: https://github.com/apache/spark/pull/18210 [SPARK-20993][CORE] The configuration item 'spark.blacklist.enabled' needs to set the default value 'false' ## What changes were proposed in this pull request? The default value of the configuration item 'spark.blacklist.enabled' is 'false'. ![1](https://cloud.githubusercontent.com/assets/26266482/26817014/40469250-4ac7-11e7-96e6-617bfb93dd26.png) So, when the Spark code reads the value of 'spark.blacklist.enabled', it should specify the default value 'false'. ## How was this patch tested? Manual tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/guoxiaolongzte/spark SPARK-20993 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18210.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18210
commit d383efba12c66addb17006dea107bb0421d50bc3 Author: guoxiaolong 10207633 Date: 2017-03-31T13:57:09Z [SPARK-20177] Document about compression way has some little detail changes.
commit 3059013e9d2aec76def14eb314b6761bea0e7ca0 Author: guoxiaolong 10207633 Date: 2017-04-01T01:38:02Z [SPARK-20177] event log add a space
commit 555cef88fe09134ac98fd0ad056121c7df2539aa Author: guoxiaolongzte Date: 2017-04-02T00:16:08Z '/applications/[app-id]/jobs' in REST API, status should be [running|succeeded|failed|unknown]
commit 46bb1ad3ddd9fb55b5607ac4f20213a90186cfe9 Author: guoxiaolong 10207633 Date: 2017-04-05T03:16:50Z Merge branch 'master' of https://github.com/apache/spark into SPARK-20177
commit 0efb0dd9e404229cce638fe3fb0c966276784df7 Author: guoxiaolong 10207633 Date: 2017-04-05T03:47:53Z [SPARK-20218] '/applications/[app-id]/stages' in REST API, add description.
commit 0e37fdeee28e31fc97436dabd001d3c85c5a7794 Author: guoxiaolong 10207633 Date: 2017-04-05T05:22:54Z [SPARK-20218] '/applications/[app-id]/stages/[stage-id]' in REST API, remove redundant description.
commit 52641bb01e55b48bd9e8579fea217439d14c7dc7 Author: guoxiaolong 10207633 Date: 2017-04-07T06:24:58Z Merge branch 'SPARK-20218'
commit d3977c9cab0722d279e3fae7aacbd4eb944c22f6 Author: guoxiaolong 10207633 Date: 2017-04-08T07:13:02Z Merge branch 'master' of https://github.com/apache/spark
commit 137b90e5a85cde7e9b904b3e5ea0bb52518c4716 Author: guoxiaolong 10207633 Date: 2017-04-10T05:13:40Z Merge branch 'master' of https://github.com/apache/spark
commit 0fe5865b8022aeacdb2d194699b990d8467f7a0a Author: guoxiaolong 10207633 Date: 2017-04-10T10:25:22Z Merge branch 'SPARK-20190' of https://github.com/guoxiaolongzte/spark
commit cf6f42ac84466960f2232c025b8faeb5d7378fe1 Author: guoxiaolong 10207633 Date: 2017-04-10T10:26:27Z Merge branch 'master' of https://github.com/apache/spark
commit 685cd6b6e3799c7be65674b2670159ba725f0b8f Author: guoxiaolong 10207633 Date: 2017-04-14T01:12:41Z Merge branch 'master' of https://github.com/apache/spark
commit c716a9231e9ab117d2b03ba67a1c8903d8d9da93 Author: guoxiaolong Date: 2017-04-17T06:57:21Z Merge branch 'master' of https://github.com/apache/spark
commit 679cec36a968fbf995b567ca5f6f8cbd8e32673f Author: guoxiaolong Date: 2017-04-19T07:20:08Z Merge branch 'master' of https://github.com/apache/spark
commit 3c9387af84a8f39cf8c1ce19e15de99dfcaf0ca5 Author: guoxiaolong Date: 2017-04-19T08:15:26Z Merge branch 'master' of https://github.com/apache/spark
commit cb71f4462a0889cbb0843875b1e4cf14bcb0d020 Author: guoxiaolong Date: 2017-04-20T05:52:06Z Merge branch 'master' of https://github.com/apache/spark
commit ce92a7415a2026f5bf909820110a13750a0949e1 Author: guoxiaolong Date: 2017-04-21T05:21:48Z Merge branch 'master' of https://github.com/apache/spark
commit dd64342206041a8c3a282459e5f2b898dc558d89 Author: guoxiaolong Date: 2017-04-21T08:44:25Z Merge branch 'master' of https://github.com/apache/spark
commit bffd2bd00c6b0e20313756e133adca4c97707c67 Author: guoxiaolong Date: 2017-04-28T01:36:29Z Merge branch 'master' of https://github.com/apache/spark
commit 588d42a382345a071532ace1eab5457911f6aa46 Author: guoxiaolong Date: 2017-04-28T05:02:36Z Merge branch 'master' of https://github.com/apache/spark
commit 4bbeee1231275d1afa0775dbb61fcc5817f6e57c Author: guoxiaolong Date: 2017-05-02T02:30:52Z Merge branch 'master' of https://github.com/apache/spark
commit 362e5ad12bfe013a7780d81b5067c2ff644efa05 Author: guoxiaolong Date: 2017-05-03T06:47:54Z Merge branch 'master' of https://github.com/apache/spark
commit 4ed5e00e784ab3c31e1ba69f06fd64520c9d32e4 Author: guoxiaolong
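The fix amounts to always passing an explicit default when the flag is read. The sketch below illustrates the pattern with a minimal stand-in `Conf` class so it is self-contained; in Spark the real call is `conf.getBoolean("spark.blacklist.enabled", false)` on a `SparkConf`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for SparkConf to keep the example self-contained; the real
// Spark API call is conf.getBoolean("spark.blacklist.enabled", false).
public class BlacklistConfExample {
    static class Conf {
        private final Map<String, String> settings = new HashMap<>();
        Conf set(String key, String value) { settings.put(key, value); return this; }
        boolean getBoolean(String key, boolean defaultValue) {
            String v = settings.get(key);
            return v == null ? defaultValue : Boolean.parseBoolean(v);
        }
    }

    // The pattern the PR argues for: always pass an explicit 'false' default,
    // so the blacklist feature stays off unless the user opts in.
    static boolean blacklistEnabled(Conf conf) {
        return conf.getBoolean("spark.blacklist.enabled", false);
    }
}
```

Without the explicit default, every call site would have to agree on what an unset key means; passing `false` at the read site makes the documented default unambiguous.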
[GitHub] spark pull request #18211: [WIP][SPARK-20994] Alleviate memory pressure in S...
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/18211 [WIP][SPARK-20994] Alleviate memory pressure in StreamManager ## What changes were proposed in this pull request? In the current code, chunks are fetched from the shuffle service in two steps: Step 1: send `OpenBlocks`, which contains the list of blocks to fetch; Step 2: fetch the consecutive chunks from the shuffle service by `streamId` and `chunkIndex`. Conceptually, there is no need to send the blocks list in Step 1. The client can send the blockId in Step 2. On receiving a `ChunkFetchRequest`, the server can check whether the chunkId is in the local block manager and send back the response. Thus memory cost can be reduced. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jinxing64/spark SPARK-20994 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18211.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18211 commit 883089aa824dabfb9b82a17546a953f1f0a22be4 Author: jinxing Date: 2017-06-05T09:19:18Z [SPARK-20994] Alleviate memory pressure in StreamManager
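The before/after shape of the protocol described above can be sketched as follows. The class and field names here are hypothetical illustrations; the actual Spark classes (`OpenBlocks`, `ChunkFetchRequest`, `StreamChunkId`) differ in detail.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical message shapes sketching the idea in the PR description; the
// real Spark network-protocol classes look different.
public class FetchProtocolSketch {
    // Before: OpenBlocks carries the whole block list, which the server must
    // cache per stream so later fetches by (streamId, chunkIndex) resolve.
    static class OpenBlocksBefore {
        final String appId, execId; final List<String> blockIds;
        OpenBlocksBefore(String appId, String execId, List<String> blockIds) {
            this.appId = appId; this.execId = execId; this.blockIds = blockIds;
        }
    }

    // After: OpenBlocks carries only stream metadata, and each fetch names its
    // block directly, so the server needs no cached list at all.
    static class OpenBlocksAfter {
        final String appId, execId;
        OpenBlocksAfter(String appId, String execId) { this.appId = appId; this.execId = execId; }
    }
    static class ChunkFetchAfter {
        final long streamId; final String chunkId;
        ChunkFetchAfter(long streamId, String chunkId) { this.streamId = streamId; this.chunkId = chunkId; }
    }

    // Server-side resolution under the new scheme: look the block up directly
    // in the local block manager (modeled here as a Map).
    static Optional<byte[]> resolve(Map<String, byte[]> localBlocks, ChunkFetchAfter req) {
        return Optional.ofNullable(localBlocks.get(req.chunkId));
    }
}
```

The trade-off is that each fetch request now carries a blockId string instead of a small integer index, in exchange for the server holding no per-stream block list in memory.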
[GitHub] spark pull request #18148: [SPARK-20926][SQL] Removing exposures to guava li...
GitHub user rezasafi reopened a pull request: https://github.com/apache/spark/pull/18148 [SPARK-20926][SQL] Removing exposures to guava library caused by directly accessing SessionCatalog's tableRelationCache There could be test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to the guava library by directly accessing SessionCatalog's tableRelationCache. These failures occur when guava shading is in place. ## What changes were proposed in this pull request? This change removes those guava exposures by introducing new methods in SessionCatalog and also changing DataStorageStrategy, HiveMetastoreCatalog and HiveSchemaInferenceSuite so that they use those proxy methods. ## How was this patch tested? Unit tests passed after applying these changes. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rezasafi/spark branch-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18148.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18148 commit 8253bbe36d551f11d8e48ab92444977ac5b0776a Author: Reza Safi Date: 2017-05-30T21:58:39Z [SPARK-20926][SQL] Removing exposures to guava library through directly accessing SessionCatalog's tableRelationCache There were test failures because DataStorageStrategy, HiveMetastoreCatalog and also HiveSchemaInferenceSuite were exposed to the shaded Guava library. This change removes those exposures by introducing new methods in SessionCatalog. commit 9821ea191d63b327663f29adb04b48c856c550ff Author: Reza Safi Date: 2017-06-02T01:36:05Z Making tableRelationCache private and updating the comments.
commit 942137299dc03de53ce3e7120ac052f5764c14dc Author: Reza Safi Date: 2017-06-02T03:44:57Z Fixing scalastyle check errors commit 2832253afe2a48daae3f78568315b19a5aeb045f Author: Reza Safi Date: 2017-06-02T23:49:49Z Changing the names for two of the methods.
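The encapsulation pattern the PR describes, a private cache behind narrow proxy methods so callers never touch the Guava type, can be sketched as below. A `ConcurrentHashMap` stands in for Guava's `Cache` to keep the example self-contained, and the method names are illustrative, not necessarily the exact ones the PR added.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the encapsulation pattern: the cache stays private, and callers go
// through proxy methods, so the concrete (Guava) cache type never leaks into
// their signatures. A ConcurrentHashMap stands in for Guava's Cache here.
public class SessionCatalogSketch {
    private final ConcurrentHashMap<String, String> tableRelationCache = new ConcurrentHashMap<>();

    // Proxy methods: the only surface callers see.
    public Optional<String> getCachedTable(String name) {
        return Optional.ofNullable(tableRelationCache.get(name));
    }
    public void cacheTable(String name, String plan) {
        tableRelationCache.put(name, plan);
    }
    public void invalidateCachedTable(String name) {
        tableRelationCache.remove(name);
    }
}
```

Because no caller imports the cache's type, shading or swapping the underlying cache implementation no longer ripples through dependent classes such as the ones named in the PR.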
[GitHub] spark pull request #18148: [SPARK-20926][SQL] Removing exposures to guava li...
Github user rezasafi closed the pull request at: https://github.com/apache/spark/pull/18148
[GitHub] spark issue #18118: SPARK-20199 : Provided featureSubsetStrategy to GBTClass...
Github user pralabhkumar commented on the issue: https://github.com/apache/spark/pull/18118 I have made the changes suggested by @mpjlu. Please find some time to review the pull request.
[GitHub] spark issue #7379: [SPARK-8682][SQL][WIP] Range Join
Github user IceMan81 commented on the issue: https://github.com/apache/spark/pull/7379 @zzeekk Would you mind explaining how your workaround works? > A Workaround is to build blocks and add them as equi-join condition Not sure I understand what you are suggesting here. @marmbrus The inability to do a range join efficiently results in very poor performance. Are there plans to address this directly in an upcoming release? I have scenarios where the optimizer sorts the results into a single partition for the join (all other partitions are empty) because the sort does not include the columns in the range condition. That task will run for more than a day, whereas a forced broadcast version of it runs in 3 hours. And here I'm only able to do the broadcast because I'm using a smaller data set on one side of the join.
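For readers puzzling over the same question: one common reading of the "build blocks and add them as equi-join condition" workaround (an interpretation, not code from this thread) is to discretize the range column into fixed-width blocks, equi-join on the block id first, and only then apply the exact range predicate. The sketch below assumes non-negative values and a hand-picked block width.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of a "blocking" range-join workaround: intervals are
// expanded to every fixed-width block they overlap, points join on their own
// block id (the added equi-join key), and the exact range predicate filters
// the candidates. Assumes non-negative values.
public class RangeJoinBlocking {
    static final long BLOCK_WIDTH = 10L;   // tuning knob, chosen for the data

    static class Interval {
        final String id; final long lo, hi;   // closed interval [lo, hi]
        Interval(String id, long lo, long hi) { this.id = id; this.lo = lo; this.hi = hi; }
    }

    // Returns (pointId, intervalId) pairs where lo <= value <= hi.
    static List<String[]> join(Map<String, Long> points, List<Interval> intervals) {
        // Expand each interval to every block it covers.
        Map<Long, List<Interval>> byBlock = new HashMap<>();
        for (Interval iv : intervals) {
            for (long b = iv.lo / BLOCK_WIDTH; b <= iv.hi / BLOCK_WIDTH; b++) {
                byBlock.computeIfAbsent(b, k -> new ArrayList<>()).add(iv);
            }
        }
        List<String[]> out = new ArrayList<>();
        for (Map.Entry<String, Long> p : points.entrySet()) {
            long v = p.getValue();
            // Equi-match on block id, then the exact range predicate.
            for (Interval iv : byBlock.getOrDefault(v / BLOCK_WIDTH, List.of())) {
                if (iv.lo <= v && v <= iv.hi) {
                    out.add(new String[]{p.getKey(), iv.id});
                }
            }
        }
        return out;
    }
}
```

The point of the trick is that an engine which only partitions on equality keys can now hash-partition both sides on the block id, instead of collapsing the range join into a single partition.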
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17953 Merged build finished. Test PASSed.
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77766/
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17953 **[Test build #77766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77766/testReport)** for PR 17953 at commit [`1e86674`](https://github.com/apache/spark/commit/1e866745b3639248a237c285479aa5fb72b3c8df). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18108: [SPARK-20884] Spark' masters will be both standby due to...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/18108 @HyukjinKwon Yes, I didn't find any problems when I compiled and used it locally.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120269932 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -578,39 +583,29 @@ case class StringTrim(children: Seq[Expression]) val getTrimFunction = if (children.size == 1) { s"UTF8String ${ev.value} = ${inputs(0)}.trim();" } else { - s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});".stripMargin + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" --- End diff -- OK, I can try to change it in AstBuilder.scala -> visitFunctionCall (SQL path) and functions.scala (DataFrame path).
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18159 Merged build finished. Test PASSed.
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18159 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77765/
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18159 **[Test build #77765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77765/testReport)** for PR 18159 at commit [`0af718d`](https://github.com/apache/spark/commit/0af718d15ed9c6bcf4e8de19528affdc492d1257). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18159 @cloud-fan The screenshot looks like: https://cloud.githubusercontent.com/assets/68855/26815029/614f13ba-4abc-11e7-9fbf-2248f0b7211d.png
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120265375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -578,39 +583,29 @@ case class StringTrim(children: Seq[Expression]) val getTrimFunction = if (children.size == 1) { s"UTF8String ${ev.value} = ${inputs(0)}.trim();" } else { - s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});".stripMargin + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" --- End diff -- Can't we just change the input order of `StringTrim`?
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120261787 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -1105,19 +1105,26 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging } /** - * Create a name LTRIM for TRIM(Leading), RTRIM for TRIM(Trailing), TRIM for TRIM(BOTH) + * Create a function name LTRIM for TRIM(Leading), RTRIM for TRIM(Trailing), TRIM for TRIM(BOTH), + * otherwise, returnthe original funcID. --- End diff -- will change
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/17953 @cloud-fan Do you think it should be done in this PR? And where should I add the filter, `CatalogImpl.createTable()` or `ExternalCatalog.createTable()`?
[GitHub] spark issue #18108: [SPARK-20884] Spark' masters will be both standby due to...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18108 @liu-zhaokun, I would request another test only after the test failure has at least been checked manually. Does this succeed locally without a problem?
[GitHub] spark issue #18108: [SPARK-20884] Spark' masters will be both standby due to...
Github user liu-zhaokun commented on the issue: https://github.com/apache/spark/pull/18108 @srowen First, I think the Hive-related test failures are not caused by my change, right? Second, the "org.apache.spark.deploy.master.PersistenceEngineSuite" test fails with "java.lang.NoSuchMethodError: org.apache.curator.utils.ZKPaths.fixForNamespace", but having compared the code again I found almost no difference between the two versions, so I don't think there is an API compatibility problem here. I suspect there is a problem with Jenkins; could you test this PR again?
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18207 Yep. Right. Then, could you officially resolve [SPARK-12661](https://issues.apache.org/jira/browse/SPARK-12661), too?
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17953 **[Test build #77766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77766/testReport)** for PR 17953 at commit [`1e86674`](https://github.com/apache/spark/commit/1e866745b3639248a237c285479aa5fb72b3c8df).
[GitHub] spark issue #17953: [SPARK-20680][SQL] Spark-sql do not support for void col...
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/17953 Ahh, found it. Re-generated the golden files.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 OK great then we have officially deprecated it, haven't we?
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18207 @rxin, as of #17355 Jenkins is using Python 2.7 sourced from a virtualenv instead of Python 2.6. That patch was merged into master before branch-2.2 was cut.
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18205 Merged build finished. Test PASSed.
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18205 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77763/ Test PASSed.
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18205 **[Test build #77763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77763/testReport)** for PR 18205 at commit [`c53a0c7`](https://github.com/apache/spark/commit/c53a0c7a304e6a12548a047fd08786a174ed1479). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18159 @adrian-ionescu Thanks for the review. I've addressed the above comments.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77762/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77762 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77762/testReport)** for PR 18199 at commit [`240c27b`](https://github.com/apache/spark/commit/240c27b8386ed929625f5817c32f04b5c100e4b8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18209: [SPARK-20992][Scheduler] Add support for Nomad as a sche...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18209 Can one of the admins verify this patch?
[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18159 **[Test build #77765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77765/testReport)** for PR 18159 at commit [`0af718d`](https://github.com/apache/spark/commit/0af718d15ed9c6bcf4e8de19528affdc492d1257).
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18207 Jenkins runs with `['python2.7', 'python3.4', 'pypy']` only, doesn't it? Also, this is a major release cycle with other big changes. In my view, removing Python 2.6 support would not be appropriate in subsequent minor releases like 2.2.1 and 2.2.2.
[GitHub] spark pull request #18209: [SPARK-20992][Scheduler] Add support for Nomad as...
GitHub user barnardb opened a pull request: https://github.com/apache/spark/pull/18209 [SPARK-20992][Scheduler] Add support for Nomad as a scheduler backend ## What changes were proposed in this pull request? Adds support for [Nomad](https://github.com/hashicorp/nomad) as a scheduler backend. Nomad is a cluster manager designed for both long-lived services and short-lived batch processing workloads. The integration supports client and cluster mode, supports dynamic allocation (increasing only), has basic support for Python and R applications, and works with applications packaged either as JARs or as Docker images. Documentation is in [docs/running-on-nomad.md](https://github.com/barnardb/spark/blob/nomad/docs/running-on-nomad.md). This will be [presented at Spark Summit 2017](https://spark-summit.org/2017/events/homologous-apache-spark-clusters-using-nomad/). A build of the pull request with Nomad support is available [here](https://www.dropbox.com/s/llcv388yl5hweje/spark-2.3.0-SNAPSHOT-bin-nomad.tgz?dl=0). Feedback would be much appreciated. ## How was this patch tested? This patch was tested with integration and manual tests, and a load test was performed to ensure it doesn't perform worse than the YARN integration. The feature was developed and tested against Nomad 0.5.6 (the current stable version) on Spark 2.1.0, rebased to 2.1.1 and retested, and finally rebased to master and retested.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/barnardb/spark nomad Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18209.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18209 commit c762194188e64cccff8a9758885b45f9d395cced Author: Ben Barnard Date: 2017-06-06T01:19:35Z Add support for Nomad as a scheduler backend
[GitHub] spark issue #16820: [SPARK-19471] AggregationIterator does not initialize th...
Github user yangw1234 commented on the issue: https://github.com/apache/spark/pull/16820 Sorry, I could not find time to finish this PR recently, so I'm closing it for now. If you need this fix, please feel free to base your work on it and finish it.
[GitHub] spark pull request #16820: [SPARK-19471] AggregationIterator does not initia...
Github user yangw1234 closed the pull request at: https://github.com/apache/spark/pull/16820
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18193 Merged build finished. Test PASSed.
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18193 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77764/ Test PASSed.
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18193 **[Test build #77764 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77764/testReport)** for PR 18193 at commit [`171a9e6`](https://github.com/apache/spark/commit/171a9e66d2ceaeae87ced754be49554ce602930b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18207 I believe we still support Python 2.6, given Jenkins runs 2.6... There seems to be no point in removing that support this late in the release cycle.
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18192 I understand that moving code alone is discouraged, as explained, but wouldn't it be better to merge this rather than close it, if it improves the code in any way and the change is safe?
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user zhengcanbin commented on the issue: https://github.com/apache/spark/pull/18192 @jerryshao Should I close this issue?
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77761/ Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17723 Merged build finished. Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #77761 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77761/testReport)** for PR 17723 at commit [`1479c60`](https://github.com/apache/spark/commit/1479c60b3059e17a29e23a309f1b38e364bb2451). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18192 The change should be safe, but usually we don't do such code structure refactoring alone without a strong reason, so I'm neutral on this change.
[GitHub] spark issue #18192: [SPARK-20944][SHUFFLE] Move shouldBypassMergeSort from S...
Github user zhengcanbin commented on the issue: https://github.com/apache/spark/pull/18192 @jerryshao It's a tiny change for a more reasonable code structure. There are three `ShuffleWriter` implementations. We first use the helper method `SortShuffleWriter#shouldBypassMergeSort` to determine whether a shuffle should take the `BypassMergeSort` path, and then use another helper method, `SortShuffleManager#canUseSerializedShuffle`, to decide on the `UnsafeShuffleWriter` path. For consistency of code structure, the method `shouldBypassMergeSort` should not belong to `SortShuffleWriter`; it should live in `BypassMergeSortShuffleWriter` or `SortShuffleManager`, and the latter is the better home because it would keep the two helper methods together.
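For readers following along, the two selection predicates described in the comment above can be sketched in plain Java. This is a simplified paraphrase, not Spark's exact API: the method signatures are invented stand-ins, and the constants reflect Spark's documented defaults (`spark.shuffle.sort.bypassMergeThreshold` defaults to 200; the serialized shuffle packs partition ids into 24 bits).

```java
// Hedged sketch of the writer-selection predicates discussed above.
// Signatures are simplified stand-ins, not Spark's real method signatures.
public class ShuffleWriterSelection {
    static final int BYPASS_MERGE_THRESHOLD = 200;        // spark.shuffle.sort.bypassMergeThreshold default
    static final int MAX_SERIALIZED_PARTITIONS = 1 << 24; // partition id must fit in 24 bits

    // Mirrors SortShuffleWriter#shouldBypassMergeSort: skip sorting entirely
    // when there is no map-side aggregation and the partition count is small.
    static boolean shouldBypassMergeSort(boolean mapSideCombine, int numPartitions) {
        return !mapSideCombine && numPartitions <= BYPASS_MERGE_THRESHOLD;
    }

    // Mirrors SortShuffleManager#canUseSerializedShuffle: the serializer must
    // support record relocation, there must be no aggregator, and the number
    // of partitions must fit in the packed record-pointer encoding.
    static boolean canUseSerializedShuffle(boolean serializerRelocatable,
                                           boolean hasAggregator,
                                           int numPartitions) {
        return serializerRelocatable && !hasAggregator
            && numPartitions <= MAX_SERIALIZED_PARTITIONS;
    }

    public static void main(String[] args) {
        // 100 partitions, no map-side combine -> BypassMergeSortShuffleWriter
        System.out.println(shouldBypassMergeSort(false, 100));
        // Relocatable serializer, no aggregator -> UnsafeShuffleWriter
        System.out.println(canUseSerializedShuffle(true, false, 1000));
    }
}
```

Keeping both predicates on `SortShuffleManager`, as the comment suggests, would make this one-stop decision logic visible in a single class.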
[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18193 **[Test build #77764 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77764/testReport)** for PR 18193 at commit [`171a9e6`](https://github.com/apache/spark/commit/171a9e66d2ceaeae87ced754be49554ce602930b).
[GitHub] spark issue #18205: [SPARK-20986] [SQL] Reset table's statistics after Prune...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18205 **[Test build #77763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77763/testReport)** for PR 18205 at commit [`c53a0c7`](https://github.com/apache/spark/commit/c53a0c7a304e6a12548a047fd08786a174ed1479).
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120244321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- @@ -199,13 +199,52 @@ class RateStreamSource( } val localStartTimeMs = startTimeMs + TimeUnit.SECONDS.toMillis(startSeconds) -val relativeMsPerValue = - TimeUnit.SECONDS.toMillis(endSeconds - startSeconds) / (rangeEnd - rangeStart) --- End diff -- I thought that you would only change `TimeUnit.SECONDS.toMillis(endSeconds - startSeconds).toDouble`. Wasn't expecting all this change!
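The `.toDouble` change the reviewer refers to guards against a plain integer-division pitfall: once the source generates more than 1000 values per second, `millis / count` in long arithmetic truncates to 0, collapsing all the per-row timestamp spacing. This can be reproduced outside Spark (the helper names below are mine, not the PR's code):

```java
import java.util.concurrent.TimeUnit;

public class RateDivision {
    // Long division, as on the removed line of the diff: truncates toward zero.
    static long truncatedMsPerValue(long seconds, long valuesInRange) {
        return TimeUnit.SECONDS.toMillis(seconds) / valuesInRange;
    }

    // Promoting one operand to double preserves the fractional spacing.
    static double exactMsPerValue(long seconds, long valuesInRange) {
        return TimeUnit.SECONDS.toMillis(seconds) / (double) valuesInRange;
    }

    public static void main(String[] args) {
        // 5000 values generated over one second are really 0.2 ms apart.
        System.out.println(truncatedMsPerValue(1, 5000)); // 0 -> timestamps collapse
        System.out.println(exactMsPerValue(1, 5000));     // 0.2
    }
}
```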
[GitHub] spark pull request #18205: [SPARK-20986] [SQL] Reset table's statistics afte...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/18205#discussion_r120244237 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/PruneFileSourcePartitionsSuite.scala --- @@ -66,4 +67,33 @@ class PruneFileSourcePartitionsSuite extends QueryTest with SQLTestUtils with Te } } } + + test("SPARK-20986 Reset table's statistics after PruneFileSourcePartitions rule") { +withTempView("tempTbl", "partTbl") { + spark.range(1000).selectExpr("id").createOrReplaceTempView("tempTbl") + sql("CREATE TABLE partTbl (id INT) PARTITIONED BY (part INT) STORED AS parquet") --- End diff -- Yes, thanks.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243896 --- Diff: common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java --- @@ -730,4 +730,58 @@ public void testToLong() throws IOException { assertFalse(negativeInput, UTF8String.fromString(negativeInput).toLong(wrapper)); } } + + @Test + public void trim() { --- End diff -- sure
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243912 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -502,69 +503,232 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi override def prettyName: String = "find_in_set" } +trait String2TrimExpression extends ImplicitCastInputTypes { + self: Expression => + + override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) + + override def nullable: Boolean = children.exists(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + override def sql: String = { +if (children.size == 1) { + val childrenSQL = children.map(_.sql).mkString(", ") + s"$prettyName($childrenSQL)" +} else { + val trimSQL = children(0).map(_.sql).mkString(", ") + val tarSQL = children(1).map(_.sql).mkString(", ") + s"$prettyName($trimSQL, $tarSQL)" +} + } +} + /** - * A function that trim the spaces from both ends for the specified string. - */ + * A function that takes a character string, removes the leading and/or trailing characters matching with the characters + * in the trim string, returns the new string. If BOTH and trimStr keywords are not specified, it defaults to remove + * space character from both ends. + * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function + * searches for each character in the source string, removes the characters from the source string until it + * encounters the first non-match character. + * BOTH: removes any characters from both ends of the source string that matches characters in the trim string. 
+ */ @ExpressionDescription( - usage = "_FUNC_(str) - Removes the leading and trailing space characters from `str`.", + usage = """ +_FUNC_(str) - Removes the leading and trailing space characters from `str`. +_FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str` + """, extended = """ +Arguments: + str - a string expression + trimString - the trim string + BOTH, FROM - these are keyword to specify for trim string from both ends of the string Examples: > SELECT _FUNC_('SparkSQL '); SparkSQL + > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS'); + parkSQ """) -case class StringTrim(child: Expression) - extends UnaryExpression with String2StringExpression { +case class StringTrim(children: Seq[Expression]) + extends Expression with String2TrimExpression { - def convert(v: UTF8String): UTF8String = v.trim() + require(children.size <= 2 && children.nonEmpty, +s"$prettyName requires at least one argument and no more than two.") override def prettyName: String = "trim" + // trim function can take one or two arguments. + // Specify one child, it is for the trim space function. + // Specify the two children, it is for the trim function with BOTH option. + override def eval(input: InternalRow): Any = { +val inputs = children.map(_.eval(input).asInstanceOf[UTF8String]) +if (inputs(0) != null) { + if (children.size == 1) { +return inputs(0).trim() + } else if (inputs(1) != null) { +return inputs(1).trim(inputs(0)) + } +} +null + } + override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -defineCodeGen(ctx, ev, c => s"($c).trim()") +if (children.size == 2 && !children(0).isInstanceOf[Literal]) { + throw new AnalysisException(s"The trimming parameter should be Literal.")} + +val evals = children.map(_.genCode(ctx)) +val inputs = evals.map { eval => + s"${eval.isNull} ? 
null : ${eval.value}" +} +val getTrimFunction = if (children.size == 1) { + s"UTF8String ${ev.value} = ${inputs(0)}.trim();" +} else { + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" +} +ev.copy(evals.map(_.code).mkString("\n") + s""" + boolean ${ev.isNull} = false; + $getTrimFunction + if (${ev.value} == null) { +${ev.isNull} = true; + } +""") } } /** - * A function that trim the spaces from left end for given string. + * A function that trims the characters from left end for a given string, If LEADING and trimStr keywords are not + * specified, it defaults to remove space character from the left end.
[GitHub] spark pull request #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18208
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243671 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -578,39 +583,29 @@ case class StringTrim(children: Seq[Expression]) val getTrimFunction = if (children.size == 1) { s"UTF8String ${ev.value} = ${inputs(0)}.trim();" } else { - s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});".stripMargin + s"UTF8String ${ev.value} = ${inputs(1)}.trim(${inputs(0)});" --- End diff -- I can add something like this: `val inputs = evals.map { eval => s"${eval.isNull} ? null : ${eval.value}" }.reverse` There are a couple of places where I would add the reverse in each trim function; what do you think?
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18208 Thanks. Merging to master.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77762 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77762/testReport)** for PR 18199 at commit [`240c27b`](https://github.com/apache/spark/commit/240c27b8386ed929625f5817c32f04b5c100e4b8).
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120243433 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import java.io._ +import java.nio.charset.StandardCharsets +import java.util.concurrent.TimeUnit + +import org.apache.commons.io.IOUtils + +import org.apache.spark.internal.Logging +import org.apache.spark.network.util.JavaUtils +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils} +import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider} +import org.apache.spark.sql.types._ +import org.apache.spark.util.{ManualClock, SystemClock} + +/** + * A source that generates increment long values with timestamps. Each generated row has two + * columns: a timestamp column for the generated time and an auto increment long column starting + * with 0L. + * + * This source supports the following options: + * - `tuplesPerSecond` (e.g. 100, default: 1): How many tuples should be generated per second. + * - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed + * becomes `tuplesPerSecond`. Using finer granularities than seconds will be truncated to integer + * seconds. + * - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the + * generated tuples. The source will try its best to reach `tuplesPerSecond`, but the query may + * be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed. + */ +class RateSourceProvider extends StreamSourceProvider with DataSourceRegister { + + override def sourceSchema( + sqlContext: SQLContext, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): (String, StructType) = + (shortName(), RateSourceProvider.SCHEMA) + + override def createSource( + sqlContext: SQLContext, + metadataPath: String, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): Source = { + val params = CaseInsensitiveMap(parameters) + + val tuplesPerSecond = params.get("tuplesPerSecond").map(_.toLong).getOrElse(1L) + if (tuplesPerSecond <= 0) { + throw new IllegalArgumentException( + s"Invalid value '${params("tuplesPerSecond")}'. The option 'tuplesPerSecond' " + + "must be positive") + } + + val rampUpTimeSeconds = + params.get("rampUpTime").map(JavaUtils.timeStringAsSec(_)).getOrElse(0L) + if (rampUpTimeSeconds < 0) { + throw new IllegalArgumentException( + s"Invalid value '${params("rampUpTime")}'. The option 'rampUpTime' " + + "must not be negative") + } + + val numPartitions = params.get("numPartitions").map(_.toInt).getOrElse( + sqlContext.sparkContext.defaultParallelism) + if (numPartitions <= 0) { + throw new IllegalArgumentException( + s"Invalid value '${params("numPartitions")}'. The option 'numPartitions' " + + "must be positive") + } + + new RateStreamSource( + sqlContext, + metadataPath, + tuplesPerSecond, + rampUpTimeSeconds, + numPartitions, + params.get("useManualClock").map(_.toBoolean).getOrElse(false) // Only for testing + ) + } + override def shortName(): String = "rate" +} + +object RateSourceProvider { + val SCHEMA = + StructType(StructField("timestamp", TimestampType) :: StructField("value", LongType) :: Nil) + + val VERSION
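The option validation in the diff above can be exercised outside Spark with a small standalone sketch. Names mirror the PR (`tuplesPerSecond`, `rampUpTime`, `numPartitions`) but this is not the actual source class, and the simplified time parser only accepts a bare `<n>` or `<n>s` form, unlike Spark's `JavaUtils.timeStringAsSec`:

```scala
// Standalone sketch of the validation rules in RateSourceProvider.createSource.
// Returns (tuplesPerSecond, rampUpTimeSeconds, numPartitions).
def parseRateOptions(params: Map[String, String], defaultParallelism: Int): (Long, Long, Int) = {
  val tuplesPerSecond = params.get("tuplesPerSecond").map(_.toLong).getOrElse(1L)
  require(tuplesPerSecond > 0,
    s"Invalid value '$tuplesPerSecond'. The option 'tuplesPerSecond' must be positive")

  // Simplified parser: accepts "5" or "5s"; Spark uses JavaUtils.timeStringAsSec.
  val rampUpTimeSeconds = params.get("rampUpTime").map(_.stripSuffix("s").toLong).getOrElse(0L)
  require(rampUpTimeSeconds >= 0,
    s"Invalid value '$rampUpTimeSeconds'. The option 'rampUpTime' must not be negative")

  val numPartitions = params.get("numPartitions").map(_.toInt).getOrElse(defaultParallelism)
  require(numPartitions > 0,
    s"Invalid value '$numPartitions'. The option 'numPartitions' must be positive")

  (tuplesPerSecond, rampUpTimeSeconds, numPartitions)
}
```

For example, `parseRateOptions(Map("tuplesPerSecond" -> "100", "rampUpTime" -> "5s"), 8)` yields `(100, 5, 8)`, while an empty map falls back to the defaults described in the scaladoc.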
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120243013 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -503,58 +503,63 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi override def prettyName: String = "find_in_set" } +trait String2TrimExpression extends ImplicitCastInputTypes { + self: Expression => + + override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) + + override def nullable: Boolean = children.exists(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + override def sql: String = { + if (children.size == 1) { + val childrenSQL = children.map(_.sql).mkString(", ") + s"$prettyName($childrenSQL)" + } else { + val trimSQL = children(0).map(_.sql).mkString(", ") + val tarSQL = children(1).map(_.sql).mkString(", ") + s"$prettyName($trimSQL, $tarSQL)" + } + } +} + /** * A function that takes a character string, removes the leading and/or trailing characters matching with the characters - * in the trim string, returns the new string. If LEADING/TRAILING/BOTH and trimStr keywords are not specified, it - * defaults to remove space character from both ends. + * in the trim string, returns the new string. If BOTH and trimStr keywords are not specified, it defaults to remove + * space character from both ends. * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function * searches for each character in the source string, removes the characters from the source string until it * encounters the first non-match character. - * LEADING: removes any characters from the left end of the source string that matches characters in the trim string. - * TRAILING: removes any characters from the right end of the source string that matches characters in the trim string. * BOTH: removes any characters from both ends of the source string that matches characters in the trim string. */ @ExpressionDescription( usage = """ _FUNC_(str) - Removes the leading and trailing space characters from `str`. _FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str` -_FUNC_(LEADING trimStr FROM str) - Remove the leading trimString from `str` -_FUNC_(TRAILING trimStr FROM str) - Remove the trailing trimString from `str` """, extended = """ Arguments: str - a string expression trimString - the trim string BOTH, FROM - these are keyword to specify for trim string from both ends of the string - LEADING, FROM - these are keyword to specify for trim string from left end of the string - TRAILING, FROM - these are keyword to specify for trim string from right end of the string Examples: > SELECT _FUNC_('SparkSQL '); SparkSQL > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS'); parkSQ - > SELECT _FUNC_(LEADING 'paS' FROM 'SSparkSQLS'); - rkSQLS - > SELECT _FUNC_(TRAILING 'SLQ' FROM 'SSparkSQLS'); - SSparkS """) case class StringTrim(children: Seq[Expression]) - extends Expression with ImplicitCastInputTypes { + extends Expression with String2TrimExpression { require(children.size <= 2 && children.nonEmpty, s"$prettyName requires at least one argument and no more than two.") - override def dataType: DataType = StringType - override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) - - override def nullable: Boolean = children.exists(_.nullable) - override def foldable: Boolean = children.forall(_.foldable) - override def prettyName: String = "trim" // trim function can take one or two arguments. - // For one argument(children size is 1), it is the trim space function. - // For two arguments(children size is 2), it is the trim function with one of these options: BOTH/LEADING/TRAILING. + // Specify one child, it is for the trim space function. + // Specify the two children, it is for the trim function with BOTH option. --- End diff -- np, I made the changes.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18207 Thank you for confirming, @JoshRosen.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user JoshRosen commented on the issue: https://github.com/apache/spark/pull/18207 As of #17355 we no longer test against Python 2.6. That doesn't mean that 2.6 won't work today, but there's nothing stopping 2.6 support from breaking in a future 2.2.x release, because we are no longer testing against that version. #17355 replaced our Python 2.6 testing environment with a Python 2.7 one, so we can now begin to use language features and libraries which are only available from 2.7 onwards (such as set and dictionary comprehensions). Therefore, this documentation change looks correct to me.
[GitHub] spark issue #18207: [MINOR][DOC] Update deprecation notes on Python/Hadoop/S...
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/18207 /cc @joshrosen
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120236128 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -193,6 +197,21 @@ object KinesisInputDStream { } /** + * Sets the Kinesis initial position data to the provided timestamp. + * Sets InitialPositionInStream to [[InitialPositionInStream.AT_TIMESTAMP]] + * and the timestamp to the provided value. + * + * @param timestamp Timestamp to resume the Kinesis stream from a provided + * timestamp. + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def withTimestampAtInitialPositionInStream(timestamp: Date) : Builder = { --- End diff -- Got it now. Read your new comment.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120236059 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -100,6 +103,7 @@ object KinesisInputDStream { private var endpointUrl: Option[String] = None private var regionName: Option[String] = None private var initialPositionInStream: Option[InitialPositionInStream] = None +private var initialPositionInStreamTimestamp: Option[Date] = None --- End diff -- Ah, alright, so you're asking to keep a single `initialPositionInStreamTimestamp`. That's similar to the `withInitialPositionAtTimestamp`; I can rename that to suit this purpose. Another question: the InitialPosition gets passed to the KinesisReceiver, and I was passing a timestamp along with the initial position at the moment. Are we planning to pass the `KinesisClientLibConfiguration` to the `KinesisReceiver` now?
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120235938 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -193,6 +197,21 @@ object KinesisInputDStream { } /** + * Sets the Kinesis initial position data to the provided timestamp. + * Sets InitialPositionInStream to [[InitialPositionInStream.AT_TIMESTAMP]] + * and the timestamp to the provided value. + * + * @param timestamp Timestamp to resume the Kinesis stream from a provided + * timestamp. + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def withTimestampAtInitialPositionInStream(timestamp: Date) : Builder = { --- End diff -- I just suggested renaming it. Sorry for the confusion.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120235619 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -193,6 +197,21 @@ object KinesisInputDStream { } /** + * Sets the Kinesis initial position data to the provided timestamp. + * Sets InitialPositionInStream to [[InitialPositionInStream.AT_TIMESTAMP]] + * and the timestamp to the provided value. + * + * @param timestamp Timestamp to resume the Kinesis stream from a provided + * timestamp. + * @return Reference to this [[KinesisInputDStream.Builder]] + */ +def withTimestampAtInitialPositionInStream(timestamp: Date) : Builder = { --- End diff -- @brkyvz `withInitialPositionAtTimestamp` is an enhancer method for the InitialPositionAtTimestamp. If provided, it will set the timestamp value along with the InitialPosition.AT_TIMESTAMP. It's optional, so `initialPositionInStream` can still be used. This will not introduce any incompatibilities in usage. Thoughts?
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17955 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77759/ Test PASSed.
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17955 Merged build finished. Test PASSed.
[GitHub] spark issue #17955: [SPARK-20715] Store MapStatuses only in MapOutputTracker...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17955 **[Test build #77759 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77759/testReport)** for PR 17955 at commit [`4550f61`](https://github.com/apache/spark/commit/4550f616a4f9c144a2da49a31ef3eaa19a0eeea8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120234538 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -100,6 +103,7 @@ object KinesisInputDStream { private var endpointUrl: Option[String] = None private var regionName: Option[String] = None private var initialPositionInStream: Option[InitialPositionInStream] = None +private var initialPositionInStreamTimestamp: Option[Date] = None --- End diff -- I'm hoping we won't have to take both `initialPositionInStream` and `initialPositionInStreamTimestamp`. The builder is an internal API, so we can definitely change it.
[GitHub] spark pull request #18029: [SPARK-20168][WIP][DStream] Add changes to use ki...
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r120234200 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisInputDStream.scala --- @@ -100,6 +103,7 @@ object KinesisInputDStream { private var endpointUrl: Option[String] = None private var regionName: Option[String] = None private var initialPositionInStream: Option[InitialPositionInStream] = None +private var initialPositionInStreamTimestamp: Option[Date] = None --- End diff -- @brkyvz Where exactly are we planning to add these changes? Are you proposing to change the type of `private var initialPositionInStreamTimestamp: Option[Date] = None`? That would introduce a backward incompatibility in the current builder.
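The direction brkyvz suggests — one initial-position field instead of separate `initialPositionInStream` and `initialPositionInStreamTimestamp` vars — can be sketched as an ADT-backed builder. The types below are hypothetical illustrations, not the Spark or Kinesis Client Library API:

```scala
import java.util.Date

// A single field models both cases, so AT_TIMESTAMP can never be set
// without its timestamp, and no second var is needed in the builder.
sealed trait InitialPosition
case object TrimHorizon extends InitialPosition
case object Latest extends InitialPosition
final case class AtTimestamp(timestamp: Date) extends InitialPosition

class Builder {
  private var initialPosition: Option[InitialPosition] = None
  def withInitialPosition(pos: InitialPosition): Builder = { initialPosition = Some(pos); this }
  // Convenience method mirroring the withInitialPositionAtTimestamp idea in the thread.
  def withInitialPositionAtTimestamp(ts: Date): Builder = withInitialPosition(AtTimestamp(ts))
  def build(): InitialPosition = initialPosition.getOrElse(Latest)
}
```

Because the timestamp travels inside the `AtTimestamp` case, callers that only ever use `TrimHorizon` or `Latest` are unaffected, which is the compatibility concern raised above.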
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77760/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77760 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77760/testReport)** for PR 18199 at commit [`ad32a7f`](https://github.com/apache/spark/commit/ad32a7ffc68266f08ad95f37874159fadc906a9e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77756/ Test PASSed.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Merged build finished. Test PASSed.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18083 **[Test build #77756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77756/testReport)** for PR 18083 at commit [`4a083de`](https://github.com/apache/spark/commit/4a083decb7e817fab49f25f4f0fe119352525aa7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77758/ Test PASSed.
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18083 Merged build finished. Test PASSed.
[GitHub] spark issue #17723: [SPARK-20434][YARN][CORE] Move kerberos delegation token...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17723 **[Test build #77761 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77761/testReport)** for PR 17723 at commit [`1479c60`](https://github.com/apache/spark/commit/1479c60b3059e17a29e23a309f1b38e364bb2451).
[GitHub] spark issue #18083: [SPARK-20863] Add metrics/instrumentation to LiveListene...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18083 **[Test build #77758 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77758/testReport)** for PR 18083 at commit [`d1a5e99`](https://github.com/apache/spark/commit/d1a5e991fb7fc3e7f93090c23d8088be8b650f61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18098: [SPARK-16944][Mesos] Improve data locality when l...
Github user gpang commented on a diff in the pull request: https://github.com/apache/spark/pull/18098#discussion_r120231509 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -502,6 +521,25 @@ private[spark] class MesosCoarseGrainedSchedulerBackend( ) } + private def satisfiesLocality(offerHostname: String): Boolean = { +if (hostToLocalTaskCount.nonEmpty) { --- End diff -- @mgummelt Thanks for the thoughtful response. Sorry for the delay. I am not entirely sure how multi-stage jobs would work, but in the current PR, after all the executors are started for a stage, the delay timeout resets for the next "stage". So, if Spark needs 3 executors, and 3 executors eventually start, the next time Spark needs more executors, the delay timeout would start fresh. However, if the next stage is requested before the previous stage is fully allocated, then the scenario you described happens. I had made the assumption that stages would be fully allocated before requesting additional executors for the next stage. Do you have any insights into how executors in stages are allocated? I will also look into per-host delay timeouts.
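The delay-timeout behavior described above — accept an offer from a non-preferred host only after a wait has elapsed, and reset the wait when a new batch of executors is requested — can be illustrated with a tiny standalone sketch. `LocalityWaiter` and its injectable clock are hypothetical, not the PR's actual `satisfiesLocality` code:

```scala
// Minimal delay-scheduling sketch: a non-preferred host is accepted only after
// `delayMs` has elapsed since the last reset; reset() models starting a fresh
// "stage" of executor requests, as described in the comment.
final class LocalityWaiter(delayMs: Long, clock: () => Long) {
  private var start: Long = clock()
  def reset(): Unit = { start = clock() }
  def accepts(offerHost: String, preferredHosts: Set[String]): Boolean =
    preferredHosts.isEmpty || preferredHosts.contains(offerHost) || clock() - start >= delayMs
}
```

This also shows why a single global timer causes the scenario mgummelt raised: a `reset()` triggered by one stage's request makes an unrelated pending request wait again, which per-host delay timeouts would avoid.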
[GitHub] spark issue #18203: [SPARK-20954][SQL] Simple `DESCRIBE` result should be co...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18203 Hi, @gatorsmile and @cloud-fan . Could you review this PR when you have some time? This will fix the incompatible changes introduced in Spark 2.2.0.
[GitHub] spark pull request #12646: [SPARK-14878][SQL] Trim characters string functio...
Github user kevinyu98 commented on a diff in the pull request: https://github.com/apache/spark/pull/12646#discussion_r120228866 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -503,58 +503,63 @@ case class FindInSet(left: Expression, right: Expression) extends BinaryExpressi override def prettyName: String = "find_in_set" } +trait String2TrimExpression extends ImplicitCastInputTypes { + self: Expression => + + override def dataType: DataType = StringType + override def inputTypes: Seq[AbstractDataType] = Seq.fill(children.size)(StringType) + + override def nullable: Boolean = children.exists(_.nullable) + override def foldable: Boolean = children.forall(_.foldable) + + override def sql: String = { +if (children.size == 1) { + val childrenSQL = children.map(_.sql).mkString(", ") + s"$prettyName($childrenSQL)" +} else { + val trimSQL = children(0).map(_.sql).mkString(", ") + val tarSQL = children(1).map(_.sql).mkString(", ") + s"$prettyName($trimSQL, $tarSQL)" +} + } +} + /** * A function that takes a character string, removes the leading and/or trailing characters matching with the characters - * in the trim string, returns the new string. If LEADING/TRAILING/BOTH and trimStr keywords are not specified, it - * defaults to remove space character from both ends. + * in the trim string, returns the new string. If BOTH and trimStr keywords are not specified, it defaults to remove + * space character from both ends. * trimStr: A character string to be trimmed from the source string, if it has multiple characters, the function * searches for each character in the source string, removes the characters from the source string until it * encounters the first non-match character. - * LEADING: removes any characters from the left end of the source string that matches characters in the trim string. - * TRAILING: removes any characters from the right end of the source string that matches characters in the trim string. 
* BOTH: removes any characters from both ends of the source string that matches characters in the trim string. */ @ExpressionDescription( usage = """ _FUNC_(str) - Removes the leading and trailing space characters from `str`. _FUNC_(BOTH trimStr FROM str) - Remove the leading and trailing trimString from `str` -_FUNC_(LEADING trimStr FROM str) - Remove the leading trimString from `str` -_FUNC_(TRAILING trimStr FROM str) - Remove the trailing trimString from `str` """, extended = """ Arguments: str - a string expression trimString - the trim string BOTH, FROM - these are keyword to specify for trim string from both ends of the string - LEADING, FROM - these are keyword to specify for trim string from left end of the string - TRAILING, FROM - these are keyword to specify for trim string from right end of the string Examples: > SELECT _FUNC_('SparkSQL '); SparkSQL > SELECT _FUNC_(BOTH 'SL' FROM 'SSparkSQLS'); parkSQ - > SELECT _FUNC_(LEADING 'paS' FROM 'SSparkSQLS'); - rkSQLS - > SELECT _FUNC_(TRAILING 'SLQ' FROM 'SSparkSQLS'); - SSparkS """) case class StringTrim(children: Seq[Expression]) - extends Expression with ImplicitCastInputTypes { + extends Expression with String2TrimExpression { --- End diff -- sure, will change.
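The trim semantics described in the doc comment (remove characters from each end while they appear in the trim string, stopping at the first non-matching character) can be illustrated with a minimal standalone sketch. This is not the PR's codegen path, just the BOTH behavior in plain Scala; `trimBoth` is a hypothetical helper name:

```scala
// Removes from both ends of `src` any characters contained in `trimStr`,
// stopping at the first character (from each end) that is not in the set.
def trimBoth(src: String, trimStr: String): String = {
  val trimChars = trimStr.toSet
  val from = src.indexWhere(c => !trimChars.contains(c))
  if (from == -1) ""  // every character matched the trim set
  else {
    val until = src.lastIndexWhere(c => !trimChars.contains(c))
    src.substring(from, until + 1)
  }
}
```

This reproduces the doc-comment example: `trimBoth("SSparkSQLS", "SL")` yields `"parkSQ"`, matching `SELECT trim(BOTH 'SL' FROM 'SSparkSQLS')`.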
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77755/ Test PASSed.
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18208 Merged build finished. Test PASSed.
[GitHub] spark issue #18208: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a Ti...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18208 **[Test build #77755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77755/testReport)** for PR 18208 at commit [`8a2d37a`](https://github.com/apache/spark/commit/8a2d37a10cd6eb36403006b99a33a7d057905e6e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77757/ Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18199 Merged build finished. Test PASSed.
[GitHub] spark issue #18199: [SPARK-20979][SS]Add RateSource to generate values for t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18199 **[Test build #77757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77757/testReport)** for PR 18199 at commit [`3a95b55`](https://github.com/apache/spark/commit/3a95b550fdea231790c11df5324d5f965d6a4552). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120225704 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.streaming + +import java.io._ +import java.nio.charset.StandardCharsets +import java.util.concurrent.TimeUnit + +import org.apache.commons.io.IOUtils + +import org.apache.spark.internal.Logging +import org.apache.spark.network.util.JavaUtils +import org.apache.spark.sql.{DataFrame, SQLContext} +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{CaseInsensitiveMap, DateTimeUtils} +import org.apache.spark.sql.sources.{DataSourceRegister, StreamSourceProvider} +import org.apache.spark.sql.types._ +import org.apache.spark.util.{ManualClock, SystemClock} + +/** + * A source that generates increment long values with timestamps. Each generated row has two + * columns: a timestamp column for the generated time and an auto increment long column starting + * with 0L. + * + * This source supports the following options: + * - `tuplesPerSecond` (e.g. 
100, default: 1): How many tuples should be generated per second. + * - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed + *becomes `tuplesPerSecond`. Using finer granularities than seconds will be truncated to integer + *seconds. + * - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the + *generated tuples. The source will try its best to reach `tuplesPerSecond`, but the query may + *be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed. + */ +class RateSourceProvider extends StreamSourceProvider with DataSourceRegister { + + override def sourceSchema( + sqlContext: SQLContext, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): (String, StructType) = +(shortName(), RateSourceProvider.SCHEMA) + + override def createSource( + sqlContext: SQLContext, + metadataPath: String, + schema: Option[StructType], + providerName: String, + parameters: Map[String, String]): Source = { +val params = CaseInsensitiveMap(parameters) + +val tuplesPerSecond = params.get("tuplesPerSecond").map(_.toLong).getOrElse(1L) +if (tuplesPerSecond <= 0) { + throw new IllegalArgumentException( +s"Invalid value '${params("tuplesPerSecond")}'. The option 'tuplesPerSecond' " + + "must be positive") +} + +val rampUpTimeSeconds = + params.get("rampUpTime").map(JavaUtils.timeStringAsSec(_)).getOrElse(0L) +if (rampUpTimeSeconds < 0) { + throw new IllegalArgumentException( +s"Invalid value '${params("rampUpTime")}'. The option 'rampUpTime' " + + "must not be negative") +} + +val numPartitions = params.get("numPartitions").map(_.toInt).getOrElse( + sqlContext.sparkContext.defaultParallelism) +if (numPartitions <= 0) { + throw new IllegalArgumentException( +s"Invalid value '${params("numPartitions")}'. 
The option 'numPartitions' " + + "must be positive") +} + +new RateStreamSource( + sqlContext, + metadataPath, + tuplesPerSecond, + rampUpTimeSeconds, + numPartitions, + params.get("useManualClock").map(_.toBoolean).getOrElse(false) // Only for testing +) + } + override def shortName(): String = "rate" +} + +object RateSourceProvider { + val SCHEMA = +StructType(StructField("timestamp", TimestampType) :: StructField("value", LongType) :: Nil) + + val VERSION =
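The `rampUpTime` option described in the quoted doc comment can be modeled with a small standalone sketch. The linear ramp below is an assumption for illustration, since the PR's exact ramp-up formula is not visible in the quoted diff; `valueCountAt` is a hypothetical helper name:

```scala
// Assumed model: how many values a rate source would have emitted by time
// `seconds`, ramping the per-second rate roughly linearly from near zero up
// to `tuplesPerSecond` over `rampUpSeconds`, then holding it steady.
def valueCountAt(seconds: Long, tuplesPerSecond: Long, rampUpSeconds: Long): Long =
  if (seconds <= rampUpSeconds) {
    // During ramp-up, sum the (linearly growing) per-second rates so far.
    (0L until seconds).map(s => tuplesPerSecond * (s + 1) / (rampUpSeconds + 1)).sum
  } else {
    // After ramp-up, emit at the full rate.
    valueCountAt(rampUpSeconds, tuplesPerSecond, rampUpSeconds) +
      (seconds - rampUpSeconds) * tuplesPerSecond
  }
```

With `rampUpTime` of 0s this reduces to `seconds * tuplesPerSecond`, matching the default behavior documented above.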
[GitHub] spark issue #17935: [SPARK-20690][SQL] Subqueries in FROM should have alias ...
Github user ash211 commented on the issue: https://github.com/apache/spark/pull/17935 @JoshRosen what was the other type of database you were using?
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120222903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- (same diff context as quoted above)
[GitHub] spark pull request #18199: [SPARK-20979][SS]Add RateSource to generate value...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/18199#discussion_r120222158 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/RateSourceProvider.scala --- (same diff context as quoted above)