[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/19819 I've seen your PR https://github.com/apache/spark/pull/20997; it is a good solution. @gaborgsomogyi
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 Excuse me, has the concept of a default value been introduced to the schema in the master branch? Thank you, @gatorsmile.
[GitHub] spark issue #20356: [SPARK-23185][SQL] Make the configuration "spark.default...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/20356 Thank you very much for your review. I have seen the discussion and your PR, and learned a lot. But I just want to solve the problem when executing "insert into ... values ...", which does not involve file sources. Maybe we can solve this first, since it has troubled my team for a long time? @maropu
[GitHub] spark pull request #20356: [SPARK-23185][SQL] Make the configuration "spark....
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/20356 [SPARK-23185][SQL] Make the configuration "spark.default.parallelism" changeable on each SQL session to decrease empty files

## What changes were proposed in this pull request?

Make the configuration "spark.default.parallelism" changeable on each SQL session to decrease empty files. When executing "insert into ... values ...", many empty files are generated. We can change "spark.default.parallelism" to decrease the number of empty files, but there are many occasions where we want to change the configuration within a single session so as not to influence other SQL statements, for example when we use the Thrift server to execute many SQL statements in one session.

## How was this patch tested?

Unit tests, manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK-23185

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20356.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20356

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z isklakldsng branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit 0e1b7f6d8e436ca243f78e3cbf064f591557b6c0 Author: lvdongr Date: 2017-07-28T01:34:48Z Merge remote-tracking branch 'apache/master'
commit 9a9972125ae8f7d90f5567f5b561f2c0ca16cfe7 Author: lvdongr Date: 2017-07-28T02:50:23Z refresh the master branch for kafkaconsumer
commit 637900b576b8c4d9e04a808a078e481a99751d03 Author: lvdongr Date: 2017-07-31T03:08:29Z Merge remote-tracking branch 'apache/master'
commit 04aafed076cb704a100eb7dc45b5cfda6438193b Author: lvdongr Date: 2017-08-17T11:41:58Z Merge remote-tracking branch 'apache/master'
commit 9f90ab5356b74dfc63dc9c80ff336ef2c2847e72 Author: root Date: 2017-11-10T03:32:54Z Merge branch 'master' of https://github.com/apache/spark
commit 8b94711b7fb6cfa72aa06d9e009b73b73ccda36f Author: root Date: 2017-11-13T00:56:22Z Merge branch 'master' of https://github.com/apache/spark
commit 70699e3d80d853f7105d967544378c5c342d2ce6 Author: 10171592 Date: 2017-12-07T03:13:24Z Merge remote-tracking branch 'apache/master'
commit 9e7c0c7d0f8bae30bc07abbedf4c110ec82f1cf3 Author: root Date: 2017-12-07T05:48:04Z Merge remote-tracking branch 'apache/master'
commit 393730415bcebdef125364be3eb3a64320cac3c9 Author: root Date: 2018-01-09T03:16:36Z Merge branch 'master' of https://github.com/lvdongr/spark
commit 5db407930d4802b6075036961688192a3039d95a Author: root Date: 2018-01-09T03:30:43Z Merge branch 'master' of https://github.com/apache/spark
commit 46672ddaf53b9ed1e97e404753fa14bd3406821a Author: 10171592 Date: 2018-01-22T08:20:47Z Merge remote-tracking branch 'apache/master'
commit 884eaee9f2d7782bceae73806da9b65f1119977e Author: 10171592 Date: 2018-01-22T09:30:14Z Merge branch 'master' of https://github.com/lvdongr/spark
commit 49641920727f426e88ac32a9c1381f7876eaf7c9 Author: 10171592 Date: 2018-01-23T02:57:54Z Merge remote-tracking branch 'apache/master'
commit e1aeff8c0cb1358d0c77b0e729ecdfd1a07313dc Author: lvdongr Date: 2018-01-23T03:18:28Z [SPARK-23185][SQL] Make the configuration "spark.default.parallelism" can be changed on each SQL session to decrease empty files
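A minimal sketch of the per-session usage this PR targets, assuming the session-level override it proposes is in place; the table name and values are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: override the parallelism for this SQL session only, so that a small
// "INSERT ... VALUES" does not fan out into many (mostly empty) output files,
// while other sessions on the same Thrift server keep the global default.
val spark = SparkSession.builder().appName("per-session-parallelism").getOrCreate()
spark.sql("SET spark.default.parallelism=1")                  // per-session under this proposal
spark.sql("INSERT INTO test_table VALUES (1, 'a'), (2, 'b')") // test_table is illustrative
```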
[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/19819 Will the number of cached consumers for the same partition keep increasing when different tasks consume the same partition, given that there is no place to remove them?
[GitHub] spark issue #18987: [SPARK-21775][Core]Dynamic Log Level Settings for execut...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18987 OK. Thank you all the same for your review @srowen @jerryshao @ajbozarth.
[GitHub] spark pull request #18987: [SPARK-21775][Core]Dynamic Log Level Settings for...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/18987
[GitHub] spark issue #18987: [SPARK-21775][Core]Dynamic Log Level Settings for execut...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18987 The log level setting is a very useful function. Our team is building a Spark application, and when we want to see the debug log we have to restart the application every time, so we developed this function. The complexity lies in the UI display and in setting the log level, but we can choose not to show the setting button on the UI and just provide an API or RESTful interface for users to access this function. Then the change to Spark is not large. Storm (http://storm.apache.org/) has the same function, too. @srowen @jerryshao @ajbozarth
[GitHub] spark pull request #18987: [SPARK-21775][Core]Dynamic Log Level Settings for...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18987 [SPARK-21775][Core] Dynamic Log Level Settings for executors

## What changes were proposed in this pull request?

Sometimes we want to change the log level of an executor after our application has already been deployed, to see detailed information or to reduce the number of log items. Changing the log4j configuration file is not convenient, so we add the ability to set the log level for a running executor.

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK-21775

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18987.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18987

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z isklakldsng branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit 0e1b7f6d8e436ca243f78e3cbf064f591557b6c0 Author: lvdongr Date: 2017-07-28T01:34:48Z Merge remote-tracking branch 'apache/master'
commit 9a9972125ae8f7d90f5567f5b561f2c0ca16cfe7 Author: lvdongr Date: 2017-07-28T02:50:23Z refresh the master branch for kafkaconsumer
commit 637900b576b8c4d9e04a808a078e481a99751d03 Author: lvdongr Date: 2017-07-31T03:08:29Z Merge remote-tracking branch 'apache/master'
commit 04aafed076cb704a100eb7dc45b5cfda6438193b Author: lvdongr Date: 2017-08-17T11:41:58Z Merge remote-tracking branch 'apache/master'
commit f88debcfccf9d1cd5c436321ff8cf444539dfd6c Author: lvdongr Date: 2017-08-18T03:21:50Z [SPARK-21775][Core]Dynamic Log Level Settings for executors
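As context for the mechanism (not this PR's actual API): Spark already exposes SparkContext.setLogLevel on the driver, and this PR extends the idea to executors. Below is a minimal sketch of what an executor-side handler could do with log4j 1.x, which Spark used at the time; the method name and validation are assumptions:

```scala
import org.apache.log4j.{Level, LogManager}

// Hypothetical executor-side handler: apply a log level requested at runtime,
// without restarting the executor or editing log4j.properties.
def setExecutorLogLevel(levelName: String): Unit = {
  val valid = Set("ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF")
  require(valid.contains(levelName.toUpperCase), s"Unknown log level: $levelName")
  // log4j 1.x loggers can be re-leveled on the fly; the change takes effect immediately.
  LogManager.getRootLogger.setLevel(Level.toLevel(levelName.toUpperCase))
}
```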
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 OK, I will solve the remaining problems first and hold this PR. @gatorsmile
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 You mean we could provide different default values for different types, like 0 for int and "" for string? Or should we set the default values when defining the table? @gatorsmile @maropu I set the default to NULL because that is how the "insert into ..." statement is handled in Hive, and I want to stay consistent with Hive.
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 You can see this picture: my table has three columns, and I insert only two of them; the last column is then null. @maropu @gatorsmile ![insertinto](https://user-images.githubusercontent.com/25652150/29109253-f9b852a8-7d14-11e7-9a9c-b6aa76314a04.PNG)
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 The target of this PR is to support inserting into specified columns; listing all the columns should not be required, e.g. insert into t(a, c) values (1, 0.8).
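A short sketch of the proposed behavior, using the example from the comment above (run through spark.sql; the table, its schema, and the USING clause are illustrative, and the INSERT syntax shown is the proposal, not something stock Spark supported at the time of this thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("insert-into-columns-sketch").getOrCreate()

spark.sql("CREATE TABLE t (a INT, b STRING, c DOUBLE) USING parquet")
// Proposed syntax: name only the columns you care about.
spark.sql("INSERT INTO t (a, c) VALUES (1, 0.8)")
spark.sql("SELECT * FROM t").show()
// Expected under the proposal: the row reads (1, NULL, 0.8) --
// the unnamed column b defaults to NULL, matching Hive's behavior.
```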
[GitHub] spark issue #18756: [SPARK-21548][SQL] "Support insert into serial columns o...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/18756 Thank you for the review; I will finish the tests as soon as possible.
[GitHub] spark pull request #18753: [SPARK-21548] [SQL] "Support insert into serial c...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/18753
[GitHub] spark pull request #18756: [SPARK-21548][SQL] "Support insert into serial co...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18756 [SPARK-21548][SQL] "Support insert into serial columns of table"

## What changes were proposed in this pull request?

When we use the 'insert into ...' statement, we can only insert all of the columns into a table. But in some cases our table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2, ...) values (value1, value2, value3, ...)". https://issues.apache.org/jira/browse/SPARK-21548

## How was this patch tested?

Unit tests, integration tests, manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK-21548

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18756.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18756

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z isklakldsng branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit 2a40d64bcad6613892a54bc3052a634f59c14c65 Author: lvdongr Date: 2017-07-28T06:56:15Z [SPARK-21548][SQL]Support insert into serial columns of table
[GitHub] spark pull request #18751: [SPARK-21548][SQL]Support insert into serial colu...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/18751
[GitHub] spark pull request #18753: [SPARK-21548] [SQL] Support insert into serial co...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18753 [SPARK-21548] [SQL] Support insert into serial columns of table

## What changes were proposed in this pull request?

When we use the 'insert into ...' statement, we can only insert all of the columns into a table. But in some cases our table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2, ...) values (value1, value2, value3, ...)".

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK--21548

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18753.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18753

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z isklakldsng branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit da882ea569d451b3f2af550b0976a6a059900f6a Author: lvdongr Date: 2017-07-28T02:56:23Z [SPARK-21548][SQL]Support insert into serial columns of table
commit a65be1605865a1159532ba148434d3bb207da64c Author: lvdongr Date: 2017-07-28T03:03:23Z refresh last commit
[GitHub] spark pull request #17203: [SPARK-19863][DStream] Whether or not use CachedK...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/17203
[GitHub] spark pull request #18751: [SPARK-21548][SQL]Support insert into serial colu...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18751 [SPARK-21548][SQL] Support insert into serial columns of table

## What changes were proposed in this pull request?

When we use the 'insert into ...' statement, we can only insert all of the columns into a table. But in some cases our table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2, ...) values (value1, value2, value3, ...)". https://issues.apache.org/jira/browse/SPARK-21548

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark spark21548

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18751.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18751

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z isklakldsng branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit 0be180991d87a82d3075b6d63f28486799fc872d Author: lvdongr Date: 2017-07-27T13:25:24Z [SPARK-21548][SQL]Support insert into serial columns of table
[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/17620
[GitHub] spark issue #17620: [SPARK-20305][Spark Core]Master may keep in the state of...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17620 You can see the main method in Master.scala:

```scala
def main(argStrings: Array[String]) {
  Utils.initDaemon(log)
  val conf = new SparkConf
  val args = new MasterArguments(argStrings, conf)
  val (rpcEnv, _, _) = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, conf)
  rpcEnv.awaitTermination()
}
```

When the rpcEnv is shut down, the main method finishes and the Master process stops, as I have already tested. I chose this way because the onStop method is called before the Master stops, so the services inside the Master (such as the web UI, metrics, and the persistenceEngine) are also closed. I think this is safer. Thank you for your last reply @jerryshao
[GitHub] spark issue #17620: [SPARK-20305][Spark Core]Master may keep in the state of...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17620 This happened when the previous Master leader removed a dead worker and cleared the worker's node from the persistence engine (we use ZooKeeper), but the leadership changed before the worker node was actually removed from ZooKeeper. The new Master leader recovered from ZooKeeper and read the dead worker's node. When the new leader found the worker dead and tried to remove it, it also tried to clear the node on ZooKeeper; but the node had already been removed by the previous leader, so an exception was thrown and the recovery failed. The leader then stays in the COMPLETING_RECOVERY state forever, and none of the registered applications can get resources. ![failfetchresource](https://cloud.githubusercontent.com/assets/25652150/25209181/f7e31528-25ab-11e7-9eb2-e2f15db2dcac.png)
[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...
Github user lvdongr commented on a diff in the pull request: https://github.com/apache/spark/pull/17620#discussion_r111732189

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -561,6 +561,11 @@ private[deploy] class Master(
     state = RecoveryState.ALIVE
     schedule()
     logInfo("Recovery complete - resuming operations!")
+  } catch {

--- End diff --

Thank you very much. I have updated the commit; please check whether there are any other problems.
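For readers of this thread, here is a sketch of the catch-based approach under discussion, assembled from the diff context above (an illustration, not the merged code; the exact shutdown call is an assumption):

```scala
// Inside org.apache.spark.deploy.master.Master -- sketch only.
private def completeRecovery(): Unit = {
  // Ensure "only-once" recovery semantics using a short synchronization period.
  if (state != RecoveryState.RECOVERING) { return }
  state = RecoveryState.COMPLETING_RECOVERY
  try {
    // ... drop unresponsive workers and apps, re-schedule waiting drivers ...
    state = RecoveryState.ALIVE
    schedule()
    logInfo("Recovery complete - resuming operations!")
  } catch {
    case e: Exception =>
      // Do not strand the Master in COMPLETING_RECOVERY: log the failure and
      // shut down so the HA mechanism can elect a fresh leader.
      logError("Recovery failed - shutting down master to allow failover", e)
      System.exit(1) // assumption: the real patch may use a cleaner shutdown path
  }
}
```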
[GitHub] spark issue #17620: [SPARK-20305][Spark Core]Master may keep in the state of...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17620 Excuse me, can this PR be closed, or are there some other problems? @jerryshao
[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...
Github user lvdongr commented on a diff in the pull request: https://github.com/apache/spark/pull/17620#discussion_r111337583

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -539,7 +539,7 @@ private[deploy] class Master(
   private def completeRecovery() {
     // Ensure "only-once" recovery semantics using a short synchronization period.
-    if (state != RecoveryState.RECOVERING) { return }
+    if (state != RecoveryState.RECOVERING && state != RecoveryState.COMPLETING_RECOVERY) { return }

--- End diff --

It seems better to close the Master, as you say, if an exception happens during recovery, so I changed the last commit.
[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...
Github user lvdongr commented on a diff in the pull request: https://github.com/apache/spark/pull/17620#discussion_r111337249

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -539,7 +539,7 @@ private[deploy] class Master(
   private def completeRecovery() {
     // Ensure "only-once" recovery semantics using a short synchronization period.
-    if (state != RecoveryState.RECOVERING) { return }
+    if (state != RecoveryState.RECOVERING && state != RecoveryState.COMPLETING_RECOVERY) { return }

--- End diff --

Thank you for your review and suggestion. The last change cannot work, as I tested today. I thought completeRecovery would be called again when some workers or drivers responded to MasterChanged, so the Master (whose state is RecoveryState.COMPLETING_RECOVERY) would get the chance to run the completeRecovery method to completion and change its state to ALIVE. I tested it, but found it is not called again after the exception (perhaps the workers or drivers had already responded to MasterChanged before the exception).
[GitHub] spark pull request #17620: [SPARK-20305][Spark Core]Master may keep in the s...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/17620 [SPARK-20305][Spark Core] Master may keep in the state of "COMPLETING_RECOVERY"

## What changes were proposed in this pull request?

The Master may stay in the "COMPLETING_RECOVERY" state when the leading Master changes, and then none of the registered applications can get resources. This happens when an exception is thrown while the Master is trying to recover (the completeRecovery method in Master.scala). The leader then remains in the COMPLETING_RECOVERY state forever, because the leader can only change to ALIVE from the RecoveryState.RECOVERING state.

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK20305

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17620.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17620

commit 44b9415dd1c6ac854a9debddd67c9dcb00e8df69 Author: lvdongr Date: 2017-04-12T07:34:03Z [SPARK-20305][Spark Core]Master may keep in the state of "COMPELETING_RECOVERY",then all the application registered cannot get resources, when the leader master change. has been successfully created.
[GitHub] spark issue #17203: [SPARK-19863][DStream] Whether or not use CachedKafkaCon...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17203 You can see this issue, which is a problem with the cached KafkaConsumer: https://issues.apache.org/jira/browse/SPARK-19185. A commenter there suggested the same approach of not using the cached Kafka consumer. Besides, only if users can choose between the two methods can they pick the best one for their own situation.
[GitHub] spark issue #17203: [SPARK-19863][DStream] Whether or not use CachedKafkaCon...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17203 In our case, we deployed a streaming application whose data sources are 20 topics with 30 partitions each in a Kafka cluster (3 brokers). The number of connections to Kafka was very large, up to a thousand, and the consumer sometimes got no messages from Kafka, which could cause some jobs to fail. But when we replaced the consumers with uncached ones, the number of connections decreased and no jobs failed. We are still not sure whether the large number of connections to Kafka caused the job failures, but we tested the result, and we want to use the uncached consumers so that we can keep our streaming jobs running successfully first. So we think there are occasions where the cached consumer should not be used, and the developer should be able to choose.
[GitHub] spark pull request #17203: [SPARK-19863][DStream] Whether or not use CachedK...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/17203 [SPARK-19863][DStream] Whether or not to use CachedKafkaConsumer needs to be configurable when using DirectKafkaInputDStream to connect to Kafka

## What changes were proposed in this pull request?

Whether or not to use CachedKafkaConsumer needs to be configurable when you use DirectKafkaInputDStream to connect to Kafka in a Spark Streaming application. In Spark 2.x, the Kafka consumer was replaced by CachedKafkaConsumer (the cached KafkaConsumers keep connections to the Kafka cluster established), and there is no way to change this behavior. In fact, KafkaRDD (used by DirectKafkaInputDStream to connect to Kafka) provides the parameter useConsumerCache to choose whether to use the CachedKafkaConsumer, but DirectKafkaInputDStream always sets this parameter to true.

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK-19863

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17203.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17203

commit 5d13e4e75845acabb9a11b0618669e9f51ba55fd Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream t
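For later readers: the 0-10 Kafka integration eventually documented a switch for exactly this choice. A minimal sketch, assuming the key "spark.streaming.kafka.consumer.cache.enabled" applies to your Spark version (verify against its documentation):

```scala
import org.apache.spark.SparkConf

// Sketch: disable the per-executor CachedKafkaConsumer so each task reads with
// a fresh, uncached consumer -- the choice this PR asks to expose.
val conf = new SparkConf()
  .setAppName("kafka-direct-stream")
  .set("spark.streaming.kafka.consumer.cache.enabled", "false")
```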
[GitHub] spark pull request #16879: [SPARK-19541][SQL] High Availability support for ...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/16879
[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17010 Excuse me, may this PR be merged and closed?
[GitHub] spark issue #17010: [SPARK-19673][SQL] "ThriftServer default app name is cha...
Github user lvdongr commented on the issue: https://github.com/apache/spark/pull/17010 Before Spark 1.4.x, the ThriftServer name was "SparkSQL:localhostname", while https://issues.apache.org/jira/browse/SPARK-8650 changed this rule as a side effect. Since then the ThriftServer shows the class name of HiveThriftServer2, which is not appropriate.
[GitHub] spark pull request #17010: [SPARK-19673][SQL] "ThriftServer default app name...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/17010 [SPARK-19673][SQL] "ThriftServer default app name is changed wrong"

## What changes were proposed in this pull request?

In Spark 1.x, the name of the ThriftServer is "SparkSQL:localHostName", while the default name has since been changed to the class name of HiveThriftServer2, which is not appropriate.

## How was this patch tested?

Manual tests.

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark ThriftserverName

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/17010.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #17010

commit c4a02bca4594ca10473050a85165b4bf96a4ba4e Author: lvdongr Date: 2017-02-21T04:37:12Z [SPARK-19673][SQL] "ThriftServer default app name is changed wrong"
[GitHub] spark pull request #16879: [SPARK-19541][SQL] High Availability support for ...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/16879 [SPARK-19541][SQL] High Availability support for ThriftServer JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19541

## What changes were proposed in this pull request?

We use the Spark ThriftServer frequently, and there are many connections between clients and the single ThriftServer. When the ThriftServer is down, we cannot get the service again, so we need to consider ThriftServer HA as well as Master HA. For the ThriftServer, we want to adopt the HiveServer2 HA pattern: start multiple Thrift servers which register themselves in ZooKeeper. A client can then find a Thrift server just by connecting to ZooKeeper, so beeline can get service from another Thrift server when one is down.

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark spark-issue

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16879.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16879

commit cf4b67f20922c3df494ad68db75c4ace18494116 Author: lvdongr Date: 2017-02-10T01:46:51Z [SPARK-19541] - High Availability support for ThriftServer
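A sketch of the client side of the HiveServer2-style pattern this PR borrows: with multiple Thrift servers registered under a ZooKeeper namespace, clients resolve a live instance through ZooKeeper instead of a fixed host. The hostnames and namespace below are illustrative; the URL parameters are the standard HiveServer2 dynamic-discovery ones, and the Hive JDBC driver must be on the classpath.

```scala
import java.sql.DriverManager

// Connect via ZooKeeper service discovery: if one Thrift server dies, the next
// connection resolves to a surviving instance registered under the namespace.
val url = "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;" +
  "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=sparkThriftServer"
val conn = DriverManager.getConnection(url, "user", "")
```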