[GitHub] spark pull request #22884: [SPARK-23429][CORE][FOLLOWUP] MetricGetter should...

2018-10-29 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22884 [SPARK-23429][CORE][FOLLOWUP] MetricGetter should rename to ExecutorMetricType in comments ## What changes were proposed in this pull request? MetricGetter should rename

[GitHub] spark pull request #22874: [WIP][SPARK-25865][CORE] Add GC information to Ex...

2018-10-29 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22874 [WIP][SPARK-25865][CORE] Add GC information to ExecutorMetrics ## What changes were proposed in this pull request? This PR is opened on top of the PR for #22612 since it import

[GitHub] spark pull request #22612: [SPARK-24958] Add executors' process tree total m...

2018-10-29 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22612#discussion_r228830146 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -394,9 +394,15 @@ private[spark] object JsonProtocol

[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...

2018-10-10 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22678#discussion_r224309582 --- Diff: dev/run-tests-jenkins.py --- @@ -39,7 +39,8 @@ def print_err(msg): def post_message_to_github(msg, ghprb_pull_id): print

[GitHub] spark issue #22678: [SPARK-25685][BUILD] Allow running tests in Jenkins in e...

2018-10-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22678 Sorry for closing the conversation mistakenly @dongjoon-hyun . I will update the documentation soon. --- - To unsubscribe, e

[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...

2018-10-10 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22678#discussion_r223966833 --- Diff: dev/run-tests-jenkins.py --- @@ -176,7 +177,8 @@ def main(): build_display_name = os.environ["BUILD_DISPLAY_NAME"]

[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow regression testing in ...

2018-10-09 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22678 [SPARK-25685][BUILD] Allow regression testing in enterprise Jenkins ## What changes were proposed in this pull request? Add some environment variables to allow regression testing

[GitHub] spark issue #22595: [SPARK-25577][Web UI] Add an on-off switch to display th...

2018-10-06 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22595 @srowen The checkbox is what I added in this PR to show or hide the columns that have always been hidden. These columns are on-heap memory and off-heap memory. If we want to display them

[GitHub] spark issue #22595: [SPARK-25577][Web UI] Add an on-off switch to display th...

2018-10-01 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22595 cc @dongjoon-hyun @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22595: [SPARK-25577][Web UI] Add an on-off switch to display th...

2018-09-30 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22595 If this PR could be merged, #22578 could be added as an additional column as well.

[GitHub] spark issue #22595: [SPARK-25577][Web UI] Add an on-off switch to display th...

2018-09-30 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22595 Gently ping @jerryshao @cloud-fan . Do you have a chance to review?

[GitHub] spark pull request #22595: [SPARK-25577][Web UI] Add an on-off switch to dis...

2018-09-30 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22595 [SPARK-25577][Web UI] Add an on-off switch to display the executor additional columns ## What changes were proposed in this pull request? [SPARK-17019](https://issues.apache.org/jira

[GitHub] spark pull request #22578: [SPARK-25564][CORE] Add output bytes metrics for ...

2018-09-28 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22578 [SPARK-25564][CORE] Add output bytes metrics for each Executor ## What changes were proposed in this pull request? LiveExecutor only tracks the total input bytes. And total output

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-21 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Gently ping @cloud-fan

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-18 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 @cloud-fan I refactored and removed the function outputPath in ```DataWritingCommand```. Besides the unit test you can see, I added the test below locally in ```HiveQuerySuite.scala

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Using pattern matching runs into a problem: ```InsertIntoHiveDirCommand```, ```CreateHiveTableAsSelectCommand``` and ```InsertIntoHiveTable``` are all in the spark-hive module. SparkPlanInfo could

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Agreed. Since this field is important to us, could I refactor it following your advice and file a discussion in another Jira

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Most of the information we want can be derived from the event log, except some executor-side metrics that are not sent to the driver in heartbeats, e.g. RPC count with NameNode. Another case is #21221

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 If almost all implementations need to be added to the case statement, pattern matching on each implementation seems weird, and it is easy to miss one when a new implementation is added in future

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Isn't it common? I am afraid more than just InsertIntoHadoopFsRelation would need to be added to the case statement.

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Gently ping @dongjoon-hyun @cloud-fan

[GitHub] spark pull request #22411: [SPARK-25421][SQL] Abstract an output path field ...

2018-09-13 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22411#discussion_r217584439 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala --- @@ -18,6 +18,7 @@ package org.apache.spark.sql.execution

[GitHub] spark pull request #22411: [SPARK-25421][SQL] Abstract an output path field ...

2018-09-13 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22411#discussion_r217584359 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala --- @@ -440,7 +440,7 @@ case class DataSource

[GitHub] spark issue #22411: [SPARK-25421][SQL] Abstract an output path field in trai...

2018-09-13 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22411 Gently ping @cloud-fan @dongjoon-hyun , would you please help to review?

[GitHub] spark pull request #22411: [SPARK-25421][SQL] Abstract an output path field ...

2018-09-13 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22411 [SPARK-25421][SQL] Abstract an output path field in trait DataWritingCommand ## What changes were proposed in this pull request? #22353 import a metadata field in ```SparkPlanInfo

[GitHub] spark pull request #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo ...

2018-09-12 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22353#discussion_r217229063 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala --- @@ -59,6 +57,12 @@ private[execution] object SparkPlanInfo

[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...

2018-09-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 ping @cloud-fan

[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...

2018-09-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 Thank you @cloud-fan for the reminder. We’ve handled the dropped-message case. Agreed, I will push a commit tomorrow

[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...

2018-09-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 Spark driver logs are scattered across various client nodes and depend on the log4j configs. In a big company, it's hard to collect them all, and I think they are better used for debug

[GitHub] spark issue #22353: [SPARK-25357][SQL] Add metadata to SparkPlanInfo to dump...

2018-09-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 > Although event log is in JSON format, it's mostly for internal usage, to be loaded by history server and used to build the Spark UI. AFAIK, there are more and more projects that replay event

[GitHub] spark issue #22353: [SPARK-25357][SQL] Abbreviated simpleString in DataSourc...

2018-09-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 The purpose is to log meta info such as the input file path to the event log. So I reverted the changes to simpleString and added the metadata back to the SparkPlanInfo interface. This change will log

[GitHub] spark issue #22353: [SPARK-25357][SQL] Abbreviated simpleString in DataSourc...

2018-09-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 Thanks @dongjoon-hyun . That would be a problem. Setting it to 200 or 500 seems to cause a limited regression on hover text. Hard-coding it to 500 shows: https://user

[GitHub] spark pull request #22353: [SPARK-25357][SQL] Abbreviated simpleString in Da...

2018-09-08 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22353#discussion_r216122293 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -54,7 +54,7 @@ trait DataSourceScanExec extends

[GitHub] spark pull request #22353: [SPARK-25357][SQL] Abbreviated simpleString in Da...

2018-09-08 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22353#discussion_r216122273 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -54,7 +54,7 @@ trait DataSourceScanExec extends

[GitHub] spark pull request #22353: [SPARK-25357][SQL] Abbreviated simpleString in Da...

2018-09-07 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22353#discussion_r216121128 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -54,7 +54,7 @@ trait DataSourceScanExec extends

[GitHub] spark issue #22353: [SPARK-25357][SQL] Abbreviated simpleString in DataSourc...

2018-09-07 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 One scenario: after an application has completed, there is no way to recover the full file path of a File Scan Exec if the path is longer than 100 chars

[GitHub] spark issue #22353: [SPARK-25357][SQL] Abbreviated simpleString in DataSourc...

2018-09-07 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 @wangyum @hvanhovell

[GitHub] spark pull request #20876: [SPARK-23653][SQL] Capture sql statements user in...

2018-09-06 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/20876

[GitHub] spark issue #22353: [SPARK-25357][SQL] Abbreviated metadata in DataSourceSca...

2018-09-06 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22353 @cloud-fan @gatorsmile @dongjoon-hyun , kindly help to review.

[GitHub] spark pull request #22353: [SPARK-25357][SQL] Abbreviated metadata in DataSo...

2018-09-06 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22353 [SPARK-25357][SQL] Abbreviated metadata in DataSourceScanExec results in incomplete information in event log ## What changes were proposed in this pull request? Field metadata removed

[GitHub] spark pull request #22077: [SPARK-25084][SQL][BACKPORT-2.3] "distribute by" ...

2018-08-12 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/22077

[GitHub] spark issue #22077: [SPARK-25084][SQL][BACKPORT-2.3] "distribute by" on mult...

2018-08-12 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22077 Thanks @wangyum for triggering the tests again and again. Now all tests have passed, cc @cloud-fan @gatorsmile @jerryshao

[GitHub] spark issue #22077: [SPARK-25084][SQL][BACKPORT-2.3] "distribute by" on mult...

2018-08-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22077 retest this please.

[GitHub] spark issue #22077: [SPARK-25084][SQL][BACKPORT-2.3] "distribute by" on mult...

2018-08-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22077 The failures seem unrelated to this change.

[GitHub] spark issue #22077: [SPARK-25084][SQL][BACKPORT-2.3] "distribute by" on mult...

2018-08-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22077 ok to test

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22066 Thank you @yucai . New PR #22077 for branch-2.3. Cc: @cloud-fan @jerryshao

[GitHub] spark issue #22077: [SPARK-25084][SQL] "distribute by" on multiple columns (...

2018-08-11 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22077 Thanks @yucai , please review this.

[GitHub] spark pull request #22077: [SPARK-25084][SQL] "distribute by" on multiple co...

2018-08-11 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22077 [SPARK-25084][SQL] "distribute by" on multiple columns (wrap in brackets) may lead to codegen issue (branch-2.3) ## What changes were proposed in this pu

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 Seems #22066 has changed the implementation with a similar approach. I will close this one.

[GitHub] spark pull request #22067: [SPARK-25084][SQL] distribute by on multiple colu...

2018-08-10 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/22067

[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns m...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22066 Since you refactored your code by copying from #22067, would you mind just using that?

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 Add unit test with a rand() column in 'distribute by'

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 @jerryshao Could you help to trigger test build please?

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 ok to test.

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22067 @cloud-fan @jerryshao

[GitHub] spark issue #22066: [WIP][SPARK-25084][SQL] "distribute by" on multiple colu...

2018-08-10 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22066 I offer another way to fix this: #22067. It doesn't need "input" as a global variable (if distributing by random)

[GitHub] spark pull request #22067: [SPARK-25084][SQL] distribute by on multiple colu...

2018-08-10 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22067 [SPARK-25084][SQL] distribute by on multiple columns may lead to codegen issue ## What changes were proposed in this pull request? "distribute by" on multiple c

[GitHub] spark pull request #22034: [SPARK-25054][CORE] Enable MetricsServlet sink fo...

2018-08-09 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/22034

[GitHub] spark issue #22034: [SPARK-25054][CORE] Enable MetricsServlet sink for Execu...

2018-08-09 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/22034 Thanks @jerryshao. Closing it.

[GitHub] spark pull request #22034: [SPARK-25054][CORE] Enable MetricsServlet sink fo...

2018-08-08 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/22034#discussion_r208784817 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -169,6 +171,19 @@ private[spark] class Executor

[GitHub] spark pull request #22034: [SPARK-25054][CORE] Enable MetricsServlet sink fo...

2018-08-07 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/22034 [SPARK-25054][CORE] Enable MetricsServlet sink for Executor ## What changes were proposed in this pull request? The MetricsServlet sink is added by default as a sink in the master

[GitHub] spark pull request #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to co...

2018-07-18 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/21734#discussion_r203584220 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala --- @@ -193,8 +193,7 @@ object

[GitHub] spark issue #20876: [SPARK-23653][SQL] Capture sql statements user input and...

2018-06-25 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20876 Could anyone else review this? Or should it be closed?

[GitHub] spark issue #20574: [SPARK-23385][CORE] Allow SparkUITab to be customized ad...

2018-06-25 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20574 Hi @abellina, I think configuring all the tabs this way would not be accepted by the community. Even for opening an interface to add customized tabs, @srowen thinks it isn't worth doing

[GitHub] spark pull request #21396: [SPARK-24349][SQL] Ignore setting token if using ...

2018-05-22 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/21396

[GitHub] spark pull request #21396: [SPARK-24349][SQL] Ignore setting token if using ...

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/21396#discussion_r190113497 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala --- @@ -85,7 +85,10 @@ private[spark] class

[GitHub] spark pull request #21396: [SPARK-24349][SQL] Ignore setting token if using ...

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/21396#discussion_r190112872 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala --- @@ -85,7 +85,10 @@ private[spark] class

[GitHub] spark issue #21396: [SPARK-24349][SQL] Ignore setting token if using JDBC

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21396 In our current setup, when we onboard a new cluster, the default is to connect to the DB directly, which is much simpler than accessing the metastore. We are going to switch to accessing the metastore by default

[GitHub] spark pull request #21396: [SPARK-24349][SQL] Ignore setting token if using ...

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/21396#discussion_r190110376 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala --- @@ -85,7 +85,10 @@ private[spark] class

[GitHub] spark issue #21396: [SPARK-24349][SQL] Ignore setting token if using JDBC

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21396 Also, the reasons we still need #20784 or #21343 to extend #17335 may be: 1. Some DDL operations in local mode are much faster than launching an AM in YARN. 2. Nodes in YARN cluster have

[GitHub] spark issue #21396: [SPARK-24349][SQL] Ignore setting token if using JDBC

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21396 @jerryshao Simply put, in a secure environment, if we use JDBC to connect to MySQL directly instead of accessing the Hive metastore, the current implementation blocks job execution

[GitHub] spark issue #21396: [SPARK-24349][SQL] Ignore setting token if using JDBC

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21396 [#20784](https://github.com/apache/spark/pull/20784) and [#21343](https://github.com/apache/spark/pull/21343) did the same thing, but #21343 is much more readable. They both aim to fix the problem

[GitHub] spark pull request #21396: [SPARK-24349][SQL] Ignore setting token if using ...

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/21396#discussion_r190100259 --- Diff: core/src/main/scala/org/apache/spark/deploy/security/HiveDelegationTokenProvider.scala --- @@ -85,7 +85,10 @@ private[spark] class

[GitHub] spark issue #21396: [SPARK-24349][SQL] Ignore setting token if using JDBC

2018-05-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21396 Hi @vanzin @jerryshao , could you help to review this?

[GitHub] spark pull request #21396: [SPARK-24349][SQL] Ignore setting token if using ...

2018-05-22 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/21396 [SPARK-24349][SQL] Ignore setting token if using JDBC ## What changes were proposed in this pull request? In [SPARK-23639](https://issues.apache.org/jira/browse/SPARK-23639), use

[GitHub] spark pull request #21343: [SPARK-24292][SQL] Proxy user cannot connect to H...

2018-05-21 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/21343

[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-21 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21343 Cool, SPARK-23639 also works for me. Close.

[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-21 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21343 @vanzin Seems duplicated. Let me check.

[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21343 @gatorsmile @cloud-fan Could you help to review this?

[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-17 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21343 Now the test case succeeds.

[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-16 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21343 Sorry, the test case still failed. Will change it soon

[GitHub] spark issue #21343: [SPARK-24292][SQL] Proxy user cannot connect to HiveMeta...

2018-05-16 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/21343 @jerryshao @vanzin

[GitHub] spark pull request #21343: [SPARK-24292][SQL] Proxy user cannot connect to H...

2018-05-16 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/21343 [SPARK-24292][SQL] Proxy user cannot connect to HiveMetastore in local mode ## What changes were proposed in this pull request? [#17335](https://github.com/apache/spark

[GitHub] spark issue #19293: [SPARK-22079][SQL] Serializer in HiveOutputWriter miss l...

2018-05-15 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/19293 Fixed in [#19795](https://github.com/apache/spark/pull/19795), closing this.

[GitHub] spark pull request #19293: [SPARK-22079][SQL] Serializer in HiveOutputWriter...

2018-05-15 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/19293

[GitHub] spark issue #20876: [SPARK-23653][SQL] Capture sql statements user input and...

2018-03-29 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20876 Hi, @jerryshao @cloud-fan, is there any update?

[GitHub] spark issue #20873: [SPARK-22744][CORE] Cannot get the submit hostname of ap...

2018-03-23 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20873 Closing it since I found a workaround.

[GitHub] spark pull request #20873: [SPARK-22744][CORE] Cannot get the submit hostnam...

2018-03-23 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/20873

[GitHub] spark pull request #20873: [SPARK-22744][CORE] Cannot get the submit hostnam...

2018-03-23 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20873#discussion_r176647523 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -776,6 +776,9 @@ object SparkSubmit extends CommandLineUtils with Logging

[GitHub] spark pull request #20873: [SPARK-22744][CORE] Cannot get the submit hostnam...

2018-03-23 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20873#discussion_r176647022 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -776,6 +776,9 @@ object SparkSubmit extends CommandLineUtils with Logging

[GitHub] spark issue #20876: [SPARK-23653][SQL] Capture sql statements user input and...

2018-03-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20876 In https://github.com/apache/spark/pull/20803, the implementation binds the SQL text to the DF. That's not good and will introduce many unexpected issues. I opened this PR with a new implementation

[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI

2018-03-22 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20803 Hi @jerryshao @cloud-fan @dongjoon-hyun, I would like to close this PR and open another one https://github.com/apache/spark/pull/20876, would you please move

[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...

2018-03-22 Thread LantaoJin
Github user LantaoJin closed the pull request at: https://github.com/apache/spark/pull/20803

[GitHub] spark pull request #20876: [SPARK-23653][SQL] Capture sql statements user in...

2018-03-22 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/20876 [SPARK-23653][SQL] Capture sql statements user input and show them in SQL UI ## What changes were proposed in this pull request? [SPARK-4871](https://issues.apache.org

[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...

2018-03-22 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20803#discussion_r176320495 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala --- @@ -635,7 +637,8 @@ class SparkSession private( * @since 2.0.0

[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI

2018-03-21 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20803 ![screen shot 2018-03-21 at 23 22 07](https://user-images.githubusercontent.com/1853780/37718931-ceb341c6-2d5e-11e8-8f41-4f53a7d83d99.png

[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI

2018-03-21 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20803 I have decoupled the sqlText from SQL execution. In the current implementation, when a user invokes spark.sql(xx), it posts a new SparkListenerSQLTextCaptured event to the listener bus

[GitHub] spark pull request #20803: [SPARK-23653][SQL] Show sql statement in spark SQ...

2018-03-21 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request: https://github.com/apache/spark/pull/20803#discussion_r176102381 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -166,20 +168,28 @@ private[sql] object Dataset { class Dataset[T] private

[GitHub] spark issue #20873: [SPARK-22744][CORE] Cannot get the submit hostname of ap...

2018-03-21 Thread LantaoJin
Github user LantaoJin commented on the issue: https://github.com/apache/spark/pull/20873 @rxin @jerryshao @vanzin

[GitHub] spark pull request #20873: [SPARK-22744][CORE] Cannot get the submit hostnam...

2018-03-21 Thread LantaoJin
GitHub user LantaoJin opened a pull request: https://github.com/apache/spark/pull/20873 [SPARK-22744][CORE] Cannot get the submit hostname of application ## What changes were proposed in this pull request? In MapReduce, we can get the submit hostname via checking the value
