[jira] [Commented] (SPARK-13775) history server sort by completed time by default
[ https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187820#comment-15187820 ] Thomas Graves commented on SPARK-13775:
---
Note those were basically rhetorical questions. You are probably right that I should have waited. I thought about it but decided to merge anyway since it didn't hurt anything.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13775) history server sort by completed time by default
[ https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187804#comment-15187804 ] Thomas Graves commented on SPARK-13775:
---
Why does it really matter? Did the version I merged harm anything?
[jira] [Updated] (SPARK-13642) Properly handle signal kill of ApplicationMaster
[ https://issues.apache.org/jira/browse/SPARK-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13642:
---
Assignee: Saisai Shao

> Properly handle signal kill of ApplicationMaster
>
> Key: SPARK-13642
> URL: https://issues.apache.org/jira/browse/SPARK-13642
> Project: Spark
> Issue Type: Bug
> Components: YARN
> Affects Versions: 1.6.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
>
> Currently when running Spark on YARN in yarn-cluster mode, the default
> application final state is "SUCCEEDED"; if any exception occurs, the final
> state is changed to "FAILED", triggering a reattempt if possible.
> This is fine in the normal case, but there is a race condition when the AM
> receives a signal (SIGTERM) and no exception has occurred. In that
> situation the shutdown hook is invoked and marks the application as
> finished with success, and there is no further attempt.
> From Spark's perspective the application failed and needs another attempt,
> but from YARN's perspective the application finished successfully.
> This can happen when a NodeManager fails: the NM failure sends SIGTERM to
> the AM, and the AM should mark the attempt as failed and rerun, not invoke
> unregister.
> To increase the chance of hitting this race condition, here is the
> reproduction code:
> {code}
> val sc = ...
> Thread.sleep(3L)
> sc.parallelize(1 to 100).collect()
> {code}
> If the AM is killed during the sleep, no exception is thrown, so from
> YARN's point of view the application finished successfully, but from
> Spark's point of view it should be reattempted.
> The log normally looks like this:
> {noformat}
> 16/03/03 12:44:19 INFO ContainerManagementProtocolProxy: Opening proxy : 192.168.0.105:45454
> 16/03/03 12:44:21 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (192.168.0.105:57461) with ID 2
> 16/03/03 12:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.105:57462 with 511.1 MB RAM, BlockManagerId(2, 192.168.0.105, 57462)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (192.168.0.105:57467) with ID 1
> 16/03/03 12:44:23 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.0.105:57468 with 511.1 MB RAM, BlockManagerId(1, 192.168.0.105, 57468)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
> 16/03/03 12:44:23 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
> 16/03/03 12:44:39 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/03/03 12:44:39 INFO SparkContext: Invoking stop() from shutdown hook
> 16/03/03 12:44:39 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
> (further identical ContextHandler "stopped" lines for the remaining servlet contexts omitted; the original message is truncated here)
> {noformat}
[jira] [Commented] (SPARK-13775) history server sort by completed time by default
[ https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187707#comment-15187707 ] Thomas Graves commented on SPARK-13775:
---
As I stated in the PR, I merged that in because it was ready and better than what was there before.
[jira] [Created] (SPARK-13775) history server sort by completed time by default
Thomas Graves created SPARK-13775:
---
Summary: history server sort by completed time by default
Key: SPARK-13775
URL: https://issues.apache.org/jira/browse/SPARK-13775
Project: Spark
Issue Type: Improvement
Components: Web UI
Affects Versions: 2.0.0
Reporter: Thomas Graves

The new history server UI using DataTables sorts by application id. Let's change it to sort by completed time as the old table format did.
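The actual change lives in the JavaScript of historypage.js, but purely as an illustration of the ordering being proposed, here is a minimal Scala sketch; the `AppRow` type and field names are made up for this example and do not match Spark's code:

```scala
// Illustrative only: order history-server rows by completion time,
// newest first, instead of lexically by application id.
case class AppRow(appId: String, completedTime: Long)

def sortByCompleted(rows: Seq[AppRow]): Seq[AppRow] =
  rows.sortBy(-_.completedTime) // negate for descending order

val rows = Seq(AppRow("app_1", 300L), AppRow("app_3", 100L), AppRow("app_2", 200L))
val sorted = sortByCompleted(rows) // app_1, app_2, app_3
```

The same effect on the real page would be a one-line change to the DataTables default sort column and direction.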
[jira] [Commented] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true
[ https://issues.apache.org/jira/browse/SPARK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187424#comment-15187424 ] Thomas Graves commented on SPARK-13723:
---
Warnings that scroll by when spark-submit starts are really pretty useless unless the user is explicitly looking for something; way too many things get printed there for them to notice.

spark-submit --help doesn't describe the behavior of --num-executors when this is on. That is probably a separate bug.

If it's already misunderstood, which I know it is because I've had to explain it to multiple people, then I don't see an argument for not changing the behavior. It really comes down to what would be the best experience for users; if we have arguments one way or another then I could be swayed. I also think it's a bit confusing to look at the configs and see that the dynamic allocation config is on but it isn't being used because --num-executors was specified.

One reason not to change this is if we think Spark isn't ready. For instance, Spark has some known issues with scalability, so with dynamic allocation users could be getting thousands of executors instead of a few or tens, and we could hit Spark internal issues or require more memory for the AM by default. If that makes the user experience worse, that would be a reason not to do it.
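The behavior change proposed in this issue can be modeled as a small decision function. This is a sketch of the proposal only, not Spark's actual config resolution code, and the names (`ExecutorConf`, `resolve`) are invented for illustration:

```scala
// Proposed behavior: with dynamic allocation enabled, --num-executors
// becomes both the initial and the maximum executor count, instead of
// silently disabling dynamic allocation as it does today.
case class ExecutorConf(dynamicAllocation: Boolean, initial: Int, max: Int)

def resolve(numExecutors: Option[Int],
            dynamicEnabled: Boolean,
            defaultInitial: Int,
            defaultMax: Int): ExecutorConf =
  (numExecutors, dynamicEnabled) match {
    // Proposal: keep dynamic allocation on, cap usage at the requested count.
    case (Some(n), true)  => ExecutorConf(dynamicAllocation = true, initial = n, max = n)
    // Static allocation when dynamic allocation is off.
    case (Some(n), false) => ExecutorConf(dynamicAllocation = false, initial = n, max = n)
    // No --num-executors: fall back to the configured defaults.
    case (None, d)        => ExecutorConf(d, defaultInitial, defaultMax)
  }
```

Under this model, users who specify --num-executors still get a hard cap on their usage, while idle executors can be released back to the cluster.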
[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187141#comment-15187141 ] Thomas Graves commented on SPARK-3374:
---
It seems this work is also being done under https://issues.apache.org/jira/browse/SPARK-12343, which already has a pull request up for it.

> Spark on Yarn remove deprecated configs for 2.0
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
> Issue Type: Sub-task
> Components: YARN
> Affects Versions: 1.1.0
> Reporter: Thomas Graves
> Assignee: Boyang Jerry Peng
>
> The configs in YARN have gotten scattered and inconsistent between cluster
> and client modes while supporting backwards compatibility. We should try to
> clean this up, move things to common places, and support configs across both
> cluster and client modes where we want to make them public.
[jira] [Updated] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-3374:
---
Assignee: Boyang Jerry Peng
[jira] [Resolved] (SPARK-13675) The url link in historypage is not correct for application running in yarn cluster mode
[ https://issues.apache.org/jira/browse/SPARK-13675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13675.
---
Resolution: Fixed
Assignee: Saisai Shao
Fix Version/s: 2.0.0

> The url link in historypage is not correct for application running in yarn
> cluster mode
>
> Key: SPARK-13675
> URL: https://issues.apache.org/jira/browse/SPARK-13675
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.0.0
> Reporter: Saisai Shao
> Assignee: Saisai Shao
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2016-02-29 at 3.57.32 PM.png
>
> The current URL for each application's history UI looks like:
> http://localhost:18080/history/application_1457058760338_0016/1/jobs/ or
> http://localhost:18080/history/application_1457058760338_0016/2/jobs/
> Here *1* or *2* is the attempt number used by {{historypage.js}}, but
> {{HistoryServer}} parses it as an attempt id, and the correct attempt id
> should look like "appattempt_1457058760338_0016_02", so {{HistoryServer}}
> fails to parse out a correct attempt id.
> This is OK in yarn-client mode, since we don't need the attempt id to fetch
> the app cache, but it fails in yarn-cluster mode, where attempt id "1" or
> "2" is actually wrong.
> So we should fix this URL to carry the correct application id and attempt id.
> This bug was newly introduced by SPARK-10873; there's no issue in branch 1.6.
> Here is the screenshot:
> !https://issues.apache.org/jira/secure/attachment/12791437/Screen%20Shot%202016-02-29%20at%203.57.32%20PM.png!
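To make the mismatch concrete, here is a hypothetical helper that builds a YARN-style attempt id from the application id and a bare attempt number. The helper name is invented, and the two-digit zero-padded suffix follows the example quoted in the issue text, which is an assumption about the exact format:

```scala
// Illustrative only: the page passes a bare attempt number ("1", "2"),
// while the server side expects a full YARN attempt id.
// An application id looks like "application_<clusterTimestamp>_<seq>".
def toYarnAttemptId(appId: String, attemptNum: Int): String = {
  val suffix = appId.stripPrefix("application_")
  // Padding width taken from the issue's example "appattempt_..._02".
  f"appattempt_${suffix}%s_${attemptNum}%02d"
}

val attemptId = toYarnAttemptId("application_1457058760338_0016", 2)
// attemptId == "appattempt_1457058760338_0016_02"
```

The fix in the history page is then to emit (or map to) the full attempt id rather than the bare index.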
[jira] [Created] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true
Thomas Graves created SPARK-13723:
---
Summary: YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true
Key: SPARK-13723
URL: https://issues.apache.org/jira/browse/SPARK-13723
Project: Spark
Issue Type: Improvement
Components: YARN
Affects Versions: 2.0.0
Reporter: Thomas Graves

I think we should change the behavior when --num-executors is specified and dynamic allocation is enabled. Currently, if --num-executors is specified, dynamic allocation is disabled and a static number of executors is used. I would rather see the default behavior changed in the 2.x line: if the dynamic allocation config is on, then --num-executors becomes the max and initial number of executors. This would allow users to easily cap their usage while still allowing executors to be freed up. It would also let users doing ML start out with a set number of executors; if they are actually caching the data, those executors wouldn't be freed up, so you would get very similar behavior to having dynamic allocation off.

Part of the reason for this is that using a static number generally wastes resources, especially with people doing ad hoc things in spark-shell. It also has a big effect when people are doing MapReduce/ETL type workloads. The problem is that people are used to specifying --num-executors, so if we turn dynamic allocation on by default in a cluster config it just gets overridden.

We should also update the spark-submit --help description for --num-executors.
[jira] [Commented] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true
[ https://issues.apache.org/jira/browse/SPARK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183272#comment-15183272 ] Thomas Graves commented on SPARK-13723:
---
See some discussion on https://github.com/apache/spark/pull/11528
[jira] [Resolved] (SPARK-13459) Separate Alive and Dead Executors in Executor Totals Table
[ https://issues.apache.org/jira/browse/SPARK-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13459.
---
Resolution: Fixed
Fix Version/s: 2.0.0

> Separate Alive and Dead Executors in Executor Totals Table
>
> Key: SPARK-13459
> URL: https://issues.apache.org/jira/browse/SPARK-13459
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 2.0.0
> Reporter: Alex Bozarth
> Assignee: Alex Bozarth
> Priority: Minor
> Fix For: 2.0.0
>
> Now that dead executors are shown in the executors table (SPARK-7729), the
> totals table added in SPARK-12716 should be updated to include separate
> totals for alive and dead executors as well as the current total.
> (This improvement was originally discussed in the PR for SPARK-12716 while
> SPARK-7729 was still in progress.)
[jira] [Updated] (SPARK-13459) Separate Alive and Dead Executors in Executor Totals Table
[ https://issues.apache.org/jira/browse/SPARK-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13459:
---
Assignee: Alex Bozarth
[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180027#comment-15180027 ] Thomas Graves commented on SPARK-3374:
---
[~srowen] can you add [~jerrypeng] as a contributor so he can assign himself to the JIRA?
[jira] [Commented] (SPARK-13642) Inconsistent finishing state between driver and AM
[ https://issues.apache.org/jira/browse/SPARK-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177905#comment-15177905 ] Thomas Graves commented on SPARK-13642:
---
The problem is that you really don't want the opposite to happen, i.e. a successful attempt being marked as failed, because then it will be retried and you could mess up good data.
{noformat}
// We report success to avoid
// retrying applications that have succeeded (System.exit(0)), which means that
// applications that explicitly exit with a non-zero status will also show up as
// succeeded in the RM UI.
{noformat}
In your case above, stop() was called on the SparkContext, and we assume stop means the application ran to completion and thus was successful from YARN's point of view. I think we would need a better way for the YARN side to really know what happened on the driver side.
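The status-reporting question being debated here can be reduced to a small decision table. This is a toy model of the direction discussed, not Spark's ApplicationMaster code; all names are invented, and whether "signal received before user code finished" should imply failure is exactly the open question in the thread:

```scala
// Toy model of the AM's final-status decision. The buggy path reports
// SUCCEEDED when a SIGTERM-driven shutdown hook runs without an exception;
// a possible fix is to report success only when user code actually finished.
sealed trait FinalStatus
case object Succeeded extends FinalStatus
case object Failed extends FinalStatus

def finalStatus(userCodeFinished: Boolean,
                exceptionThrown: Boolean,
                signalReceived: Boolean): FinalStatus =
  if (exceptionThrown) Failed
  else if (signalReceived && !userCodeFinished) Failed // killed mid-run: let YARN reattempt
  else Succeeded
```

The trade-off raised in the comment is visible in the second branch: if the AM cannot reliably tell whether user code finished, this rule risks retrying an attempt that actually succeeded.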
[jira] [Commented] (SPARK-2666) Always try to cancel running tasks when a stage is marked as zombie
[ https://issues.apache.org/jira/browse/SPARK-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176149#comment-15176149 ] Thomas Graves commented on SPARK-2666:
---
[~lianhuiwang] were you going to work on this? I'm running into it, and I think it's a bad idea to keep running the old tasks. It all depends on what those tasks are running and for how long; in my case they run a very long time doing an expensive shuffle. We should kill those tasks immediately to allow tasks from the newer retry stage to run. Did you run into issues with your PR, or did it just need a rebase?

> Always try to cancel running tasks when a stage is marked as zombie
>
> Key: SPARK-2666
> URL: https://issues.apache.org/jira/browse/SPARK-2666
> Project: Spark
> Issue Type: Bug
> Components: Scheduler, Spark Core
> Reporter: Lianhui Wang
>
> There are some situations in which the scheduler can mark a task set as a
> "zombie" before the task set has completed all of its tasks. For example:
> (a) When a task fails b/c of a {{FetchFailed}}
> (b) When a stage completes because two different attempts create all the
> ShuffleMapOutput, though no attempt has completed all its tasks (at least,
> this *should* result in the task set being marked as zombie, see SPARK-10370)
> (There may be others; I'm not sure this list is exhaustive.)
> Marking a task set as zombie prevents any *additional* tasks from getting
> scheduled, but it does not cancel currently running tasks. We should cancel
> all running tasks to avoid wasting resources (and also to make the behavior
> a little clearer to the end user). Rather than canceling tasks in each case
> piecemeal, we should refactor the scheduler so that these two actions are
> always taken together: canceling tasks should go hand-in-hand with marking
> the task set as zombie.
> Some implementation notes:
> * We should change {{taskSetManager.isZombie}} to be private and put it
> behind a method like {{markZombie}}.
> * Marking a stage as zombie before all tasks have completed does *not*
> necessarily mean the stage attempt has failed. In case (a) the stage attempt
> has failed, but in case (b) we are not canceling b/c of a failure, rather
> just b/c no more tasks are needed.
> * {{taskScheduler.cancelTasks}} always marks the task set as zombie.
> However, it also has side effects, like logging that the stage has failed
> and creating a {{TaskSetFailed}} event, which we don't want e.g. in case (b)
> when nothing has failed. So it may need some additional refactoring to go
> along with {{markZombie}}.
> * {{SchedulerBackend}}s are free to not implement {{killTask}}, so we need
> to be sure to catch the {{UnsupportedOperationException}}s.
> * Testing this *might* benefit from SPARK-10372
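The refactoring proposed in the implementation notes can be sketched with a toy class. None of these names match Spark's real TaskSetManager; the kill callback stands in for a scheduler backend's killTask, and this only illustrates the "mark zombie and cancel together" idea:

```scala
import scala.collection.mutable

// Toy sketch: marking a task set as zombie and canceling its running
// tasks happen in one operation, so a superseded stage attempt cannot
// keep tasks running.
class TaskSetManagerSketch(backendKill: Long => Unit) {
  private var zombie = false
  val runningTasks: mutable.Set[Long] = mutable.Set.empty

  def isZombie: Boolean = zombie

  def markZombie(): Unit = {
    zombie = true
    // Cancel everything still in flight. Backends that don't implement
    // killTask may throw UnsupportedOperationException, which we swallow.
    runningTasks.foreach { tid =>
      try backendKill(tid)
      catch { case _: UnsupportedOperationException => () }
    }
    runningTasks.clear()
  }
}
```

With this shape there is no code path that can set the zombie flag while leaving tasks running, which is the bug described above.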
[jira] [Updated] (SPARK-13481) History server page with a default sorting as "desc" time.
[ https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13481:
---
Assignee: Zhuo Liu

> History server page with a default sorting as "desc" time.
>
> Key: SPARK-13481
> URL: https://issues.apache.org/jira/browse/SPARK-13481
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Reporter: Zhuo Liu
> Assignee: Zhuo Liu
> Priority: Minor
> Fix For: 2.0.0
>
> Now by default the page shows applications in ascending order of appId. We
> might prefer descending order by default, which shows the latest application
> at the top.
[jira] [Resolved] (SPARK-13481) History server page with a default sorting as "desc" time.
[ https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13481.
---
Resolution: Fixed
Fix Version/s: 2.0.0
[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169068#comment-15169068 ] Thomas Graves commented on SPARK-3374: -- Just adding some more detail: Note we should remove all environment variable configs. These were mostly used in yarn-client mode (see YarnClientSchedulerBackend). We should check all yarn code for deprecated configs and remove. We should also look at the yarn.ClientArguments and ApplicationMasterArguments to see if we really need these or if we can just do it through configs. > Spark on Yarn remove deprecated configs for 2.0 > --- > > Key: SPARK-3374 > URL: https://issues.apache.org/jira/browse/SPARK-3374 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves > > The configs in yarn have gotten scattered and inconsistent between cluster > and client modes and supporting backwards compatibility. We should try to > clean this up, move things to common places and support configs across both > cluster and client modes where we want to make them public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12523) Support long-running of the Spark On HBase and hive meta store.
[ https://issues.apache.org/jira/browse/SPARK-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-12523. --- Resolution: Fixed Assignee: SaintBacchus Fix Version/s: 2.0.0 > Support long-running of the Spark On HBase and hive meta store. > --- > > Key: SPARK-12523 > URL: https://issues.apache.org/jira/browse/SPARK-12523 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 2.0.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Fix For: 2.0.0 > > > **AMDelegationTokenRenewer** now only obtains the HDFS token in the AM; if we want > to use long-running Spark on HBase or the hive meta store, we should obtain > those tokens as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-12316. --- Resolution: Fixed Fix Version/s: 2.0.0 1.6.1 > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Fix For: 1.6.1, 2.0.0 > > Attachments: 20151210045149.jpg, 20151210045533.jpg > > > When the application ends, the AM will clean the staging dir. > But if the driver triggers a delegation token update, it can't find > the right token file and then endlessly calls the method > 'updateCredentialsIfRequired'. > This leads to a StackOverflowError. > !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg! > !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11316) coalesce doesn't handle UnionRDD with partial locality properly
[ https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-11316: -- Summary: coalesce doesn't handle UnionRDD with partial locality properly (was: coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly) > coalesce doesn't handle UnionRDD with partial locality properly > --- > > Key: SPARK-11316 > URL: https://issues.apache.org/jira/browse/SPARK-11316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > So I haven't fully debugged this yet, but reporting what I'm seeing and what I > think might be going on. > I have a graph processing job that is seeing a huge slowdown in setupGroups in > the location iterator where it's getting the preferred locations for the > coalesce. They are coalescing from 2400 down to 1200 and it's taking 17+ > hours to do the calculation. I killed it at that point so I don't know the total > time. > It appears that the job is doing an isEmpty call, a bunch of other > transformations, then a coalesce (where it takes so long), other > transformations, then finally a count to trigger it. > It appears that there is only one node that it's finding in the setupGroups > call, and to get to that node it first has to go through the while loop: > while (numCreated < targetLen && tries < expectedCoupons2) { > where expectedCoupons2 is around 19000. It finds very few or none in this > loop. 
> Then it does the second loop: > while (numCreated < targetLen) { // if we don't have enough partition > groups, create duplicates > var (nxt_replica, nxt_part) = rotIt.next() > val pgroup = PartitionGroup(nxt_replica) > groupArr += pgroup > groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup > var tries = 0 > while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // > ensure at least one part > nxt_part = rotIt.next()._2 > tries += 1 > } > numCreated += 1 > } > It has an inner while loop, and both loops run 1200 times: 1200*1200 > iterations. This takes a very long time. > The user can work around the issue by adding a count() call right after > the isEmpty call, before the coalesce is called. I also tried putting > in a take(1) right before the isEmpty call and it seems to work around > the issue; it took 1 hour with the take vs. a few minutes with the count(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
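The cost described in that report can be modeled directly. A toy Python model (not Spark's Scala source) of the duplicate-creation fallback: when essentially no partitions match a preferred location, the inner addPartToPGroup retry loop can spin up to targetLen times for each of the targetLen outer iterations, so total work grows quadratically with the target partition count.

```python
# Toy model of the quadratic fallback in setupGroups: targetLen outer
# iterations, each retrying up to targetLen times before placing a
# partition. The function just counts the worst-case attempts.
def worst_case_attempts(target_len):
    attempts = 0
    for _ in range(target_len):          # outer: create targetLen groups
        for _tries in range(target_len): # inner: addPartToPGroup retries
            attempts += 1
    return attempts

print(worst_case_attempts(1200))  # prints 1440000 (= 1200 * 1200)
```

This matches the "1200*1200 loops" observation in the report: coalescing 2400 partitions down to 1200 with almost no locality hits can pay on the order of 1.4 million retry attempts.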
[jira] [Updated] (SPARK-13364) history server application column Id not sorting as number
[ https://issues.apache.org/jira/browse/SPARK-13364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13364: -- Assignee: Zhuo Liu > history server application column Id not sorting as number > -- > > Key: SPARK-13364 > URL: https://issues.apache.org/jira/browse/SPARK-13364 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Assignee: Zhuo Liu >Priority: Minor > Fix For: 2.0.0 > > > The new history server is using DataTables, and the application column isn't > sorting properly. It's not sorting the trailing _X part right. Below is > an example where the 30174 should be before 30149 > application_1453493359692_30149 > application_1453493359692_30174 > I'm guessing the sort uses the full string rather than just the numeric > application id. > href="/history/application_1453493359692_30029/1/jobs/">application_1453493359692_30029 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
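The sorting problem above is the classic string-vs-numeric comparison on YARN application ids. One way to get the expected ordering is to sort on the id's numeric components; this is an illustrative Python sketch (the actual fix lives in the history server's DataTables JavaScript, not in code like this):

```python
# Sort YARN application ids by their numeric parts instead of
# lexicographically on the whole string (illustrative sketch).
def app_id_key(app_id):
    # "application_1453493359692_30149" -> (1453493359692, 30149)
    cluster_ts, seq = app_id.split("_")[1:3]
    return (int(cluster_ts), int(seq))

ids = ["application_1453493359692_30174", "application_1453493359692_30149"]
print(sorted(ids, key=app_id_key))
# ['application_1453493359692_30149', 'application_1453493359692_30174']
```

With equal-width suffixes plain string sort happens to agree, but once the sequence number gains a digit (e.g. _9999 vs _10000) string comparison puts them in the wrong order, which is exactly the symptom reported.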
[jira] [Resolved] (SPARK-13364) history server application column Id not sorting as number
[ https://issues.apache.org/jira/browse/SPARK-13364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13364. --- Resolution: Fixed Fix Version/s: 2.0.0 > history server application column Id not sorting as number > -- > > Key: SPARK-13364 > URL: https://issues.apache.org/jira/browse/SPARK-13364 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Thomas Graves >Assignee: Zhuo Liu >Priority: Minor > Fix For: 2.0.0 > > > The new history server is using datatables, the application column isn't > sorting them properly. Its not sorting the last _X part right. below is > an example where the 30174 should be before 30149 > application_1453493359692_30149 > application_1453493359692_30174 > I'm guessing its sorting used the string rather then just the > application id. > href="/history/application_1453493359692_30029/1/jobs/">application_1453493359692_30029 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-11316) coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly
[ https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-11316: - Assignee: Thomas Graves > coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly > --- > > Key: SPARK-11316 > URL: https://issues.apache.org/jira/browse/SPARK-11316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > So I haven't fully debugged this yet but reporting what I'm seeing and think > might be going on. > I have a graph processing job that is seeing huge slow down in setupGroups in > the location iterator where its getting the preferred locations for the > coalesce. They are coalescing from 2400 down to 1200 and its taking 17+ > hours to do the calculation. Killed it at this point so don't know total > time. > It appears that the job is doing an isEmpty call, a bunch of other > transformation, then a coalesce (where it takes so long), other > transformations, then finally a count to trigger it. > It appears that there is only one node that its finding in the setupGroup > call and to get to that node it has to first to through the while loop: > while (numCreated < targetLen && tries < expectedCoupons2) { > where expectedCoupons2 is around 19000. It finds very few or none in this > loop. > Then it does the second loop: > while (numCreated < targetLen) { // if we don't have enough partition > groups, create duplicates > var (nxt_replica, nxt_part) = rotIt.next() > val pgroup = PartitionGroup(nxt_replica) > groupArr += pgroup > groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup > var tries = 0 > while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // > ensure at least one part > nxt_part = rotIt.next()._2 > tries += 1 > } > numCreated += 1 > } > Where it has an inner while loop and both of those are going 1200 times. > 1200*1200 loops. 
This is taking a very long time. > The user can work around the issue by adding in a count() call very close to > after the isEmpty call before the coalesce is called. I also tried putting > in a take(1) right before the isEmpty call and it seems to work around > the issue, took 1 hours with the take vs a few minutes with the count(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11316) coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly
[ https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-11316: -- Summary: coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly (was: isEmpty before coalesce seems to cause huge performance issue in setupGroups) > coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly > --- > > Key: SPARK-11316 > URL: https://issues.apache.org/jira/browse/SPARK-11316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Priority: Critical > > So I haven't fully debugged this yet but reporting what I'm seeing and think > might be going on. > I have a graph processing job that is seeing huge slow down in setupGroups in > the location iterator where its getting the preferred locations for the > coalesce. They are coalescing from 2400 down to 1200 and its taking 17+ > hours to do the calculation. Killed it at this point so don't know total > time. > It appears that the job is doing an isEmpty call, a bunch of other > transformation, then a coalesce (where it takes so long), other > transformations, then finally a count to trigger it. > It appears that there is only one node that its finding in the setupGroup > call and to get to that node it has to first to through the while loop: > while (numCreated < targetLen && tries < expectedCoupons2) { > where expectedCoupons2 is around 19000. It finds very few or none in this > loop. 
> Then it does the second loop: > while (numCreated < targetLen) { // if we don't have enough partition > groups, create duplicates > var (nxt_replica, nxt_part) = rotIt.next() > val pgroup = PartitionGroup(nxt_replica) > groupArr += pgroup > groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup > var tries = 0 > while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // > ensure at least one part > nxt_part = rotIt.next()._2 > tries += 1 > } > numCreated += 1 > } > Where it has an inner while loop and both of those are going 1200 times. > 1200*1200 loops. This is taking a very long time. > The user can work around the issue by adding in a count() call very close to > after the isEmpty call before the coalesce is called. I also tried putting > in a take(1) right before the isEmpty call and it seems to work around > the issue, took 1 hours with the take vs a few minutes with the count(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-2541) Standalone mode can't access secure HDFS anymore
[ https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reopened SPARK-2541: -- > Standalone mode can't access secure HDFS anymore > > > Key: SPARK-2541 > URL: https://issues.apache.org/jira/browse/SPARK-2541 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0, 1.0.1 >Reporter: Thomas Graves > Attachments: SPARK-2541-partial.patch > > > In spark 0.9.x you could access secure HDFS from Standalone deploy, that > doesn't work in 1.X anymore. > It looks like the issues is in SparkHadoopUtil.runAsSparkUser. Previously it > wouldn't do the doAs if the currentUser == user. Not sure how it affects > when the daemons run as a super user but SPARK_USER is set to someone else. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-2541) Standalone mode can't access secure HDFS anymore
[ https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152357#comment-15152357 ] Thomas Graves commented on SPARK-2541: -- I'm fine with reopening this. Ideally we would officially support security in standalone mode but it appears that hasn't happened so I think fixing this lets it work. > Standalone mode can't access secure HDFS anymore > > > Key: SPARK-2541 > URL: https://issues.apache.org/jira/browse/SPARK-2541 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0, 1.0.1 >Reporter: Thomas Graves > Attachments: SPARK-2541-partial.patch > > > In spark 0.9.x you could access secure HDFS from Standalone deploy, that > doesn't work in 1.X anymore. > It looks like the issues is in SparkHadoopUtil.runAsSparkUser. Previously it > wouldn't do the doAs if the currentUser == user. Not sure how it affects > when the daemons run as a super user but SPARK_USER is set to someone else. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13365) should coalesce do anything if coalescing to same number of partitions without shuffle
Thomas Graves created SPARK-13365: - Summary: should coalesce do anything if coalescing to same number of partitions without shuffle Key: SPARK-13365 URL: https://issues.apache.org/jira/browse/SPARK-13365 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.6.0 Reporter: Thomas Graves Currently if a user does a coalesce to the same number of partitions as already exist, it spends a bunch of time doing work when it seems like it shouldn't do anything. For instance, if I have an RDD with 100 partitions and I run coalesce(100), it seems like it should skip any computation since it already has 100 partitions. One case where I've actually seen this is when users do coalesce(1000) without the shuffle, which really turns into a coalesce(100). I'm presenting this as a question as I'm not sure if there are use cases I haven't thought of where this would break. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
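The proposed short-circuit can be sketched as a tiny decision function. This is a hypothetical helper in Python, not Spark's RDD API: without a shuffle, coalesce can never increase the partition count, so a target greater than or equal to the current count could return the parent partitioning unchanged.

```python
# Sketch of the proposed no-op guard (hypothetical helper, not Spark's
# coalesce): compute the effective partition count and whether any
# repartitioning work is needed at all.
def effective_partitions(current, target, shuffle=False):
    if not shuffle and target >= current:
        return current  # no-op: skip building a coalesced RDD entirely
    # With a shuffle the target is honored; without one it is capped.
    return target if shuffle else min(target, current)

print(effective_partitions(100, 100))   # 100 - nothing to do
print(effective_partitions(100, 1000))  # 100 - silently capped w/o shuffle
```

The second call illustrates the case from the issue: coalesce(1000) without a shuffle on a 100-partition RDD "really turns into a coalesce(100)", so it hits the same no-op condition.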
[jira] [Created] (SPARK-13364) history server application column not sorting properly
Thomas Graves created SPARK-13364: - Summary: history server application column not sorting properly Key: SPARK-13364 URL: https://issues.apache.org/jira/browse/SPARK-13364 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.0.0 Reporter: Thomas Graves The new history server is using DataTables, and the application column isn't sorting properly. It's not sorting the trailing _X part right. Below is an example where the 30174 should be before 30149 application_1453493359692_30149 application_1453493359692_30174 I'm guessing the sort uses the full string rather than just the numeric application id. application_1453493359692_30029 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150616#comment-15150616 ] Thomas Graves commented on SPARK-12316: --- [~hshreedharan] I think what you are saying also makes sense but is a much bigger change. As I mentioned on the PR, we just check for the credentials file, so if it didn't get renewed then something strange happened anyway, and delaying 1 minute to retry seems reasonable. We can definitely add more logic around this too. For instance, only retry a certain number of times, or immediately exit if the staging dir is gone rather than merely empty; but I think that can be done separately. Please comment on the PR if you have alternate ideas. > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Attachments: 20151210045149.jpg, 20151210045533.jpg > > > When the application ends, the AM will clean the staging dir. > But if the driver triggers a delegation token update, it can't find > the right token file and then endlessly calls the method > 'updateCredentialsIfRequired'. > This leads to a StackOverflowError. > !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg! > !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150583#comment-15150583 ] Thomas Graves commented on SPARK-12316: --- Ah ok, thanks for the clarification. I'll make any further comments on the PR. > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Attachments: 20151210045149.jpg, 20151210045533.jpg > > > When application end, AM will clean the staging dir. > But if the driver trigger to update the delegation token, it will can't find > the right token file and then it will endless cycle call the method > 'updateCredentialsIfRequired'. > Then it lead to StackOverflowError. > !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg! > !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-11701. --- Resolution: Duplicate > YARN - dynamic allocation and speculation active task accounting wrong > -- > > Key: SPARK-11701 > URL: https://issues.apache.org/jira/browse/SPARK-11701 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I am using dynamic container allocation and speculation and am seeing issues > with the active task accounting. The Executor UI still shows active tasks on > an executor but the job/stage is all completed. I think it's also > affecting the dynamic allocation being able to release containers because it > thinks there are still tasks. > It's easily reproduced by using spark-shell: turn on dynamic allocation, then > run just a wordcount on a decent-sized file, save back to hdfs, and set the > speculation parameters low: > spark.dynamicAllocation.enabled true > spark.shuffle.service.enabled true > spark.dynamicAllocation.maxExecutors 10 > spark.dynamicAllocation.minExecutors 2 > spark.dynamicAllocation.initialExecutors 10 > spark.dynamicAllocation.executorIdleTimeout 40s > $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf > spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 > --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-11701: -- Description: I am using dynamic container allocation and speculation and am seeing issues with the active task accounting. The Executor UI still shows active tasks on the an executor but the job/stage is all completed. I think its also affecting the dynamic allocation being able to release containers because it thinks there are still tasks. Its easily reproduce by using spark-shell, turn on dynamic allocation, then run just a wordcount on decent sized file and save back to hdfs and set the speculation parameters low: spark.dynamicAllocation.enabled true spark.shuffle.service.enabled true spark.dynamicAllocation.maxExecutors 10 spark.dynamicAllocation.minExecutors 2 spark.dynamicAllocation.initialExecutors 10 spark.dynamicAllocation.executorIdleTimeout 40s $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g was: I am using dynamic container allocation and speculation and am seeing issues with the active task accounting. The Executor UI still shows active tasks on the an executor but the job/stage is all completed. I think its also affecting the dynamic allocation being able to release containers because it thinks there are still tasks. 
Its easily reproduce by using spark-shell, turn on dynamic allocation, then run just a wordcount on decent sized file and set the speculation parameters low: spark.dynamicAllocation.enabled true spark.shuffle.service.enabled true spark.dynamicAllocation.maxExecutors 10 spark.dynamicAllocation.minExecutors 2 spark.dynamicAllocation.initialExecutors 10 spark.dynamicAllocation.executorIdleTimeout 40s $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g > YARN - dynamic allocation and speculation active task accounting wrong > -- > > Key: SPARK-11701 > URL: https://issues.apache.org/jira/browse/SPARK-11701 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I am using dynamic container allocation and speculation and am seeing issues > with the active task accounting. The Executor UI still shows active tasks on > the an executor but the job/stage is all completed. I think its also > affecting the dynamic allocation being able to release containers because it > thinks there are still tasks. 
> Its easily reproduce by using spark-shell, turn on dynamic allocation, then > run just a wordcount on decent sized file and save back to hdfs and set the > speculation parameters low: > spark.dynamicAllocation.enabled true > spark.shuffle.service.enabled true > spark.dynamicAllocation.maxExecutors 10 > spark.dynamicAllocation.minExecutors 2 > spark.dynamicAllocation.initialExecutors 10 > spark.dynamicAllocation.executorIdleTimeout 40s > $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf > spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 > --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13343) speculative tasks that didn't commit shouldn't be marked as success
Thomas Graves created SPARK-13343: - Summary: speculative tasks that didn't commit shouldn't be marked as success Key: SPARK-13343 URL: https://issues.apache.org/jira/browse/SPARK-13343 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.6.0 Reporter: Thomas Graves Currently speculative tasks that didn't commit can show up as success or failure (depending on the timing of the commit). This is a bit confusing because such a task didn't really succeed in the sense that it didn't write anything. I think these tasks should be marked as KILLED or something that makes it more obvious to the user exactly what happened. If it happens to hit the timing where it gets a commit-denied exception, then it shows up as failed and counts against your task failures. It shouldn't count against task failures since that failure really doesn't matter. MapReduce handles these situations, so perhaps we can look there for a model. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
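The accounting change proposed above boils down to a small state mapping. A hedged sketch in Python ("KILLED" is the state suggested in the issue text; the function and its name are illustrative, not Spark's TaskState handling):

```python
# Sketch of the proposed accounting for speculative duplicates: only the
# attempt that actually committed its output is a SUCCESS; the losing
# duplicate is marked KILLED rather than SUCCESS or FAILED, so it never
# counts against the task-failure limit.
def speculative_final_state(did_commit):
    return "SUCCESS" if did_commit else "KILLED"

print(speculative_final_state(True), speculative_final_state(False))
```

Today the losing attempt's state depends on timing (a commit-denied exception yields FAILED, otherwise SUCCESS); the sketch shows the deterministic mapping the issue argues for.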
[jira] [Commented] (SPARK-4224) Support group acls
[ https://issues.apache.org/jira/browse/SPARK-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148682#comment-15148682 ] Thomas Graves commented on SPARK-4224: -- Ok, I understand closing older things, but in this instance I would like it to stay open. Also, it would be nice if you put a comment in there stating why it was closed. It is still on our list of todos and it's a feature that I would definitely like in Spark. It makes certain things much easier for organizations, and if you look at many other open source products (hadoop, storm, etc.) they all support group acls. When you work in teams (with 10-30 people) it's much easier to just add groups to the acls than to list out individual users. > Support group acls > -- > > Key: SPARK-4224 > URL: https://issues.apache.org/jira/browse/SPARK-4224 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > Currently we support view and modify acls but you have to specify a list of > users. It would be nice to also support groups, so that anyone in the group > has permissions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
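The group-acl feature requested above amounts to one extra membership check. An illustrative Python sketch (the function name and parameters are hypothetical, not Spark's SecurityManager API): access is granted if the user is listed directly in the acl or belongs to any group the acl lists.

```python
# Illustrative group-aware acl check (hypothetical names): a user passes
# if listed individually or via any of their groups.
def check_ui_view_permissions(user, user_groups, acl_users, acl_groups):
    return user in acl_users or bool(set(user_groups) & set(acl_groups))

print(check_ui_view_permissions("alice", ["dev-team"], ["bob"], ["dev-team"]))  # True
print(check_ui_view_permissions("carol", ["qa"], ["bob"], ["dev-team"]))        # False
```

This is why group acls scale better for 10-30 person teams: the acl names one group instead of enumerating every member, and membership changes happen in the group mapping rather than in every acl.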
[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148662#comment-15148662 ] Thomas Graves commented on SPARK-12316: --- I'm not following how this ended up in an infinite loop. Can you please describe exactly what you are seeing? For instance: shutdown is happening and you happen to hit updateCredentialsIfRequired. But if the file isn't found, you would get an exception and fall back to scheduling it an hour later in the catch NonFatal. If the stop was already called then delegationTokenRenewer.shutdown() should have happened, and I assume schedule would have thrown (perhaps I'm wrong here). > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Attachments: 20151210045149.jpg, 20151210045533.jpg > > > When the application ends, the AM will clean the staging dir. > But if the driver triggers a delegation token update, it can't find > the right token file and then endlessly calls the method > 'updateCredentialsIfRequired'. > This leads to a StackOverflowError. > !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg! > !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-4224) Support group acls
[ https://issues.apache.org/jira/browse/SPARK-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reopened SPARK-4224: -- > Support group acls > -- > > Key: SPARK-4224 > URL: https://issues.apache.org/jira/browse/SPARK-4224 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > Currently we support view and modify acls but you have to specify a list of > users. It would be nice to also support groups, so that anyone in the group > has permissions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4224) Support group acls
[ https://issues.apache.org/jira/browse/SPARK-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148627#comment-15148627 ] Thomas Graves commented on SPARK-4224: -- Why did you close this? > Support group acls > -- > > Key: SPARK-4224 > URL: https://issues.apache.org/jira/browse/SPARK-4224 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > Currently we support view and modify acls but you have to specify a list of > users. It would be nice to also support groups, so that anyone in the group > has permissions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js
[ https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13124. --- Resolution: Fixed Fix Version/s: 2.0.0 > Adding JQuery DataTables messed up the Web UI css and js > > > Key: SPARK-13124 > URL: https://issues.apache.org/jira/browse/SPARK-13124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Alex Bozarth > Fix For: 2.0.0 > > Attachments: css_issue.png, js_issue.png > > > With the addition of JQuery DataTables in SPARK-10873 all the old tables are > using the new DataTables css instead of the old css. Though we most likely > want to switch over completely to DataTables eventually, we should still keep > the old tables UI. > Also when you open up Web Inspector all pages in the WebUI throw an > jsonFormatter.min.js.map not found error. This file was not included in the > update and seems to be required to use Web Inspector on the new js file > (Error doesn't affect actual use) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js
[ https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13124: -- Assignee: Alex Bozarth > Adding JQuery DataTables messed up the Web UI css and js > > > Key: SPARK-13124 > URL: https://issues.apache.org/jira/browse/SPARK-13124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Alex Bozarth > Fix For: 2.0.0 > > Attachments: css_issue.png, js_issue.png > > > With the addition of JQuery DataTables in SPARK-10873 all the old tables are > using the new DataTables css instead of the old css. Though we most likely > want to switch over completely to DataTables eventually, we should still keep > the old tables UI. > Also when you open up Web Inspector all pages in the WebUI throw an > jsonFormatter.min.js.map not found error. This file was not included in the > update and seems to be required to use Web Inspector on the new js file > (Error doesn't affect actual use) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13126. --- Resolution: Fixed Fix Version/s: 2.0.0 > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Zhuo Liu >Priority: Minor > Fix For: 2.0.0 > > Attachments: page_width.png > > > The new History Server page table is always wider than the page no matter how > much larger you make the window. Most likely an odd CSS error, doesn't seem > to be to be a simple fix when manipulating the css using the Web Inspector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-13126: -- Assignee: Zhuo Liu > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Zhuo Liu >Priority: Minor > Attachments: page_width.png > > > The new History Server page table is always wider than the page no matter how > much larger you make the window. Most likely an odd CSS error, doesn't seem > to be to be a simple fix when manipulating the css using the Web Inspector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13163) Column width on new History Server DataTables not getting set correctly
[ https://issues.apache.org/jira/browse/SPARK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-13163. --- Resolution: Fixed Fix Version/s: 2.0.0 > Column width on new History Server DataTables not getting set correctly > --- > > Key: SPARK-13163 > URL: https://issues.apache.org/jira/browse/SPARK-13163 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Priority: Minor > Fix For: 2.0.0 > > Attachments: page_width_fixed.png, width_long_name.png > > > The column width on the DataTable UI for the History Server is being set for > all entries in the table not just the current page. This means if there is > even one App with a long name in your history the table will look really odd > as seen below. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11316) isEmpty before coalesce seems to cause huge performance issue in setupGroups
[ https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139658#comment-15139658 ] Thomas Graves commented on SPARK-11316: --- So we ran into this again; here is the scenario and what is happening: a UnionRDD is being coalesced. The UnionRDD is made up of a MapPartitionsRDD with no preferred locations and a CheckpointedRDD with preferred locations. It's coalescing to a greater number of partitions, but it's not using shuffle, so it's going to coalesce to the same number of partitions. The UnionRDD has 2 RDDs, with 1020 partitions in the MapPartitionsRDD and 960 in the CheckpointedRDD, thus it's coalescing from 1980 to 1980. It goes into setupGroups, called to set up 1980 groups, but since the MapPartitionsRDD doesn't have preferred locations there are only 960 actual preferred locations. It goes through the first while loop and creates partition groups for each of the hosts possible until it hits the expectedCoupons2 number. In this case it hits 1661, so it created groups for 1661 of the 1980, and a bunch of those groups got partitions assigned (out of the 960). It then enters the second while loop to go through the rest of the 1980-1661=319 groups it needs. Here, though, for each of the 319 iterations it goes into the inner while loop while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) trying to add a partition to each group. In this case, since there are fewer partitions than groups, it ends up walking through all targetLen tries almost every time and never adding a partition to the group, because all the partitions are already assigned to groups (we only have 960 partitions to put into 1980 groups). The entire process of 319 * 1980 tries takes over 15 minutes (about 3 seconds per iteration across the 319 iterations). 
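The arithmetic in the comment above can be checked with a small cost model. This is only a sketch built from the numbers quoted in the comment, not Spark's actual DefaultPartitionCoalescer code:

```python
# Cost model for the second while loop in setupGroups, using the numbers
# reported above (assumptions from the comment, not measured by this code).
target_len = 1980          # total partition groups to create
partitions = 960           # partitions that actually have preferred locations
created_phase_one = 1661   # groups created by the first while loop

remaining = target_len - created_phase_one  # groups the second loop must create
print(remaining)  # 319

# Every one of the 960 partitions is already assigned to some group, so
# addPartToPGroup always fails and each of the remaining iterations walks
# nearly all targetLen tries of the inner loop before giving up.
worst_case_tries = remaining * target_len
print(worst_case_tries)  # 631620
```

At roughly 3 seconds per outer iteration, those 319 iterations account for the 15+ minute stall described above; the work is quadratic in the group count whenever there are fewer locatable partitions than groups.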
> isEmpty before coalesce seems to cause huge performance issue in setupGroups > > > Key: SPARK-11316 > URL: https://issues.apache.org/jira/browse/SPARK-11316 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Priority: Critical > > So I haven't fully debugged this yet but reporting what I'm seeing and think > might be going on. > I have a graph processing job that is seeing huge slow down in setupGroups in > the location iterator where its getting the preferred locations for the > coalesce. They are coalescing from 2400 down to 1200 and its taking 17+ > hours to do the calculation. Killed it at this point so don't know total > time. > It appears that the job is doing an isEmpty call, a bunch of other > transformation, then a coalesce (where it takes so long), other > transformations, then finally a count to trigger it. > It appears that there is only one node that its finding in the setupGroup > call and to get to that node it has to first to through the while loop: > while (numCreated < targetLen && tries < expectedCoupons2) { > where expectedCoupons2 is around 19000. It finds very few or none in this > loop. > Then it does the second loop: > while (numCreated < targetLen) { // if we don't have enough partition > groups, create duplicates > var (nxt_replica, nxt_part) = rotIt.next() > val pgroup = PartitionGroup(nxt_replica) > groupArr += pgroup > groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup > var tries = 0 > while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // > ensure at least one part > nxt_part = rotIt.next()._2 > tries += 1 > } > numCreated += 1 > } > Where it has an inner while loop and both of those are going 1200 times. > 1200*1200 loops. This is taking a very long time. > The user can work around the issue by adding in a count() call very close to > after the isEmpty call before the coalesce is called. 
I also tried putting in a take(1) right before the isEmpty call and it seems to work around the issue; it took 1 hour with the take vs a few minutes with the count(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137000#comment-15137000 ] Thomas Graves commented on SPARK-12316: --- When you say "endless cycle call", do you mean the application master hangs? It seems like it should throw, and if the application is done it should just exit anyway since the AM is just calling stop on it. I just want to clarify what is happening, because I assume even if you wait a minute you could still hit the same condition when it's tearing down. > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Attachments: 20151210045149.jpg, 20151210045533.jpg > > > When application end, AM will clean the staging dir. > But if the driver trigger to update the delegation token, it will can't find > the right token file and then it will endless cycle call the method > 'updateCredentialsIfRequired'. > Then it lead to StackOverflowError. > !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg! > !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-12316: -- Assignee: SaintBacchus > Stack overflow with endless call of `Delegation token thread` when > application end. > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.6.0 >Reporter: SaintBacchus >Assignee: SaintBacchus > Attachments: 20151210045149.jpg, 20151210045533.jpg > > > When application end, AM will clean the staging dir. > But if the driver trigger to update the delegation token, it will can't find > the right token file and then it will endless cycle call the method > 'updateCredentialsIfRequired'. > Then it lead to StackOverflowError. > !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg! > !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10873) can't sort columns on history page
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-10873. --- Resolution: Fixed Fix Version/s: 2.0.0 > can't sort columns on history page > -- > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > Fix For: 2.0.0 > > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10873) Change history to use datatables to support sorting columns and searching
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-10873: -- Summary: Change history to use datatables to support sorting columns and searching (was: Change history table to use datatables to support sorting columns and searching) > Change history to use datatables to support sorting columns and searching > - > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > Fix For: 2.0.0 > > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10873) Change history table to use datatables to support sorting columns and searching
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-10873: -- Summary: Change history table to use datatables to support sorting columns and searching (was: can't sort columns on history page) > Change history table to use datatables to support sorting columns and > searching > --- > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > Fix For: 2.0.0 > > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123498#comment-15123498 ] Thomas Graves commented on SPARK-3374: -- The Driver and AM configs should not be combined. In client mode they are completely separate processes and need to be allowed to be configured separately. If there are things we can do to make things more clear I'm all for that. > Spark on Yarn remove deprecated configs for 2.0 > --- > > Key: SPARK-3374 > URL: https://issues.apache.org/jira/browse/SPARK-3374 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves > > The configs in yarn have gotten scattered and inconsistent between cluster > and client modes and supporting backwards compatibility. We should try to > clean this up, move things to common places and support configs across both > cluster and client modes where we want to make them public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13064) api/v1/application/jobs/attempt lacks "attempId" field for spark-shell
[ https://issues.apache.org/jira/browse/SPARK-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122077#comment-15122077 ] Thomas Graves commented on SPARK-13064: --- Thanks [~vanzin]. We can just have the REST API add it as 1 if it doesn't exist, unless you know a reason we shouldn't do this? From an API point of view I would prefer to see it always return the attempt id, and for client mode it is just always 1. > api/v1/application/jobs/attempt lacks "attempId" field for spark-shell > -- > > Key: SPARK-13064 > URL: https://issues.apache.org/jira/browse/SPARK-13064 > Project: Spark > Issue Type: Improvement >Reporter: Zhuo Liu >Priority: Minor > > For any application launches with spark-shell will not have attemptId field > in their rest API. From the REST API point of view, we might want to force an > Id for it, i.e., "1". > {code} > { > "id" : "application_1453789230389_377545", > "name" : "PySparkShell", > "attempts" : [ { > "startTime" : "2016-01-28T02:17:11.035GMT", > "endTime" : "2016-01-28T02:30:01.355GMT", > "lastUpdated" : "2016-01-28T02:30:01.516GMT", > "duration" : 770320, > "sparkUser" : "huyng", > "completed" : true > } ] > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
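The defaulting being proposed can be sketched as a small normalization step. This is hypothetical illustration code, not the actual Spark REST API implementation; the field names come from the JSON sample in the issue (trimmed for brevity):

```python
import json

def normalize_attempts(app):
    """Sketch of the proposed behavior: if an attempt record has no
    attemptId (client mode / spark-shell), report it as "1"."""
    for attempt in app.get("attempts", []):
        attempt.setdefault("attemptId", "1")
    return app

# Abbreviated version of the attempt JSON from the issue description
app = json.loads("""{
  "id": "application_1453789230389_377545",
  "name": "PySparkShell",
  "attempts": [{
    "startTime": "2016-01-28T02:17:11.035GMT",
    "completed": true
  }]
}""")

print(normalize_attempts(app)["attempts"][0]["attemptId"])  # 1
```

With this, clients never need a special case for applications that were launched without an explicit attempt id.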
[jira] [Assigned] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles
[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-1239: Assignee: Thomas Graves > Don't fetch all map output statuses at each reducer during shuffles > --- > > Key: SPARK-1239 > URL: https://issues.apache.org/jira/browse/SPARK-1239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Patrick Wendell >Assignee: Thomas Graves > > Instead we should modify the way we fetch map output statuses to take both a > mapper and a reducer - or we should just piggyback the statuses on each task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10911) Executors should System.exit on clean shutdown
[ https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-10911. --- Resolution: Fixed Fix Version/s: 2.0.0 > Executors should System.exit on clean shutdown > -- > > Key: SPARK-10911 > URL: https://issues.apache.org/jira/browse/SPARK-10911 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Zhuo Liu >Priority: Minor > Fix For: 2.0.0 > > > Executors should call System.exit on clean shutdown to make sure all user > threads exit and jvm shuts down. > We ran into a case where an Executor was left around for days trying to > shutdown because the user code was using a non-daemon thread pool and one of > those threads wasn't exiting. We should force the jvm to go away with > System.exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-1832) Executor UI improvement suggestions
[ https://issues.apache.org/jira/browse/SPARK-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116057#comment-15116057 ] Thomas Graves commented on SPARK-1832: -- I think they meant the driver. I'll mark this as done. > Executor UI improvement suggestions > --- > > Key: SPARK-1832 > URL: https://issues.apache.org/jira/browse/SPARK-1832 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Thomas Graves > Fix For: 2.0.0 > > > I received some suggestions from a user for the /executors UI page to make it > more helpful. This gets more important when you have a really large number of > executors. > Fill some of the cells with color in order to make it easier to absorb > the info, e.g. > RED if Failed Tasks greater than 0 (maybe the more failed, the more intense > the red) > GREEN if Active Tasks greater than 0 (maybe more intense the larger the > number) > Possibly color code COMPLETE TASKS using various shades of blue (e.g., based > on the log(# completed) > - if dark blue then write the value in white (same for the RED and GREEN above > Maybe mark the MASTER task somehow > > Report the TOTALS in each column (do this at the TOP so no need to scroll > to the bottom, or print both at top and bottom). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-1832) Executor UI improvement suggestions
[ https://issues.apache.org/jira/browse/SPARK-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-1832. -- Resolution: Fixed Assignee: Alex Bozarth Fix Version/s: 2.0.0 > Executor UI improvement suggestions > --- > > Key: SPARK-1832 > URL: https://issues.apache.org/jira/browse/SPARK-1832 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Thomas Graves >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > I received some suggestions from a user for the /executors UI page to make it > more helpful. This gets more important when you have a really large number of > executors. > Fill some of the cells with color in order to make it easier to absorb > the info, e.g. > RED if Failed Tasks greater than 0 (maybe the more failed, the more intense > the red) > GREEN if Active Tasks greater than 0 (maybe more intense the larger the > number) > Possibly color code COMPLETE TASKS using various shades of blue (e.g., based > on the log(# completed) > - if dark blue then write the value in white (same for the RED and GREEN above > Maybe mark the MASTER task somehow > > Report the TOTALS in each column (do this at the TOP so no need to scroll > to the bottom, or print both at top and bottom). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12149) Executor UI improvement suggestions - Color UI
[ https://issues.apache.org/jira/browse/SPARK-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-12149. --- Resolution: Fixed Fix Version/s: 2.0.0 > Executor UI improvement suggestions - Color UI > -- > > Key: SPARK-12149 > URL: https://issues.apache.org/jira/browse/SPARK-12149 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Reporter: Alex Bozarth >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > Splitting off the Color UI portion of the parent UI improvements task, > description copied below: > Fill some of the cells with color in order to make it easier to absorb the > info, e.g. > RED if Failed Tasks greater than 0 (maybe the more failed, the more intense > the red) > GREEN if Active Tasks greater than 0 (maybe more intense the larger the > number) > Possibly color code COMPLETE TASKS using various shades of blue (e.g., based > on the log(# completed) > if dark blue then write the value in white (same for the RED and GREEN above > Merging another idea from SPARK-2132: > Color GC time red when over a percentage of task time -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
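The color rules listed in the description could be prototyped along these lines. This is a hypothetical sketch only: the thresholds and scaling constants are invented, and just the red/green/log-scaled-blue rules come from the issue:

```python
import math

def cell_color(active, failed, completed):
    """Pick a (color, intensity) pair for an executor-table cell.

    Rules from the issue: red if any failed tasks (more failed, more
    intense), green if any active tasks, shades of blue scaled by
    log(# completed). Intensity is normalized to [0, 1].
    """
    if failed > 0:
        # more failures -> more intense red, capped at 10 failures
        return ("red", min(failed, 10) / 10.0)
    if active > 0:
        return ("green", min(active, 10) / 10.0)
    if completed > 0:
        # log scale keeps very large counts from saturating immediately
        return ("blue", min(math.log10(completed + 1) / 5.0, 1.0))
    return ("none", 0.0)

print(cell_color(0, 3, 100))   # ('red', 0.3)
print(cell_color(5, 0, 100))   # ('green', 0.5)
```

The actual change landed as DataTables cell styling in the Web UI JavaScript; the point of the sketch is just the precedence (failed over active over completed) and the log scaling.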
[jira] [Commented] (SPARK-3611) Show number of cores for each executor in application web UI
[ https://issues.apache.org/jira/browse/SPARK-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115309#comment-15115309 ] Thomas Graves commented on SPARK-3611: -- I know the pull request was closed because this information couldn't be reliably obtained; it looks like it's now available through the ExecutorInfo structure. > Show number of cores for each executor in application web UI > > > Key: SPARK-3611 > URL: https://issues.apache.org/jira/browse/SPARK-3611 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Matei Zaharia >Priority: Minor > Labels: starter > > This number is not always fully known, because e.g. in Mesos your executors > can scale up and down in # of CPUs, but it would be nice to show at least the > number of cores the machine has in that case, or the # of cores the executor > has been configured with if known. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown
[ https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115225#comment-15115225 ] Thomas Graves commented on SPARK-10911: --- see the pull request for comments and discussion https://github.com/apache/spark/pull/9946 > Executors should System.exit on clean shutdown > -- > > Key: SPARK-10911 > URL: https://issues.apache.org/jira/browse/SPARK-10911 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Zhuo Liu >Priority: Minor > > Executors should call System.exit on clean shutdown to make sure all user > threads exit and jvm shuts down. > We ran into a case where an Executor was left around for days trying to > shutdown because the user code was using a non-daemon thread pool and one of > those threads wasn't exiting. We should force the jvm to go away with > System.exit. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11806) Spark 2.0 deprecations and removals
[ https://issues.apache.org/jira/browse/SPARK-11806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108961#comment-15108961 ] Thomas Graves commented on SPARK-11806: --- I added a task to remove the deprecated yarn configs, especially the old env variables. Do we have anything around this for core in general? > Spark 2.0 deprecations and removals > --- > > Key: SPARK-11806 > URL: https://issues.apache.org/jira/browse/SPARK-11806 > Project: Spark > Issue Type: Umbrella > Components: Spark Core >Reporter: Reynold Xin >Assignee: Reynold Xin > Labels: releasenotes > > This is an umbrella ticket to track things we are deprecating and removing in > Spark 2.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-3374: - Parent Issue: SPARK-11806 (was: SPARK-3492) > Spark on Yarn remove deprecated configs for 2.0 > --- > > Key: SPARK-3374 > URL: https://issues.apache.org/jira/browse/SPARK-3374 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves > > The configs in yarn have gotten scattered and inconsistent between cluster > and client modes and supporting backwards compatibility. We should try to > clean this up, move things to common places and support configs across both > cluster and client modes where we want to make them public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-3374: - Summary: Spark on Yarn remove deprecated configs for 2.0 (was: Spark on Yarn config cleanup) > Spark on Yarn remove deprecated configs for 2.0 > --- > > Key: SPARK-3374 > URL: https://issues.apache.org/jira/browse/SPARK-3374 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves > > The configs in yarn have gotten scattered and inconsistent between cluster > and client modes and supporting backwards compatibility. We should try to > clean this up, move things to common places and support configs across both > cluster and client modes where we want to make them public. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause
[ https://issues.apache.org/jira/browse/SPARK-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108871#comment-15108871 ] Thomas Graves commented on SPARK-12930: --- Note that changing the query to remove the ['pos'] from info['pos'] in the select part of the command works around the issue. Stack trace from exception: java.lang.NullPointerException Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1471) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) > NullPointerException running hive query with array dereference in select and > where clause > - > > Key: SPARK-12930 > URL: https://issues.apache.org/jira/browse/SPARK-12930 > Project: Spark > 
Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: Thomas Graves > > I had a user doing a Hive query from Spark where they had an array dereference > in the select clause and in the where clause; it gave the user a > NullPointerException when the where clause should have filtered it out. It's > like Spark is evaluating the select part before running the where clause. > The info['pos'] below is what caused the issue: > Query looked like: > SELECT foo, > info['pos'] AS pos > FROM db.table > WHERE date >= '$initialDate' AND > date <= '$finalDate' AND > info is not null AND > info['pos'] is not null > LIMIT 10 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
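A hedged sketch of the workaround described in the comment above (table and column names are taken from the report as-is, not verified): drop the array dereference from the SELECT list and select the whole map instead, then pull out the 'pos' entry downstream once the WHERE clause has filtered the nulls.

```sql
-- Workaround sketch: avoid info['pos'] in the SELECT list; the WHERE clause
-- still filters rows where info or info['pos'] is null.
SELECT foo,
       info AS info_map        -- dereference info_map['pos'] in a later step
FROM db.table
WHERE date >= '$initialDate' AND
      date <= '$finalDate' AND
      info is not null AND
      info['pos'] is not null
LIMIT 10
```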
[jira] [Comment Edited] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause
[ https://issues.apache.org/jira/browse/SPARK-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108871#comment-15108871 ] Thomas Graves edited comment on SPARK-12930 at 1/20/16 4:41 PM: Note that changing the query to remove the ['pos'] from info['pos'] in the select part of the command works around the issue. Stack trace from exception: java.lang.NullPointerException Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1471) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944) at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1007) at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) at org.apache.spark.rdd.RDD.withScope(RDD.scala:310) at org.apache.spark.rdd.RDD.reduce(RDD.scala:989) at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1370) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) at org.apache.spark.rdd.RDD.withScope(RDD.scala:310) at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1357) at org.apache.spark.sql.execution.TakeOrderedAndProject.collectData(basicOperators.scala:257) at org.apache.spark.sql.execution.TakeOrderedAndProject.executeCollect(basicOperators.scala:263) at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385) at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903) at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384) at com.yahoo.corp.sparktests.SparkTests$.main(SparkTests.scala:80) at com.yahoo.corp.sparktests.SparkTests.main(SparkTests.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) was 
(Author: tgraves): Note that changing the query to remove the ['pos'] from info['pos'] in the select part of the command works around the issue. Stack trace from exception: java.lang.NullPointerException Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697) at org.apache.spark.scheduler.DAGScheduler$$anonfun$han
[jira] [Created] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause
Thomas Graves created SPARK-12930: - Summary: NullPointerException running hive query with array dereference in select and where clause Key: SPARK-12930 URL: https://issues.apache.org/jira/browse/SPARK-12930 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.2 Reporter: Thomas Graves I had a user doing a Hive query from Spark where they had an array dereference in the select clause and in the where clause; it gave the user a NullPointerException when the where clause should have filtered it out. It's like Spark is evaluating the select part before running the where clause. The info['pos'] below is what caused the issue: Query looked like: SELECT foo, info['pos'] AS pos FROM db.table WHERE date >= '$initialDate' AND date <= '$finalDate' AND info is not null AND info['pos'] is not null LIMIT 10
[jira] [Updated] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch
[ https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-6166: - Assignee: (was: Shixiong Zhu) > Add config to limit number of concurrent outbound connections for shuffle > fetch > --- > > Key: SPARK-6166 > URL: https://issues.apache.org/jira/browse/SPARK-6166 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.4.0 >Reporter: Mridul Muralidharan >Priority: Minor > > spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of > size. > But this is not always sufficient: when the number of hosts in the cluster > increases, this can lead to a very large number of inbound connections to one > or more nodes, causing workers to fail under the load. > I propose we also add a spark.reducer.maxReqsInFlight, which puts a bound on > the number of outstanding outbound connections. > This might still cause hotspots in the cluster, but in our tests this has > significantly reduced the occurrence of worker failures.
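If the proposed setting lands as described, configuration would presumably look like the sketch below in spark-defaults.conf (the spark.reducer.maxReqsInFlight name follows the proposal in this ticket; the values are arbitrary examples, not recommendations):

```
# Bound in-flight shuffle fetch data by size (existing setting)
spark.reducer.maxMbInFlight    48
# Bound the number of outstanding fetch requests (proposed in SPARK-6166)
spark.reducer.maxReqsInFlight  64
```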
[jira] [Commented] (SPARK-1747) check for Spark on Yarn ApplicationMaster split brain
[ https://issues.apache.org/jira/browse/SPARK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106784#comment-15106784 ] Thomas Graves commented on SPARK-1747: -- This should stay open. > check for Spark on Yarn ApplicationMaster split brain > - > > Key: SPARK-1747 > URL: https://issues.apache.org/jira/browse/SPARK-1747 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.0.0 >Reporter: Thomas Graves > > On yarn there is a possibility that applications can end up with an issue > referred to as "split brain". The problem is that you have one Application > Master running, and something happens, like a network split, such that the AM can no > longer talk to the ResourceManager. After some time the ResourceManager will > start a new application attempt, assuming the old one failed, and you end up > with 2 application masters. Note the network split could prevent it from > talking to the RM but it could still be running along contacting regular > executors. > If the previous AM does not need any more resources from the RM it could try > to commit. This could cause lots of problems where the second AM finishes and > tries to commit too. This could potentially result in data corruption. > I believe this same issue can happen on Spark since it's using the Hadoop > output formats. One instance that has this issue is the FileOutputCommitter. > It first writes to a temporary directory (task commit) and then moves the > file to the final directory (job commit). The first AM could finish the job > commit and tell the user it's done, the user starts another downstream job, but > then the second AM comes in to do the job commit, and files the downstream > job is processing could disappear until the second AM finishes the job > commit. 
> This was fixed in MR by https://issues.apache.org/jira/browse/MAPREDUCE-4832
[jira] [Commented] (SPARK-3374) Spark on Yarn config cleanup
[ https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106766#comment-15106766 ] Thomas Graves commented on SPARK-3374: -- I think with 2.0 we should actually just remove a bunch of the old configs and that would clean it up quite a bit. > Spark on Yarn config cleanup > > > Key: SPARK-3374 > URL: https://issues.apache.org/jira/browse/SPARK-3374 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves > > The configs in yarn have gotten scattered and inconsistent between cluster > and client modes and supporting backwards compatibility. We should try to > clean this up, move things to common places and support configs across both > cluster and client modes where we want to make them public.
[jira] [Updated] (SPARK-12149) Executor UI improvement suggestions - Color UI
[ https://issues.apache.org/jira/browse/SPARK-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-12149: -- Assignee: Alex Bozarth > Executor UI improvement suggestions - Color UI > -- > > Key: SPARK-12149 > URL: https://issues.apache.org/jira/browse/SPARK-12149 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Reporter: Alex Bozarth >Assignee: Alex Bozarth > > Splitting off the Color UI portion of the parent UI improvements task, > description copied below: > Fill some of the cells with color in order to make it easier to absorb the > info, e.g. > RED if Failed Tasks greater than 0 (maybe the more failed, the more intense > the red) > GREEN if Active Tasks greater than 0 (maybe more intense the larger the > number) > Possibly color code COMPLETE TASKS using various shades of blue (e.g., based > on the log(# completed)) > if dark blue then write the value in white (same for the RED and GREEN above) > Merging another idea from SPARK-2132: > Color GC time red when over a percentage of task time
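The log-based shading idea above can be sketched as a small helper. This is a hypothetical illustration of the suggestion, not code from the Spark UI; the exact scaling function (log1p, 0-255 range) is an assumption.

```python
import math

def shade(completed: int, max_completed: int) -> int:
    """Map a completed-task count to a 0-255 blue intensity using log scaling,
    per the ticket's suggestion of shades based on log(# completed)."""
    if completed <= 0 or max_completed <= 0:
        return 0
    # Log-scale so differences among small counts stay visible, and cap at 255.
    return min(255, int(255 * math.log1p(completed) / math.log1p(max_completed)))
```

A cell renderer would then pick white text whenever the returned intensity is above some darkness threshold, matching the "if dark blue then write the value in white" suggestion.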
[jira] [Resolved] (SPARK-12716) Executor UI improvement suggestions - Totals
[ https://issues.apache.org/jira/browse/SPARK-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-12716. --- Resolution: Fixed Assignee: Alex Bozarth Fix Version/s: 2.0.0 > Executor UI improvement suggestions - Totals > > > Key: SPARK-12716 > URL: https://issues.apache.org/jira/browse/SPARK-12716 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Reporter: Alex Bozarth >Assignee: Alex Bozarth > Fix For: 2.0.0 > > > Splitting off the Totals portion of the parent UI improvements task, > description copied below: > I received some suggestions from a user for the /executors UI page to make it > more helpful. This gets more important when you have a really large number of > executors. > ... > Report the TOTALS in each column (do this at the TOP so no need to scroll to > the bottom, or print both at top and bottom).
[jira] [Created] (SPARK-12784) Spark UI IndexOutOfBoundsException with dynamic allocation
Thomas Graves created SPARK-12784: - Summary: Spark UI IndexOutOfBoundsException with dynamic allocation Key: SPARK-12784 URL: https://issues.apache.org/jira/browse/SPARK-12784 Project: Spark Issue Type: Bug Components: Web UI, YARN Affects Versions: 1.5.2 Reporter: Thomas Graves Trying to load the web UI Executors page when using dynamic allocation running on yarn can lead to an IndexOutOfBoundsException. I'm assuming the number of executors was changing as the page was being loaded, which caused this, since at the time it was letting executors go. HTTP ERROR 500 Problem accessing /executors/. Reason: Server Error Caused by: java.lang.IndexOutOfBoundsException: 1058 at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:52) at scala.collection.immutable.Stream.apply(Stream.scala:185) at org.apache.spark.ui.exec.ExecutorsPage$.getExecInfo(ExecutorsPage.scala:180) at org.apache.spark.ui.exec.ExecutorsPage$$anonfun$11.apply(ExecutorsPage.scala:60) at org.apache.spark.ui.exec.ExecutorsPage$$anonfun$11.apply(ExecutorsPage.scala:59) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.Range.foreach(Range.scala:141) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at org.apache.spark.ui.exec.ExecutorsPage.render(ExecutorsPage.scala:59) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79) at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79) at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:69) at javax.servlet.http.HttpServlet.service(HttpServlet.java:735) at javax.servlet.http.HttpServlet.service(HttpServlet.java:848) at org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684) at 
org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496) at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164) at org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467) at org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499) at org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086) at org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428) at org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020) at org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:264) at org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.spark-project.jetty.server.Server.handle(Server.java:370) at org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494) at org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971) at org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033) at org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644) at org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82) at org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667) at org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52) at org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at 
org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Simply reloading eventually gets the UI to come up, so it's not a blocker, but it's not a very friendly experience either.
[jira] [Commented] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes
[ https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091965#comment-15091965 ] Thomas Graves commented on SPARK-2930: -- I think simply putting a webhdfs url in the examples should be good here. > clarify docs on using webhdfs with spark.yarn.access.namenodes > -- > > Key: SPARK-2930 > URL: https://issues.apache.org/jira/browse/SPARK-2930 > Project: Spark > Issue Type: Improvement > Components: Documentation, YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor > > The documentation of spark.yarn.access.namenodes talks about putting > namenodes in it and gives an example with hdfs://. > It can also be used with webhdfs, so we should clarify how to use it.
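A hedged example of the kind of entry the clarified documentation might show, mixing an hdfs:// and a webhdfs:// URL (hostnames and ports below are placeholders, not from the ticket):

```
spark.yarn.access.namenodes  hdfs://nn1.example.com:8020,webhdfs://nn2.example.com:50070
```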
[jira] [Commented] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes
[ https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091941#comment-15091941 ] Thomas Graves commented on SPARK-2930: -- I think we should still document this. It's a one-line change; I'll try to get something up today > clarify docs on using webhdfs with spark.yarn.access.namenodes > -- > > Key: SPARK-2930 > URL: https://issues.apache.org/jira/browse/SPARK-2930 > Project: Spark > Issue Type: Improvement > Components: Documentation, YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor > > The documentation of spark.yarn.access.namenodes talks about putting > namenodes in it and gives an example with hdfs://. > It can also be used with webhdfs, so we should clarify how to use it.
[jira] [Reopened] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes
[ https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reopened SPARK-2930: -- > clarify docs on using webhdfs with spark.yarn.access.namenodes > -- > > Key: SPARK-2930 > URL: https://issues.apache.org/jira/browse/SPARK-2930 > Project: Spark > Issue Type: Improvement > Components: Documentation, YARN >Affects Versions: 1.1.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Minor > > The documentation of spark.yarn.access.namenodes talks about putting > namenodes in it and gives an example with hdfs://. > It can also be used with webhdfs, so we should clarify how to use it.
[jira] [Resolved] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
[ https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-12654. --- Resolution: Fixed Assignee: Thomas Graves (was: Apache Spark) > sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop > - > > Key: SPARK-12654 > URL: https://issues.apache.org/jira/browse/SPARK-12654 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 1.6.1, 2.0.0 > > > On a secure hadoop cluster using pyspark or spark-shell in yarn client mode > with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. > Then try to use: > val files = sc.wholeTextFiles("dir") > files.collect() > and it fails with: > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation > Token can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) > > at org.apache.hadoop.ipc.Client.call(Client.java:1451) > at org.apache.hadoop.ipc.Client.call(Client.java:1382) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242) > at > 
org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55) > at > org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
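The reported repro could be driven as sketched below. This is a hypothetical illustration assuming a kerberized cluster in YARN client mode; the directory path is a placeholder, and the shell commands follow the steps described in the issue.

```shell
# Start a YARN client-mode shell with configuration cloning enabled,
# then wait for over 1 minute before issuing the failing call.
spark-shell --master yarn --conf spark.hadoop.cloneConf=true
# In the shell (per the issue description):
#   val files = sc.wholeTextFiles("hdfs:///some/dir")
#   files.collect()
# fails with: "Delegation Token can be issued only with kerberos or web authentication"
```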
[jira] [Closed] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
[ https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves closed SPARK-12654. - Resolution: Fixed Fix Version/s: 2.0.0 1.6.1 > sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop > - > > Key: SPARK-12654 > URL: https://issues.apache.org/jira/browse/SPARK-12654 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 1.6.1, 2.0.0 > > > On a secure hadoop cluster using pyspark or spark-shell in yarn client mode > with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. > Then try to use: > val files = sc.wholeTextFiles("dir") > files.collect() > and it fails with: > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation > Token can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) > > at org.apache.hadoop.ipc.Client.call(Client.java:1451) > at org.apache.hadoop.ipc.Client.call(Client.java:1382) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242) > at > 
org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55) > at > org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
[jira] [Reopened] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
[ https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reopened SPARK-12654: --- > sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop > - > > Key: SPARK-12654 > URL: https://issues.apache.org/jira/browse/SPARK-12654 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Fix For: 1.6.1, 2.0.0 > > > On a secure hadoop cluster using pyspark or spark-shell in yarn client mode > with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. > Then try to use: > val files = sc.wholeTextFiles("dir") > files.collect() > and it fails with: > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation > Token can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) > > at org.apache.hadoop.ipc.Client.call(Client.java:1451) > at org.apache.hadoop.ipc.Client.call(Client.java:1382) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242) > at > 
org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55) > at > org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12713) UI Executor page should keep links around to executors that died
[ https://issues.apache.org/jira/browse/SPARK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089458#comment-15089458 ] Thomas Graves commented on SPARK-12713: --- I mean that while the job is still running and an executor dies, I want to be able to see the logs and stats for it immediately, while it's still running. I also don't believe the history server properly shows you all executors that have been run, unless that has changed recently. > UI Executor page should keep links around to executors that died > > > Key: SPARK-12713 > URL: https://issues.apache.org/jira/browse/SPARK-12713 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 1.5.2 >Reporter: Thomas Graves > > When an executor dies the web ui no longer shows it in the executors page, > which makes getting to the logs to see what happened very difficult. I'm > running on yarn so not sure if the behavior is different in standalone mode. > We should figure out a way to keep links around to the ones that died so we > can show stats and log links.
[jira] [Created] (SPARK-12713) UI Executor page should keep links around to executors that died
Thomas Graves created SPARK-12713: - Summary: UI Executor page should keep links around to executors that died Key: SPARK-12713 URL: https://issues.apache.org/jira/browse/SPARK-12713 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 1.5.2 Reporter: Thomas Graves When an executor dies the web ui no longer shows it in the executors page which makes getting to the logs to see what happened very difficult. I'm running on yarn so not sure if behavior is different in standalone mode. We should figure out a way to keep links around to the ones that died so we can show stats and log links. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
[ https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083663#comment-15083663 ] Thomas Graves commented on SPARK-12654: --- It looks like the version of getConf in HadoopRDD already creates it as a JobConf versus a hadoop Configuration. Not sure why NewHadoopRDD didn't do the same. [~joshrosen] Do you know the history on that? > sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop > - > > Key: SPARK-12654 > URL: https://issues.apache.org/jira/browse/SPARK-12654 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > On a secure hadoop cluster using pyspark or spark-shell in yarn client mode > with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. > Then try to use: > val files = sc.wholeTextFiles("dir") > files.collect() > and it fails with: > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. 
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation > Token can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) > > at org.apache.hadoop.ipc.Client.call(Client.java:1451) > at org.apache.hadoop.ipc.Client.call(Client.java:1382) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242) > at > org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55) > at > org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) -- This message was sent by Atlassian JI
[jira] [Commented] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
[ https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083643#comment-15083643 ] Thomas Graves commented on SPARK-12654: --- So the bug here is that WholeTextFileRDD.getPartitions has: val conf = getConf. In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then uses that to create a new newJobContext. The newJobContext will copy credentials around, but credentials are only present in a JobConf, not in a Hadoop Configuration. So basically, when it is cloning the Hadoop configuration it's changing it from a JobConf to a Configuration and dropping the credentials that were there. NewHadoopRDD just uses the conf passed in for getPartitions (not getConf), which is why it works. Need to investigate whether wholeTextFiles should be using conf or whether getConf needs to change. > sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop > - > > Key: SPARK-12654 > URL: https://issues.apache.org/jira/browse/SPARK-12654 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > On a secure hadoop cluster using pyspark or spark-shell in yarn client mode > with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. > Then try to use: > val files = sc.wholeTextFiles("dir") > files.collect() > and it fails with: > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. 
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation > Token can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) > > at org.apache.hadoop.ipc.Client.call(Client.java:1451) > at org.apache.hadoop.ipc.Client.call(Client.java:1382) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainToken
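The root cause described in the comment above (credentials live in a JobConf, and a plain Hadoop Configuration copy does not carry them) can be sketched with a toy model. This is illustrative Python only, not Hadoop or Spark code; the class names merely mimic the Hadoop ones to show why cloning through the Configuration copy constructor drops delegation tokens.

```python
# Toy model (not Hadoop code): a JobConf carries credentials, a plain
# Configuration does not, so cloning a JobConf via the Configuration
# copy constructor silently drops the tokens.

class Configuration:
    def __init__(self, other=None):
        # copies only properties; knows nothing about credentials
        self.props = dict(other.props) if other else {}

class JobConf(Configuration):
    def __init__(self, other=None):
        super().__init__(other)
        # the JobConf copy path also carries over credentials
        self.credentials = dict(other.credentials) if isinstance(other, JobConf) else {}

secure = JobConf()
secure.credentials["hdfs-token"] = "abc123"   # stands in for a delegation token

cloned_plain = Configuration(secure)   # what the cloneConf path effectively did
cloned_job = JobConf(secure)           # what a JobConf-aware clone preserves

print(hasattr(cloned_plain, "credentials"))  # False: tokens dropped
print(cloned_job.credentials)                # {'hdfs-token': 'abc123'}
```

This mirrors why NewHadoopRDD, which reuses the conf passed in, keeps working while the cloned-Configuration path fails with "Delegation Token can be issued only with kerberos or web authentication".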
[jira] [Created] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
Thomas Graves created SPARK-12654: - Summary: sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop Key: SPARK-12654 URL: https://issues.apache.org/jira/browse/SPARK-12654 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.0 Reporter: Thomas Graves On a secure hadoop cluster using pyspark or spark-shell in yarn client mode with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. Then try to use: val files = sc.wholeTextFiles("dir") files.collect() and it fails with: py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) at org.apache.hadoop.ipc.Client.call(Client.java:1451) at org.apache.hadoop.ipc.Client.call(Client.java:1382) at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242) at org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55) at org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
[ https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-12654: - Assignee: Thomas Graves > sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop > - > > Key: SPARK-12654 > URL: https://issues.apache.org/jira/browse/SPARK-12654 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > On a secure hadoop cluster using pyspark or spark-shell in yarn client mode > with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute. > Then try to use: > val files = sc.wholeTextFiles("dir") > files.collect() > and it fails with: > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.collectAndServe. > : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation > Token can be issued only with kerberos or web authentication > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090) > > at org.apache.hadoop.ipc.Client.call(Client.java:1451) > at org.apache.hadoop.ipc.Client.call(Client.java:1382) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232) > at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:483) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source) > at > org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029) > at > org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434) > at > org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529) > at > org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507) > at > org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100) > at > org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80) > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242) > at > 
org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55) > at > org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064200#comment-15064200 ] Thomas Graves commented on SPARK-11701: --- I ran into another instance of this, and it's when the job has multiple stages: if it's not the last stage and both speculative tasks finish, they are both marked as success. One of them gets ignored, which can leave the counts wrong and shows that an executor still has a task. 15/12/18 16:01:08 INFO scheduler.TaskSetManager: Ignoring task-finished event for 8.1 in stage 0.0 because task 8 has already completed successfully In this case the TaskCommit code and the DAG scheduler won't handle it; TaskSetManager.handleSuccessfulTask needs to handle it. > YARN - dynamic allocation and speculation active task accounting wrong > -- > > Key: SPARK-11701 > URL: https://issues.apache.org/jira/browse/SPARK-11701 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I am using dynamic container allocation and speculation and am seeing issues > with the active task accounting. The Executor UI still shows active tasks on > an executor but the job/stage is all completed. I think it's also > affecting the dynamic allocation being able to release containers because it > thinks there are still tasks. 
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then > run just a wordcount on a decent sized file and set the speculation parameters > low: > spark.dynamicAllocation.enabled true > spark.shuffle.service.enabled true > spark.dynamicAllocation.maxExecutors 10 > spark.dynamicAllocation.minExecutors 2 > spark.dynamicAllocation.initialExecutors 10 > spark.dynamicAllocation.executorIdleTimeout 40s > $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf > spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 > --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g
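The accounting problem in the comment above can be modeled with a small sketch. This is illustrative Python only, not Spark's TaskSetManager; the point is that even when a second (speculative) success is "ignored", the running-task count must still be drained, otherwise the executor appears busy forever and dynamic allocation never releases it.

```python
# Toy model of the fix the comment argues for: always drain the running
# set on a task-finished event, even when the duplicate result is ignored.
running = {("task8", "attempt0"), ("task8", "attempt1")}
completed = set()

def handle_success(task, attempt):
    running.discard((task, attempt))            # drain the running count first,
    if task not in {t for t, _ in completed}:   # then record only the first winner
        completed.add((task, attempt))

handle_success("task8", "attempt0")   # original attempt finishes
handle_success("task8", "attempt1")   # speculative duplicate: ignored, but drained
print(len(running))    # 0 -> the executor no longer appears to hold a task
print(len(completed))  # 1 -> only one success is counted
```

If the duplicate event were dropped entirely instead, `running` would never reach zero, matching the "Executor UI still shows active tasks" symptom in the description.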
[jira] [Commented] (SPARK-12384) Allow -Xms to be set differently than -Xmx
[ https://issues.apache.org/jira/browse/SPARK-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062068#comment-15062068 ] Thomas Graves commented on SPARK-12384: --- Yes, and that is why this change is meant for the gateway side, when users are running spark-shell or anything in YARN client mode. > Allow -Xms to be set differently than -Xmx > -- > > Key: SPARK-12384 > URL: https://issues.apache.org/jira/browse/SPARK-12384 > Project: Spark > Issue Type: Improvement > Components: Spark Submit, YARN >Affects Versions: 1.6.0 >Reporter: Thomas Graves > > Currently Spark automatically sets the -Xms parameter to be the same as the > -Xmx parameter. We should allow the user to set this separately. > The main use case here is if I'm running the spark-shell on a shared gateway. > Many users specify a larger memory size than needed and will never use that > much memory, so all it's doing is preventing other users from potentially > using that memory. Allowing it to be less is just more multi-tenant friendly. > I think it makes sense to leave this for cluster mode, although if a user > really wants to override I don't see why we shouldn't let them.
[jira] [Created] (SPARK-12384) Allow -Xms to be set differently than -Xmx
Thomas Graves created SPARK-12384: - Summary: Allow -Xms to be set differently than -Xmx Key: SPARK-12384 URL: https://issues.apache.org/jira/browse/SPARK-12384 Project: Spark Issue Type: Improvement Components: Spark Submit, YARN Affects Versions: 1.6.0 Reporter: Thomas Graves Currently Spark automatically sets the -Xms parameter to be the same as the -Xmx parameter. We should allow the user to set this separately. The main use case here is if I'm running the spark-shell on a shared gateway. Many users specify a larger memory size than needed and will never use that much memory, so all it's doing is preventing other users from potentially using that memory. Allowing it to be less is just more multi-tenant friendly. I think it makes sense to leave this for cluster mode, although if a user really wants to override I don't see why we shouldn't let them.
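The proposal above can be illustrated with a sketch of how a launcher might assemble the heap flags. This is illustrative Python only; the optional `min_heap` knob below is hypothetical and is not an existing Spark configuration.

```python
# Sketch: today the launcher effectively pins -Xms to -Xmx; the proposal
# is to let a smaller initial heap be requested on shared gateways.

def heap_flags(executor_mem, min_heap=None):
    """Build JVM heap flags; min_heap is a hypothetical override."""
    xms = min_heap if min_heap else executor_mem  # current behavior: -Xms == -Xmx
    return f"-Xms{xms} -Xmx{executor_mem}"

print(heap_flags("4g"))           # current: -Xms4g -Xmx4g
print(heap_flags("4g", "512m"))   # proposed: -Xms512m -Xmx4g
```

A smaller `-Xms` only changes the initial heap reservation; the JVM can still grow to `-Xmx`, which is what makes the proposal multi-tenant friendly on a shared gateway.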
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038603#comment-15038603 ] Thomas Graves commented on SPARK-11701: --- Note: after investigating this some more, part of the problem is that we get Success back from a speculative task even though the original task passed. In this case the second one didn't commit: 15/12/03 18:49:13 INFO mapred.SparkHadoopMapRedUtil: No need to commit output of task because needsTaskCommit=false: attempt_201512031848_0009_m_30_316 Normally these speculative tasks fail with TaskCommitDenied. I think it makes more sense to mark these as killed, but for this particular case, if instead of just logging we throw the TaskCommitDenied exception, then things just work (including not counting the task as a failure toward the max number of task failures). > YARN - dynamic allocation and speculation active task accounting wrong > -- > > Key: SPARK-11701 > URL: https://issues.apache.org/jira/browse/SPARK-11701 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I am using dynamic container allocation and speculation and am seeing issues > with the active task accounting. The Executor UI still shows active tasks on > an executor but the job/stage is all completed. I think it's also > affecting the dynamic allocation being able to release containers because it > thinks there are still tasks. 
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then > run just a wordcount on a decent sized file and set the speculation parameters > low: > spark.dynamicAllocation.enabled true > spark.shuffle.service.enabled true > spark.dynamicAllocation.maxExecutors 10 > spark.dynamicAllocation.minExecutors 2 > spark.dynamicAllocation.initialExecutors 10 > spark.dynamicAllocation.executorIdleTimeout 40s > $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf > spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 > --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g
[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown
[ https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038418#comment-15038418 ] Thomas Graves commented on SPARK-10911: --- There are other cases where this can happen. We've seen it happen on a botched NM upgrade: someone removed the database for running containers during an NM rolling upgrade, so it didn't know the running containers existed. I think this could potentially happen in many ways, whether from someone doing something bad or from bugs in the RM, standalone mode, etc. This should be put in place to make sure the executor exits when it should. > Executors should System.exit on clean shutdown > -- > > Key: SPARK-10911 > URL: https://issues.apache.org/jira/browse/SPARK-10911 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Zhuo Liu >Priority: Minor > > Executors should call System.exit on clean shutdown to make sure all user > threads exit and the jvm shuts down. > We ran into a case where an Executor was left around for days trying to > shut down because the user code was using a non-daemon thread pool and one of > those threads wasn't exiting. We should force the jvm to go away with > System.exit.
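The failure mode above can be sketched in a few lines. This is Python for illustration only, not Executor code: a non-daemon user thread can outlive the main work and keep the process alive, just as the non-daemon thread pool kept that Executor's JVM around for days; an explicit forced exit (`System.exit` in the JVM) ends the process regardless of such lingering threads.

```python
import threading
import time

# A non-daemon thread blocks normal process shutdown until it finishes,
# which is the hang described above. A daemon thread never does.
stalled = threading.Thread(target=lambda: time.sleep(1000), daemon=False)
# stalled.start()  # if started, a normal shutdown would wait on this thread

background = threading.Thread(target=lambda: time.sleep(1000), daemon=True)

print(stalled.daemon, background.daemon)  # False True
```

In the JVM the analogous distinction is `Thread.setDaemon(true)`; since the executor cannot control whether user code creates daemon threads, calling `System.exit` on clean shutdown is the reliable way to guarantee the JVM actually goes away.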
[jira] [Updated] (SPARK-10873) can't sort columns on history page
[ https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-10873: -- Assignee: Zhuo Liu > can't sort columns on history page > -- > > Key: SPARK-10873 > URL: https://issues.apache.org/jira/browse/SPARK-10873 > Project: Spark > Issue Type: Bug > Components: Web UI >Reporter: Thomas Graves >Assignee: Zhuo Liu > > Starting with 1.5.1 the history server page isn't allowing sorting by column -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036395#comment-15036395 ] Thomas Graves commented on SPARK-11701: --- Also seems related to https://github.com/apache/spark/pull/9288 > YARN - dynamic allocation and speculation active task accounting wrong > -- > > Key: SPARK-11701 > URL: https://issues.apache.org/jira/browse/SPARK-11701 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.1 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I am using dynamic container allocation and speculation and am seeing issues > with the active task accounting. The Executor UI still shows active tasks on > an executor but the job/stage is all completed. I think it's also > affecting the dynamic allocation being able to release containers because it > thinks there are still tasks. > It's easily reproduced by using spark-shell: turn on dynamic allocation, then > run just a wordcount on a decent sized file and set the speculation parameters > low: > spark.dynamicAllocation.enabled true > spark.shuffle.service.enabled true > spark.dynamicAllocation.maxExecutors 10 > spark.dynamicAllocation.minExecutors 2 > spark.dynamicAllocation.initialExecutors 10 > spark.dynamicAllocation.executorIdleTimeout 40s > $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf > spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 > --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036366#comment-15036366 ] Thomas Graves commented on SPARK-11701: --- this looks like a dup of SPARK-9038
[jira] [Commented] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles
[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036341#comment-15036341 ] Thomas Graves commented on SPARK-1239: -- I have another user hitting this as well. The above mentions other issues that need to be addressed in MapOutputTracker; do you have links to those other issues? > Don't fetch all map output statuses at each reducer during shuffles > --- > > Key: SPARK-1239 > URL: https://issues.apache.org/jira/browse/SPARK-1239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Patrick Wendell > > Instead we should modify the way we fetch map output statuses to take both a > mapper and a reducer, or we should just piggyback the statuses on each task.
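The improvement requested in SPARK-1239 is easy to quantify with a toy model. Below, the map output statuses are modeled as an M x R table of shuffle block sizes; today every reducer effectively pulls the whole table, while the proposed API would take a reducer id and return only that reducer's column. The function names are illustrative, not Spark's actual MapOutputTracker API:

```python
# Toy model: statuses[m][r] = size of the shuffle block from mapper m to reducer r.
M, R = 1000, 500
statuses = [[1 for _ in range(R)] for _ in range(M)]

def fetch_all_statuses():
    """What each reducer effectively receives today: the full M x R table."""
    return statuses

def fetch_statuses_for_reducer(reducer_id):
    """Proposed: only the column this one reducer actually needs."""
    return [row[reducer_id] for row in statuses]

full = sum(len(row) for row in fetch_all_statuses())
one = len(fetch_statuses_for_reducer(42))
print(full, one)  # 500000 1000, i.e. an R-fold reduction per reducer
```

With thousands of mappers and reducers the full table becomes large enough to stress the driver and the network, which is why users keep hitting this.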
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034749#comment-15034749 ] Thomas Graves commented on SPARK-11701: --- The same issue exists with dynamic allocation in 1.6. It looks like the ExecutorAllocationManager is also receiving onTaskEnd events and tracking the tasks running on each executor. Since the task-end events are never sent out for these tasks, it still thinks they are running.
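A sketch of why a dropped onTaskEnd matters for dynamic allocation. The class below is a simplified stand-in for the ExecutorAllocationManager's listener, not Spark's implementation: it counts running tasks per executor and only treats an executor as idle (releasable) when the count reaches zero.

```python
class AllocationListener:
    """Simplified stand-in: tracks per-executor running-task counts."""

    def __init__(self):
        self.running = {}  # executor_id -> active task count

    def on_task_start(self, executor_id):
        self.running[executor_id] = self.running.get(executor_id, 0) + 1

    def on_task_end(self, executor_id):
        self.running[executor_id] -= 1

    def idle_executors(self):
        # Only executors with zero running tasks can be released.
        return {e for e, n in self.running.items() if n == 0}

listener = AllocationListener()

# Normal path: both task-end events arrive, so the executor becomes idle.
listener.on_task_start("exec-1")
listener.on_task_start("exec-1")
listener.on_task_end("exec-1")
listener.on_task_end("exec-1")
print(listener.idle_executors())  # {'exec-1'}

# Buggy path: the SparkListenerTaskEnd for exec-2's task is skipped, so the
# count never returns to zero and the container is never released.
listener.on_task_start("exec-2")
print(listener.idle_executors())  # still only {'exec-1'}
```

Once an end event is lost, no later event compensates, so the executor stays "busy" until the application exits.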
[jira] [Assigned] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-11701: - Assignee: Thomas Graves
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034625#comment-15034625 ] Thomas Graves commented on SPARK-11701: --- So it looks like this is a race condition. If the task end for a task (in this case probably because of speculation) comes in after the stage is finished, then DAGScheduler.handleTaskCompletion will skip the task-completion event:

if (!stageIdToStage.contains(task.stageId)) {
  // Skip all the actions if the stage has been cancelled.
  return
}

Since it skips here, it never sends out the SparkListenerTaskEnd event and the UI is never updated. I'm assuming this also affects the dynamic allocation accounting (at least in 1.5); I still have to confirm it exists in 1.6.
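The skip quoted above can be reduced to a few lines. In this sketch (a simplified model in Python, not the real Scala DAGScheduler) a late speculative task-end arrives after the stage has been removed from the stage map, so the listener event is never posted and the UI's active-task counter never returns to zero:

```python
class MiniScheduler:
    """Simplified model of the skip in DAGScheduler.handleTaskCompletion."""

    def __init__(self):
        self.stage_id_to_stage = {0: "stage-0"}
        self.ui_active_tasks = 0
        self.events = []

    def task_start(self, stage_id):
        self.ui_active_tasks += 1

    def handle_task_completion(self, stage_id):
        if stage_id not in self.stage_id_to_stage:
            # Skip all the actions if the stage has been cancelled.
            # SparkListenerTaskEnd is never posted, so the UI never decrements.
            return
        self.events.append(("SparkListenerTaskEnd", stage_id))
        self.ui_active_tasks -= 1

sched = MiniScheduler()
sched.task_start(0)              # original task
sched.task_start(0)              # speculative copy
sched.handle_task_completion(0)  # original finishes; stage completes...
del sched.stage_id_to_stage[0]   # ...and is cleaned up
sched.handle_task_completion(0)  # late speculative task-end: skipped
print(sched.ui_active_tasks)     # 1: a task shown as "active" forever
```

This matches both symptoms in the description: the Executor UI shows phantom active tasks, and any listener doing task accounting (dynamic allocation included) is left with a nonzero count.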
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034042#comment-15034042 ] Thomas Graves commented on SPARK-11701: --- Tested on the latest 1.6 branch and I am no longer seeing the TransportResponseHandler exception. I do still see the original issue. Looking at the logs, there is an INFO message printed near the end for tasks on the executors that still show active tasks. I'm guessing it is ignoring this event and not doing the accounting properly:

15/12/01 16:35:16 INFO TaskSetManager: Ignoring task-finished event for 25.1 in stage 0.0 because task 25 has already completed successfully
[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031843#comment-15031843 ] Thomas Graves commented on SPARK-4117: -- [~devaraj.k] thanks for explaining. Sounds good on ApplicationMasterNotRegisteredException since the AMRMClientImpl is handling it. For ApplicationAttemptNotFoundException, you hit one of the places allocate is called, and that is on the first registration. There is another one in launchReporterThread that is called regularly after startup; that is the one that catches exceptions and waits for a number of failures before finally exiting. So if ApplicationAttemptNotFoundException is thrown any time after the application is running, it will hit that logic. I don't think it's that big of an issue since it will eventually exit; it could just take a little longer. It looks like the only cases where it should be thrown are if we have already unregistered, or if something went wrong on the RM and it lost the application. > Spark on Yarn handle AM being told command from RM > -- > > Key: SPARK-4117 > URL: https://issues.apache.org/jira/browse/SPARK-4117 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 1.2.0 >Reporter: Thomas Graves > > In the allocateResponse from the RM it can send commands that the AM should > follow, for instance AM_RESYNC and AM_SHUTDOWN. We should add support for > those.
[jira] [Updated] (SPARK-10911) Executors should System.exit on clean shutdown
[ https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-10911: -- Assignee: Zhuo Liu
[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM
[ https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024603#comment-15024603 ] Thomas Graves commented on SPARK-4117: -- [~devaraj.k] Where is Spark handling the ApplicationMasterNotRegisteredException and ApplicationAttemptNotFoundException exceptions? From a quick look I don't see it doing anything special with those. We do catch exceptions for the allocate call, but we just increment the failure count and try again until we hit the max failure count. Ideally, I think on an ApplicationMasterNotRegisteredException we would re-register, and for ApplicationAttemptNotFoundException we would immediately shut down rather than trying again.
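The handling proposed here can be sketched as a policy for the AM's heartbeat loop. The exception classes below are stand-ins for YARN's ApplicationMasterNotRegisteredException and ApplicationAttemptNotFoundException (real classes in org.apache.hadoop.yarn.exceptions); everything else, including the function names, is illustrative, not Spark's or Hadoop's API:

```python
class AMNotRegistered(Exception):
    """Stand-in for ApplicationMasterNotRegisteredException."""

class AttemptNotFound(Exception):
    """Stand-in for ApplicationAttemptNotFoundException."""

def reporter_loop(allocate, register, max_failures=3):
    """Run allocate() heartbeats under the proposed policy.

    Returns the action taken so the policy is easy to inspect:
    re-register on AMNotRegistered, shut down immediately on
    AttemptNotFound, and keep the existing count-then-exit behavior
    for everything else.
    """
    failures = 0
    while True:
        try:
            allocate()
            return "allocated"
        except AMNotRegistered:
            register()          # re-register instead of counting a failure
        except AttemptNotFound:
            return "shutdown"   # the attempt is gone; exit immediately
        except Exception:
            failures += 1       # current behavior for all other errors
            if failures >= max_failures:
                return "exit-after-max-failures"
```

For example, an allocate stub that raises the not-registered error until register() has been called ends in "allocated", while one that raises the attempt-not-found error returns "shutdown" on the first heartbeat instead of burning through the failure budget.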
[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
[ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003977#comment-15003977 ] Thomas Graves commented on SPARK-11701: --- [~jerryshao] Are you referring to my last post about 1.6 or the original description? If it's a recent break I can see the 1.6 issue, but I'm guessing not the 1.5 issue with speculation and tasks staying?