[jira] [Commented] (SPARK-13775) history server sort by completed time by default

2016-03-09 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187820#comment-15187820
 ] 

Thomas Graves commented on SPARK-13775:
---

Note those were basically rhetorical questions.  You are probably right that I 
should have waited.  I thought about it but decided to merge anyway since it 
didn't hurt anything.

> history server sort by completed time by default
> 
>
> Key: SPARK-13775
> URL: https://issues.apache.org/jira/browse/SPARK-13775
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Priority: Trivial
>
> The new history server UI using DataTables sorts by application ID. Let's 
> change it to sort by completed time, as the old table format did.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13775) history server sort by completed time by default

2016-03-09 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187804#comment-15187804
 ] 

Thomas Graves commented on SPARK-13775:
---

Why does it really matter?  Did the version I merged harm anything?

> history server sort by completed time by default
> 
>
> Key: SPARK-13775
> URL: https://issues.apache.org/jira/browse/SPARK-13775
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Priority: Trivial
>
> The new history server UI using DataTables sorts by application ID. Let's 
> change it to sort by completed time, as the old table format did.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13642) Properly handle signal kill of ApplicationMaster

2016-03-09 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13642:
--
Assignee: Saisai Shao

> Properly handle signal kill of ApplicationMaster
> 
>
> Key: SPARK-13642
> URL: https://issues.apache.org/jira/browse/SPARK-13642
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>
> Currently when running Spark on YARN in yarn cluster mode, the default 
> application final state is "SUCCEEDED"; if any exception occurs, this 
> final state is changed to "FAILED" and a reattempt is triggered if 
> possible. 
> This is OK in the normal case, but there's a race condition when the AM receives a 
> signal (SIGTERM) and no exception has occurred. In this situation, the 
> shutdown hook is invoked and marks the application as finished 
> successfully, and there is no further attempt.
> In that situation, from Spark's perspective the application has failed 
> and needs another attempt, but from YARN's perspective the application finished 
> successfully.
> This can happen when an NM fails: the NM failure sends SIGTERM to the AM, and 
> the AM should mark this attempt as failed and rerun, rather than 
> invoke unregister.
> So to increase the chance of hitting this race condition, here is the reproduction code:
> {code}
> val sc = ...
> Thread.sleep(3L)
> sc.parallelize(1 to 100).collect()
> {code}
> If the AM is killed while sleeping, no exception is thrown, so from 
> YARN's point of view this application finished successfully, but from Spark's 
> point of view, this application should be reattempted.
> The log normally looks like this:
> {noformat}
> 16/03/03 12:44:19 INFO ContainerManagementProtocolProxy: Opening proxy : 
> 192.168.0.105:45454
> 16/03/03 12:44:21 INFO YarnClusterSchedulerBackend: Registered executor 
> NettyRpcEndpointRef(null) (192.168.0.105:57461) with ID 2
> 16/03/03 12:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.0.105:57462 with 511.1 MB RAM, BlockManagerId(2, 192.168.0.105, 57462)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: Registered executor 
> NettyRpcEndpointRef(null) (192.168.0.105:57467) with ID 1
> 16/03/03 12:44:23 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.0.105:57468 with 511.1 MB RAM, BlockManagerId(1, 192.168.0.105, 57468)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
> 16/03/03 12:44:23 INFO YarnClusterScheduler: 
> YarnClusterScheduler.postStartHook done
> 16/03/03 12:44:39 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/03/03 12:44:39 INFO SparkContext: Invoking stop() from shutdown hook
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/metrics/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/api,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/static,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/environment/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/environment,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/rdd,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/pool,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletC

[jira] [Commented] (SPARK-13775) history server sort by completed time by default

2016-03-09 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187707#comment-15187707
 ] 

Thomas Graves commented on SPARK-13775:
---

As I stated in the PR, I merged that in because it was ready and better than 
what was there before.



> history server sort by completed time by default
> 
>
> Key: SPARK-13775
> URL: https://issues.apache.org/jira/browse/SPARK-13775
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Priority: Trivial
>
> The new history server UI using DataTables sorts by application ID. Let's 
> change it to sort by completed time, as the old table format did.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13775) history server sort by completed time by default

2016-03-09 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-13775:
-

 Summary: history server sort by completed time by default
 Key: SPARK-13775
 URL: https://issues.apache.org/jira/browse/SPARK-13775
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 2.0.0
Reporter: Thomas Graves


The new history server UI using DataTables sorts by application ID. Let's change 
it to sort by completed time, as the old table format did.
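
For illustration, a minimal Scala sketch of the intended default ordering (the names 
here are hypothetical; the actual change is just the default sort column and direction 
in the DataTables-based history page): most recently completed application first.

{code}
// Illustrative only: AppEntry is a made-up case class standing in for a history listing row.
case class AppEntry(id: String, completedMillis: Long)

val apps = Seq(
  AppEntry("application_1453493359692_30149", 1456270000000L),
  AppEntry("application_1453493359692_30174", 1456273600000L)
)

// Default view: sort by completion time, descending, instead of by application id.
val defaultOrder = apps.sortBy(_.completedMillis)(Ordering[Long].reverse)
{code}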



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true

2016-03-09 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187424#comment-15187424
 ] 

Thomas Graves commented on SPARK-13723:
---

Warnings that scroll by when spark-submit starts are really pretty useless unless the 
user is explicitly looking for something.  Way too many things get printed there 
for them to notice.

spark-submit --help doesn't list the behavior of --num-executors when this is 
on. That is probably a separate bug.

If it's already misunderstood, which I know it is because I've had to explain it 
to multiple people, then I don't see an argument for not changing the behavior. 

It really comes down to what would be the best experience for users.  If we 
have arguments one way or another then I could be swayed.

I also think it's a bit confusing to look at the configs and see that the dynamic 
allocation config is on but it's not being used because --num-executors is 
specified.

One reason not to change this is if we think Spark isn't ready.  For instance 
Spark has some known issues with scalability, and with dynamic allocation 
users could be getting thousands of executors instead of a few or tens, and we could 
hit Spark-internal issues or require more memory for the AM by default.  If that 
makes the user experience worse, that would be a reason not to do it.
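
To make the proposal concrete, here is a hedged Scala sketch of the suggested default 
(a hypothetical helper, not what Spark currently does): with dynamic allocation enabled, 
--num-executors (spark.executor.instances) would seed the initial and maximum executor 
counts instead of silently turning dynamic allocation off.

{code}
import org.apache.spark.SparkConf

// Sketch only: map a user-supplied --num-executors onto dynamic allocation bounds.
def mapNumExecutors(conf: SparkConf): SparkConf = {
  if (conf.getBoolean("spark.dynamicAllocation.enabled", false)) {
    conf.getOption("spark.executor.instances").foreach { n =>
      conf.setIfMissing("spark.dynamicAllocation.initialExecutors", n)
      conf.setIfMissing("spark.dynamicAllocation.maxExecutors", n)
    }
  }
  conf
}
{code}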



> YARN - Change behavior of --num-executors when 
> spark.dynamicAllocation.enabled true
> ---
>
> Key: SPARK-13723
> URL: https://issues.apache.org/jira/browse/SPARK-13723
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Priority: Minor
>
> I think we should change the behavior when --num-executors is specified and 
> dynamic allocation is enabled. Currently if --num-executors is specified, 
> dynamic allocation is disabled and it just uses a static number of executors.
> I would rather see the default behavior changed in the 2.x line. If the dynamic 
> allocation config is on, then --num-executors would set the max and initial number 
> of executors. I think this would allow users to easily cap their usage and would 
> still allow it to free up executors. It would also allow users doing ML to start 
> out with a number of executors, and if they are actually caching the data the 
> executors wouldn't be freed up. So you would get very similar behavior to 
> dynamic allocation being off.
> Part of the reason for this is that using a static number generally wastes 
> resources, especially with people doing ad hoc things with spark-shell. It 
> also has a big effect when people are doing MapReduce/ETL type workloads.   
> The problem is that people are used to specifying --num-executors, so if we turn 
> it on by default in a cluster config it's just overridden.
> We should also update the spark-submit --help description for --num-executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-03-09 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15187141#comment-15187141
 ] 

Thomas Graves commented on SPARK-3374:
--

It seems this work is also being done under 
https://issues.apache.org/jira/browse/SPARK-12343 which has a pull request up 
for it already.

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Boyang Jerry Peng
>
> The configs in YARN have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-03-09 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-3374:
-
Assignee: Boyang Jerry Peng

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Boyang Jerry Peng
>
> The configs in YARN have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13675) The url link in historypage is not correct for application running in yarn cluster mode

2016-03-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13675.
---
   Resolution: Fixed
 Assignee: Saisai Shao
Fix Version/s: 2.0.0

> The url link in historypage is not correct for application running in yarn 
> cluster mode
> ---
>
> Key: SPARK-13675
> URL: https://issues.apache.org/jira/browse/SPARK-13675
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
> Fix For: 2.0.0
>
> Attachments: Screen Shot 2016-02-29 at 3.57.32 PM.png
>
>
> Current URL for each application to access history UI is like: 
> http://localhost:18080/history/application_1457058760338_0016/1/jobs/ or 
> http://localhost:18080/history/application_1457058760338_0016/2/jobs/
> Here *1* or *2* represents the attempt number in {{historypage.js}}, but 
> it is parsed as an attempt id in {{HistoryServer}}, while a correct attempt 
> id should look like "appattempt_1457058760338_0016_02", so it fails to 
> parse into a correct attempt id in {{HistoryServer}}.
> This is OK in yarn client mode, since we don't need this attempt id to fetch 
> the app from the cache, but it fails in yarn cluster mode, where attempt id 
> "1" or "2" is actually wrong.
> So here we should fix this URL so it parses into the correct application id and 
> attempt id.
> This bug was newly introduced by SPARK-10873; there's no issue in branch 1.6.
> Here is the screenshot:
> !https://issues.apache.org/jira/secure/attachment/12791437/Screen%20Shot%202016-02-29%20at%203.57.32%20PM.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true

2016-03-07 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-13723:
-

 Summary: YARN - Change behavior of --num-executors when 
spark.dynamicAllocation.enabled true
 Key: SPARK-13723
 URL: https://issues.apache.org/jira/browse/SPARK-13723
 Project: Spark
  Issue Type: Improvement
  Components: YARN
Affects Versions: 2.0.0
Reporter: Thomas Graves


I think we should change the behavior when --num-executors is specified and 
dynamic allocation is enabled. Currently if --num-executors is specified, 
dynamic allocation is disabled and it just uses a static number of executors.

I would rather see the default behavior changed in the 2.x line. If the dynamic 
allocation config is on, then --num-executors would set the max and initial number 
of executors. I think this would allow users to easily cap their usage and would 
still allow it to free up executors. It would also allow users doing ML to start 
out with a number of executors, and if they are actually caching the data the 
executors wouldn't be freed up. So you would get very similar behavior to 
dynamic allocation being off.

Part of the reason for this is that using a static number generally wastes 
resources, especially with people doing ad hoc things with spark-shell. It also 
has a big effect when people are doing MapReduce/ETL type workloads.   The 
problem is that people are used to specifying --num-executors, so if we turn it on 
by default in a cluster config it's just overridden.

We should also update the spark-submit --help description for --num-executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13723) YARN - Change behavior of --num-executors when spark.dynamicAllocation.enabled true

2016-03-07 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183272#comment-15183272
 ] 

Thomas Graves commented on SPARK-13723:
---

see some discussion on https://github.com/apache/spark/pull/11528

> YARN - Change behavior of --num-executors when 
> spark.dynamicAllocation.enabled true
> ---
>
> Key: SPARK-13723
> URL: https://issues.apache.org/jira/browse/SPARK-13723
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>
> I think we should change the behavior when --num-executors is specified and 
> dynamic allocation is enabled. Currently if --num-executors is specified, 
> dynamic allocation is disabled and it just uses a static number of executors.
> I would rather see the default behavior changed in the 2.x line. If the dynamic 
> allocation config is on, then --num-executors would set the max and initial number 
> of executors. I think this would allow users to easily cap their usage and would 
> still allow it to free up executors. It would also allow users doing ML to start 
> out with a number of executors, and if they are actually caching the data the 
> executors wouldn't be freed up. So you would get very similar behavior to 
> dynamic allocation being off.
> Part of the reason for this is that using a static number generally wastes 
> resources, especially with people doing ad hoc things with spark-shell. It 
> also has a big effect when people are doing MapReduce/ETL type workloads.   
> The problem is that people are used to specifying --num-executors, so if we turn 
> it on by default in a cluster config it's just overridden.
> We should also update the spark-submit --help description for --num-executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13459) Separate Alive and Dead Executors in Executor Totals Table

2016-03-04 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13459.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Separate Alive and Dead Executors in Executor Totals Table
> --
>
> Key: SPARK-13459
> URL: https://issues.apache.org/jira/browse/SPARK-13459
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
>Priority: Minor
> Fix For: 2.0.0
>
>
> Now that dead executors are shown in the executors table (SPARK-7729) the 
> totals table added in SPARK-12716 should be updated to include the separate 
> totals for alive and dead executors as well as the current total.
> (This improvement was originally discussed in the PR for SPARK-12716 while 
> SPARK-7729 was still in progress.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13459) Separate Alive and Dead Executors in Executor Totals Table

2016-03-04 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13459:
--
Assignee: Alex Bozarth

> Separate Alive and Dead Executors in Executor Totals Table
> --
>
> Key: SPARK-13459
> URL: https://issues.apache.org/jira/browse/SPARK-13459
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
>Priority: Minor
>
> Now that dead executors are shown in the executors table (SPARK-7729) the 
> totals table added in SPARK-12716 should be updated to include the separate 
> totals for alive and dead executors as well as the current total.
> (This improvement was originally discussed in the PR for SPARK-12716 while 
> SPARK-7729 was still in progress.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-03-04 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180027#comment-15180027
 ] 

Thomas Graves commented on SPARK-3374:
--

[~srowen]  can you add [~jerrypeng] as a contributor so he can assign himself 
to jira?

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>
> The configs in YARN have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13642) Inconsistent finishing state between driver and AM

2016-03-03 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15177905#comment-15177905
 ] 

Thomas Graves commented on SPARK-13642:
---

The problem is that you really don't want the opposite to happen, i.e. a 
successful attempt being marked as failed, because then it will be retried and 
you could mess up good data.

{code}
  // We report success to avoid retrying applications that have succeeded (System.exit(0)),
  // which means that applications that explicitly exit with a non-zero status will also
  // show up as succeeded in the RM UI.
{code}

In your case above, stop() was called on the SparkContext, and we assume stop 
means the application ran to completion and thus was successful from YARN's 
point of view.

I think we would need a better way for the YARN side to really know what 
happened on the driver side.
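
As a purely illustrative sketch (hypothetical object, not the actual ApplicationMaster 
code), the kind of signal awareness being discussed could look like this: record that a 
SIGTERM arrived so a later shutdown/unregister path can report the attempt as failed, 
allowing YARN to reattempt, instead of defaulting to success.

{code}
import sun.misc.{Signal, SignalHandler}

// Sketch only: remember that SIGTERM arrived before the shutdown hook runs.
object SignalAwareShutdown {
  @volatile var killedBySignal: Boolean = false

  def install(): Unit = {
    Signal.handle(new Signal("TERM"), new SignalHandler {
      override def handle(sig: Signal): Unit = { killedBySignal = true }
    })
  }
}
{code}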

> Inconsistent finishing state between driver and AM 
> ---
>
> Key: SPARK-13642
> URL: https://issues.apache.org/jira/browse/SPARK-13642
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: Saisai Shao
>
> Currently when running Spark on YARN in yarn cluster mode, the default 
> application final state is "SUCCEEDED"; if any exception occurs, this 
> final state is changed to "FAILED" and a reattempt is triggered if 
> possible. 
> This is OK in the normal case, but there's a race condition when the AM receives a 
> signal (SIGTERM) and no exception has occurred. In this situation, the 
> shutdown hook is invoked and marks the application as finished 
> successfully, and there is no further attempt.
> In that situation, from Spark's perspective the application has failed 
> and needs another attempt, but from YARN's perspective the application finished 
> successfully.
> This can happen when an NM fails: the NM failure sends SIGTERM to the AM, and 
> the AM should mark this attempt as failed and rerun, rather than 
> invoke unregister.
> So to increase the chance of hitting this race condition, here is the reproduction code:
> {code}
> val sc = ...
> Thread.sleep(3L)
> sc.parallelize(1 to 100).collect()
> {code}
> If the AM is killed while sleeping, no exception is thrown, so from 
> YARN's point of view this application finished successfully, but from Spark's 
> point of view, this application should be reattempted.
> The log normally looks like this:
> {noformat}
> 16/03/03 12:44:19 INFO ContainerManagementProtocolProxy: Opening proxy : 
> 192.168.0.105:45454
> 16/03/03 12:44:21 INFO YarnClusterSchedulerBackend: Registered executor 
> NettyRpcEndpointRef(null) (192.168.0.105:57461) with ID 2
> 16/03/03 12:44:21 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.0.105:57462 with 511.1 MB RAM, BlockManagerId(2, 192.168.0.105, 57462)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: Registered executor 
> NettyRpcEndpointRef(null) (192.168.0.105:57467) with ID 1
> 16/03/03 12:44:23 INFO BlockManagerMasterEndpoint: Registering block manager 
> 192.168.0.105:57468 with 511.1 MB RAM, BlockManagerId(1, 192.168.0.105, 57468)
> 16/03/03 12:44:23 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready 
> for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
> 16/03/03 12:44:23 INFO YarnClusterScheduler: 
> YarnClusterScheduler.postStartHook done
> 16/03/03 12:44:39 ERROR ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/03/03 12:44:39 INFO SparkContext: Invoking stop() from shutdown hook
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/metrics/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/api,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/static,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/executors,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/environment/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/environment,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/03/03 12:44:39 INFO ContextHandler: stopped 
> o.e.j.s.ServletContextH

[jira] [Commented] (SPARK-2666) Always try to cancel running tasks when a stage is marked as zombie

2016-03-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15176149#comment-15176149
 ] 

Thomas Graves commented on SPARK-2666:
--

[~lianhuiwang] were you going to work on this?  I'm running into this and I 
think it's a bad idea to keep running the old tasks.  It all depends on what 
those tasks are doing and how long they run.  In my case those tasks run a very 
long time doing an expensive shuffle. We should kill those tasks immediately to 
allow tasks from the newer stage attempt to run.

Did you run into issues with your PR, or did it just need a rebase?
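
For reference, a hedged sketch of the cancel-on-zombie idea (the killTask below is a 
stand-in that mimics a backend without kill support, not Spark's actual scheduler code): 
since SchedulerBackends are free to not implement killTask, any path that kills running 
tasks when marking a task set as zombie has to tolerate UnsupportedOperationException.

{code}
// Stand-in for a backend that does not support killing tasks.
def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
  throw new UnsupportedOperationException

// Sketch only: attempt the kill, but fall back to just not scheduling more tasks.
def tryKillRunningTask(taskId: Long, executorId: String): Boolean =
  try {
    killTask(taskId, executorId, interruptThread = true)
    true
  } catch {
    case _: UnsupportedOperationException =>
      false // cannot kill on this backend; the zombie flag alone stops new tasks
  }
{code}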

> Always try to cancel running tasks when a stage is marked as zombie
> ---
>
> Key: SPARK-2666
> URL: https://issues.apache.org/jira/browse/SPARK-2666
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Reporter: Lianhui Wang
>
> There are some situations in which the scheduler can mark a task set as a 
> "zombie" before the task set has completed all of its tasks.  For example:
> (a) When a task fails b/c of a {{FetchFailed}}
> (b) When a stage completes because two different attempts create all the 
> ShuffleMapOutput, though no attempt has completed all its tasks (at least, 
> this *should* result in the task set being marked as zombie, see SPARK-10370)
> (there may be others, I'm not sure if this list is exhaustive.)
> Marking a taskset as zombie prevents any *additional* tasks from getting 
> scheduled; however, it does not cancel all currently running tasks.  We should 
> cancel all running tasks to avoid wasting resources (and also to make the behavior 
> a little more clear to the end user).  Rather than canceling tasks in each 
> case piecemeal, we should refactor the scheduler so that these two actions 
> are always taken together -- canceling tasks should go hand-in-hand with 
> marking the taskset as zombie.
> Some implementation notes:
> * We should change {{taskSetManager.isZombie}} to be private and put it 
> behind a method like {{markZombie}} or something.
> * marking a stage as zombie before the all tasks have completed does *not* 
> necessarily mean the stage attempt has failed.  In case (a), the stage 
> attempt has failed, but in stage (b) we are not canceling b/c of a failure, 
> rather just b/c no more tasks are needed.
> * {{taskScheduler.cancelTasks}} always marks the task set as zombie.  
> However, it also has some side-effects like logging that the stage has failed 
> and creating a {{TaskSetFailed}} event, which we don't want, e.g., in case (b) 
> when nothing has failed.  So it may need some additional refactoring to go 
> along w/ {{markZombie}}.
> * {{SchedulerBackend}}s are free to not implement {{killTask}}, so we need 
> to be sure to catch the resulting {{UnsupportedOperationException}}s
> * Testing this *might* benefit from SPARK-10372



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13481) History server page with a default sorting as "desc" time.

2016-02-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13481:
--
Assignee: Zhuo Liu

> History server page with a default sorting as "desc" time.
> --
>
> Key: SPARK-13481
> URL: https://issues.apache.org/jira/browse/SPARK-13481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhuo Liu
>Assignee: Zhuo Liu
>Priority: Minor
> Fix For: 2.0.0
>
>
> Now by default, it shows in ascending order of appId. We might prefer to 
> display in descending order by default, which will show the latest 
> application at the top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13481) History server page with a default sorting as "desc" time.

2016-02-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13481.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> History server page with a default sorting as "desc" time.
> --
>
> Key: SPARK-13481
> URL: https://issues.apache.org/jira/browse/SPARK-13481
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Zhuo Liu
>Assignee: Zhuo Liu
>Priority: Minor
> Fix For: 2.0.0
>
>
> Now by default, it shows in ascending order of appId. We might prefer to 
> display in descending order by default, which will show the latest 
> application at the top.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-02-26 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169068#comment-15169068
 ] 

Thomas Graves commented on SPARK-3374:
--

Just adding some more detail:

Note we should remove all environment variable configs. These were mostly used 
in yarn-client mode (see YarnClientSchedulerBackend).  We should check all YARN 
code for deprecated configs and remove them.  We should also look at 
yarn.ClientArguments and ApplicationMasterArguments to see if we really need 
these or if we can just do it through configs.
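
As a rough illustration only (the mappings below are examples, not an authoritative list 
of the deprecated settings), the backwards-compatibility handling in question amounts to 
translating old environment variables into their current config equivalents:

{code}
import org.apache.spark.SparkConf

// Example mappings for illustration; the real deprecated set should come from the YARN code.
val deprecatedEnvToConf: Map[String, String] = Map(
  "SPARK_EXECUTOR_MEMORY" -> "spark.executor.memory",
  "SPARK_EXECUTOR_CORES"  -> "spark.executor.cores"
)

def foldDeprecatedEnv(conf: SparkConf, env: Map[String, String] = sys.env): SparkConf = {
  deprecatedEnvToConf.foreach { case (envKey, confKey) =>
    env.get(envKey).foreach(value => conf.setIfMissing(confKey, value))
  }
  conf
}
{code}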

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>
> The configs in YARN have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12523) Support long-running of the Spark On HBase and hive meta store.

2016-02-26 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-12523.
---
   Resolution: Fixed
 Assignee: SaintBacchus
Fix Version/s: 2.0.0

> Support long-running of the Spark On HBase and hive meta store.
> ---
>
> Key: SPARK-12523
> URL: https://issues.apache.org/jira/browse/SPARK-12523
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 2.0.0
>
>
> **AMDelegationTokenRenewer** currently only obtains the HDFS token in the AM; if we want 
> to use long-running Spark on HBase or the Hive metastore, we should obtain 
> these tokens as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-25 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-12316.
---
   Resolution: Fixed
Fix Version/s: 2.0.0
   1.6.1

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Fix For: 1.6.1, 2.0.0
>
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM will clean the staging dir.
> But if the driver then triggers a delegation token update, it can't find 
> the right token file and ends up endlessly calling the method 
> 'updateCredentialsIfRequired'.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11316) coalesce doesn't handle UnionRDD with partial locality properly

2016-02-23 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-11316:
--
Summary: coalesce doesn't handle UnionRDD with partial locality properly  
(was: coalesce setupGroups doesn't handle UnionRDD with partial localtiy 
properly)

> coalesce doesn't handle UnionRDD with partial locality properly
> ---
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> So I haven't fully debugged this yet, but I'm reporting what I'm seeing and what I 
> think might be going on.
> I have a graph processing job that is seeing a huge slowdown in setupGroups in 
> the location iterator where it's getting the preferred locations for the 
> coalesce.  They are coalescing from 2400 down to 1200 and it's taking 17+ 
> hours to do the calculation.  I killed it at this point so I don't know the total 
> time.
> It appears that the job is doing an isEmpty call, a bunch of other 
> transformations, then a coalesce (where it takes so long), other 
> transformations, then finally a count to trigger it.   
> It appears that there is only one node that it's finding in the setupGroups 
> call, and to get to that node it first has to go through the while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000.  It finds very few or none in this 
> loop.  
> Then it does the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
> nxt_part = rotIt.next()._2
> tries += 1
>   }
>   numCreated += 1
> }
> It has an inner while loop, and both of those go 1200 times: 
> 1200*1200 iterations.  This takes a very long time.
> The user can work around the issue by adding a count() call right after 
> the isEmpty call, before the coalesce is called.  I also tried putting 
> in a take(1) right before the isEmpty call and it seems to work around 
> the issue; it took 1 hour with the take vs. a few minutes with the count().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13364) history server application column Id not sorting as number

2016-02-23 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13364:
--
Assignee: Zhuo Liu

> history server application column Id not sorting as number
> --
>
> Key: SPARK-13364
> URL: https://issues.apache.org/jira/browse/SPARK-13364
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
> Fix For: 2.0.0
>
>
> The new history server is using DataTables, and the application column isn't 
> sorting properly. It's not sorting the last _X part right. Below is 
> an example where the 30174 should be before 30149:
> application_1453493359692_30149 
> application_1453493359692_30174
> I'm guessing it's sorting using the link markup rather than just the 
> application id, e.g.:
>  href="/history/application_1453493359692_30029/1/jobs/">application_1453493359692_30029



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13364) history server application column Id not sorting as number

2016-02-23 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13364.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> history server application column Id not sorting as number
> --
>
> Key: SPARK-13364
> URL: https://issues.apache.org/jira/browse/SPARK-13364
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
> Fix For: 2.0.0
>
>
> The new history server is using DataTables, and the application column isn't 
> sorting properly. It's not sorting the last _X part right. Below is 
> an example where the 30174 should be before 30149:
> application_1453493359692_30149 
> application_1453493359692_30174
> I'm guessing it's sorting using the link markup rather than just the 
> application id, e.g.:
>  href="/history/application_1453493359692_30029/1/jobs/">application_1453493359692_30029



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11316) coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly

2016-02-23 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-11316:
-

Assignee: Thomas Graves

> coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly
> ---
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> So I haven't fully debugged this yet, but I'm reporting what I'm seeing and what I 
> think might be going on.
> I have a graph processing job that is seeing a huge slowdown in setupGroups in 
> the location iterator where it's getting the preferred locations for the 
> coalesce.  They are coalescing from 2400 down to 1200 and it's taking 17+ 
> hours to do the calculation.  I killed it at this point so I don't know the total 
> time.
> It appears that the job is doing an isEmpty call, a bunch of other 
> transformations, then a coalesce (where it takes so long), other 
> transformations, then finally a count to trigger it.   
> It appears that there is only one node that it's finding in the setupGroups 
> call, and to get to that node it first has to go through the while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000.  It finds very few or none in this 
> loop.  
> Then it does the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
> nxt_part = rotIt.next()._2
> tries += 1
>   }
>   numCreated += 1
> }
> It has an inner while loop, and both of those go 1200 times: 
> 1200*1200 iterations.  This takes a very long time.
> The user can work around the issue by adding a count() call right after 
> the isEmpty call, before the coalesce is called.  I also tried putting 
> in a take(1) right before the isEmpty call and it seems to work around 
> the issue; it took 1 hour with the take vs. a few minutes with the count().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11316) coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly

2016-02-23 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-11316:
--
Summary: coalesce setupGroups doesn't handle UnionRDD with partial localtiy 
properly  (was: isEmpty before coalesce seems to cause huge performance issue 
in setupGroups)

> coalesce setupGroups doesn't handle UnionRDD with partial localtiy properly
> ---
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Critical
>
> So I haven't fully debugged this yet, but I'm reporting what I'm seeing and what I 
> think might be going on.
> I have a graph processing job that is seeing a huge slowdown in setupGroups in 
> the location iterator where it's getting the preferred locations for the 
> coalesce.  They are coalescing from 2400 down to 1200 and it's taking 17+ 
> hours to do the calculation.  I killed it at this point so I don't know the total 
> time.
> It appears that the job is doing an isEmpty call, a bunch of other 
> transformations, then a coalesce (where it takes so long), other 
> transformations, then finally a count to trigger it.   
> It appears that there is only one node that it's finding in the setupGroups 
> call, and to get to that node it first has to go through the while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000.  It finds very few or none in this 
> loop.  
> Then it does the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
> nxt_part = rotIt.next()._2
> tries += 1
>   }
>   numCreated += 1
> }
> It has an inner while loop, and both of those go 1200 times: 
> 1200*1200 iterations.  This takes a very long time.
> The user can work around the issue by adding a count() call right after 
> the isEmpty call, before the coalesce is called.  I also tried putting 
> in a take(1) right before the isEmpty call and it seems to work around 
> the issue; it took 1 hour with the take vs. a few minutes with the count().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-2541) Standalone mode can't access secure HDFS anymore

2016-02-18 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reopened SPARK-2541:
--

> Standalone mode can't access secure HDFS anymore
> 
>
> Key: SPARK-2541
> URL: https://issues.apache.org/jira/browse/SPARK-2541
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Thomas Graves
> Attachments: SPARK-2541-partial.patch
>
>
> In Spark 0.9.x you could access secure HDFS from a standalone deployment; that 
> doesn't work in 1.x anymore. 
> It looks like the issue is in SparkHadoopUtil.runAsSparkUser.  Previously it 
> wouldn't do the doAs if the currentUser == user.  Not sure how it behaves 
> when the daemons run as a superuser but SPARK_USER is set to someone else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2541) Standalone mode can't access secure HDFS anymore

2016-02-18 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152357#comment-15152357
 ] 

Thomas Graves commented on SPARK-2541:
--

I'm fine with reopening this. Ideally we would officially support security in 
standalone mode but it appears that hasn't happened so I think fixing this lets 
it work.
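
For context, a hedged sketch of the behavior described in the issue (a hypothetical 
helper, not the actual SparkHadoopUtil code): skip the doAs when the requested user is 
already the current user, so existing Kerberos credentials keep working.

{code}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Sketch only: wrap the work in a doAs only when impersonating a different user.
def runAsUser(user: String)(body: => Unit): Unit = {
  val current = UserGroupInformation.getCurrentUser
  if (current.getShortUserName == user) {
    body // same user: no doAs, so the logged-in (Kerberos) credentials are used directly
  } else {
    UserGroupInformation.createRemoteUser(user).doAs(
      new PrivilegedExceptionAction[Unit] {
        override def run(): Unit = body
      })
  }
}
{code}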

> Standalone mode can't access secure HDFS anymore
> 
>
> Key: SPARK-2541
> URL: https://issues.apache.org/jira/browse/SPARK-2541
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0, 1.0.1
>Reporter: Thomas Graves
> Attachments: SPARK-2541-partial.patch
>
>
> In Spark 0.9.x you could access secure HDFS from a standalone deployment; that 
> doesn't work in 1.x anymore. 
> It looks like the issue is in SparkHadoopUtil.runAsSparkUser.  Previously it 
> wouldn't do the doAs if the currentUser == user.  Not sure how it behaves 
> when the daemons run as a superuser but SPARK_USER is set to someone else.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13365) should coalesce do anything if coalescing to same number of partitions without shuffle

2016-02-17 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-13365:
-

 Summary: should coalesce do anything if coalescing to same number 
of partitions without shuffle
 Key: SPARK-13365
 URL: https://issues.apache.org/jira/browse/SPARK-13365
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Thomas Graves


Currently if a user does a coalesce to the same number of partitions as already 
exist, it spends a bunch of time doing work when it seems like it shouldn't do 
anything.

For instance, I have an RDD with 100 partitions; if I run coalesce(100) it seems 
like it should skip any computation since it already has 100 partitions.  One 
case where I've seen this is actually when users do coalesce(1000) without the 
shuffle, which really turns into a coalesce(100).

I'm presenting this as a question since I'm not sure if there are use cases I 
haven't thought of where this would break.
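
For concreteness, a minimal illustration of the case in question, assuming a running 
spark-shell (sc):

{code}
// Coalescing to the partition count the RDD already has: today this still goes through
// the coalesce machinery, even though it arguably could be a no-op.
val rdd = sc.parallelize(1 to 1000, 100)   // 100 partitions
val coalesced = rdd.coalesce(100)          // same number of partitions, no shuffle
println(coalesced.partitions.length)       // 100
{code}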





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13364) history server application column not sorting properly

2016-02-17 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-13364:
-

 Summary: history server application column not sorting properly
 Key: SPARK-13364
 URL: https://issues.apache.org/jira/browse/SPARK-13364
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.0.0
Reporter: Thomas Graves


The new history server is using DataTables, and the application column isn't 
sorting properly. It's not sorting the last _X part right. Below is an 
example where the 30174 should be before 30149:
application_1453493359692_30149 
application_1453493359692_30174

I'm guessing it's sorting using the link markup rather than just the 
application id, e.g.:
application_1453493359692_30029
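
A hedged sketch of the intended ordering (the actual fix belongs in the DataTables sort 
configuration; this only shows the comparison): sort on the application id itself, 
treating the trailing counter numerically, rather than on the rendered link string.

{code}
// Split application_<clusterTimestamp>_<appNumber> into numeric pieces for sorting.
def appIdKey(appId: String): (Long, Long) = appId.split("_") match {
  case Array(_, clusterTs, appNum) => (clusterTs.toLong, appNum.toLong)
  case _                           => (0L, 0L)
}

val ids = Seq("application_1453493359692_30174", "application_1453493359692_30149")
println(ids.sortBy(appIdKey))  // ..._30149 sorts ahead of ..._30174
{code}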



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-17 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150616#comment-15150616
 ] 

Thomas Graves commented on SPARK-12316:
---

[~hshreedharan] I think what you are saying also makes sense, but it is a much 
bigger change. As I mentioned on the PR, we just check for the credentials file, 
so if it didn't get renewed then something strange happened anyway, and delaying 
1 minute before retrying seems reasonable.

We can definitely add more logic around this too. For instance, only retry a 
certain number of times, or, if the staging dir is gone rather than just missing 
files within it, exit immediately; but I think that can be done separately.  

Please comment on the PR if you have alternate ideas.
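
To illustrate the kind of extra logic mentioned above (a hypothetical helper using plain 
java.nio, not the actual credential-updater code): retry a bounded number of times, and 
give up immediately if the staging directory itself is gone rather than merely missing a 
token file.

{code}
import java.nio.file.{Files, Paths}

// Sketch only: simplified to treat any file in the staging dir as the renewed token file.
def waitForTokenFile(stagingDir: String, maxTries: Int = 5): Boolean = {
  var tries = 0
  while (tries < maxTries) {
    val dir = Paths.get(stagingDir)
    if (!Files.exists(dir)) {
      return false               // staging dir deleted: the app is ending, stop retrying
    }
    val listing = Files.list(dir)
    try {
      if (listing.findFirst().isPresent) {
        return true
      }
    } finally {
      listing.close()
    }
    Thread.sleep(60 * 1000L)     // wait a minute before the next attempt, as discussed
    tries += 1
  }
  false
}
{code}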

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM will clean the staging dir.
> But if the driver then triggers a delegation token update, it can't find 
> the right token file and ends up endlessly calling the method 
> 'updateCredentialsIfRequired'.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-17 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150583#comment-15150583
 ] 

Thomas Graves commented on SPARK-12316:
---

Ah ok, thanks for the clarification.  I'll make any further comments on the PR.

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM will clean the staging dir.
> But if the driver then triggers a delegation token update, it can't find 
> the right token file and ends up endlessly calling the method 
> 'updateCredentialsIfRequired'.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2016-02-16 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-11701.
---
Resolution: Duplicate

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage has completed.  I think it is also 
> preventing dynamic allocation from releasing containers because it thinks 
> there are still tasks.
> It is easily reproduced with spark-shell: turn on dynamic allocation, run a 
> wordcount on a decent sized file, save it back to hdfs, and set the 
> speculation parameters low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2016-02-16 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-11701:
--
Description: 
I am using dynamic container allocation and speculation and am seeing issues 
with the active task accounting.  The Executor UI still shows active tasks on 
an executor even though the job/stage has completed.  I think it is also 
preventing dynamic allocation from releasing containers because it thinks there 
are still tasks.

It is easily reproduced with spark-shell: turn on dynamic allocation, run a 
wordcount on a decent sized file, save it back to hdfs, and set the speculation 
parameters low: 

 spark.dynamicAllocation.enabled true
 spark.shuffle.service.enabled true
 spark.dynamicAllocation.maxExecutors 10
 spark.dynamicAllocation.minExecutors 2
 spark.dynamicAllocation.initialExecutors 10
 spark.dynamicAllocation.executorIdleTimeout 40s


$SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 --master 
yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g

  was:
I am using dynamic container allocation and speculation and am seeing issues 
with the active task accounting.  The Executor UI still shows active tasks on 
an executor even though the job/stage has completed.  I think it is also 
preventing dynamic allocation from releasing containers because it thinks there 
are still tasks.

It is easily reproduced with spark-shell: turn on dynamic allocation, run a 
wordcount on a decent sized file, and set the speculation parameters low: 

 spark.dynamicAllocation.enabled true
 spark.shuffle.service.enabled true
 spark.dynamicAllocation.maxExecutors 10
 spark.dynamicAllocation.minExecutors 2
 spark.dynamicAllocation.initialExecutors 10
 spark.dynamicAllocation.executorIdleTimeout 40s


$SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 --master 
yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g


> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage has completed.  I think it is also 
> preventing dynamic allocation from releasing containers because it thinks 
> there are still tasks.
> It is easily reproduced with spark-shell: turn on dynamic allocation, run a 
> wordcount on a decent sized file, save it back to hdfs, and set the 
> speculation parameters low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13343) speculative tasks that didn't commit shouldn't be marked as success

2016-02-16 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-13343:
-

 Summary: speculative tasks that didn't commit shouldn't be marked 
as success
 Key: SPARK-13343
 URL: https://issues.apache.org/jira/browse/SPARK-13343
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 1.6.0
Reporter: Thomas Graves


Currently, speculative tasks that didn't commit can show up as successes or 
failures (depending on the timing of the commit). This is confusing because such 
a task didn't really succeed in the sense that it didn't write anything.

I think these tasks should be marked as KILLED, or something that makes it more 
obvious to the user what actually happened.  If a task happens to hit the timing 
where it gets a commit denied exception, it shows up as failed and counts 
against your task failures.  It shouldn't count against task failures since that 
failure really doesn't matter.

MapReduce handles these situations, so perhaps we can look there for a model.
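
As an illustration only, here is a rough Scala sketch (with made-up names, not 
Spark's scheduler code) of the classification being proposed: a speculative copy 
that lost the commit race gets a KILLED-style outcome instead of counting as a 
failure.

{code}
// Placeholder standing in for the commit-denied signal a losing speculative
// task can hit; the real exception type lives in Spark's executor code.
class CommitDenied(msg: String) extends Exception(msg)

sealed trait TaskOutcome
case object Succeeded extends TaskOutcome
case class Failed(reason: String) extends TaskOutcome
case class Killed(reason: String) extends TaskOutcome // not counted as a failure

def classify(error: Option[Throwable]): TaskOutcome = error match {
  case None                  => Succeeded
  case Some(_: CommitDenied) => Killed("speculative copy lost the commit race")
  case Some(e)               => Failed(e.getMessage)
}
{code}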



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4224) Support group acls

2016-02-16 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148682#comment-15148682
 ] 

Thomas Graves commented on SPARK-4224:
--

Ok, I understand closing older things, but in this instance I would like it to 
stay open. It would also be nice if you put a comment in there stating why it 
was closed.

It is still on our list of todos and it's a feature that I would definitely like 
in Spark.  It makes certain things much easier for organizations, and if you 
look at many other open source products (Hadoop, Storm, etc.) they all support 
group acls. 

When you work in teams (with 10-30 people) it's much easier to just add groups 
to the acls than to list out individual users.  
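
As a sketch of what group support could look like (hypothetical names, not 
Spark's actual ACL code): a view-permission check that accepts either an 
explicit user entry or membership in any group listed in a group acl.

{code}
// Illustration only: `groupsOf` stands in for whatever group-mapping service
// (OS groups, LDAP, etc.) would be plugged in; it is assumed here.
def canView(user: String,
            userAcls: Set[String],
            groupAcls: Set[String],
            groupsOf: String => Set[String]): Boolean =
  userAcls.contains(user) || groupsOf(user).exists(groupAcls.contains)
{code}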

> Support group acls
> --
>
> Key: SPARK-4224
> URL: https://issues.apache.org/jira/browse/SPARK-4224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> Currently we support view and modify acls but you have to specify a list of 
> users. It would be nice to also support groups, so that anyone in the group 
> has permissions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-16 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148662#comment-15148662
 ] 

Thomas Graves commented on SPARK-12316:
---

I'm not following how this ended up in an infinite loop.  Can you please 
describe exactly what you are seeing?

For instance, say shutdown is happening and you happen to hit 
updateCredentialsIfRequired.  If the file isn't found you would get an exception 
and fall back to scheduling it an hour later in the catch NonFatal.  If stop was 
already called, then delegationTokenRenewer.shutdown() should have happened and 
I assume schedule would have thrown (perhaps I'm wrong here).

 

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM cleans the staging dir.
> But if the driver then triggers a delegation token update, it can't find the 
> right token file and ends up calling the method 'updateCredentialsIfRequired' 
> in an endless cycle.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-4224) Support group acls

2016-02-16 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reopened SPARK-4224:
--

> Support group acls
> --
>
> Key: SPARK-4224
> URL: https://issues.apache.org/jira/browse/SPARK-4224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> Currently we support view and modify acls but you have to specify a list of 
> users. It would be nice to also support groups, so that anyone in the group 
> has permissions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4224) Support group acls

2016-02-16 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148627#comment-15148627
 ] 

Thomas Graves commented on SPARK-4224:
--

why did you close this?

> Support group acls
> --
>
> Key: SPARK-4224
> URL: https://issues.apache.org/jira/browse/SPARK-4224
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> Currently we support view and modify acls but you have to specify a list of 
> users. It would be nice to also support groups, so that anyone in the group 
> has permissions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js

2016-02-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13124.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Adding JQuery DataTables messed up the Web UI css and js
> 
>
> Key: SPARK-13124
> URL: https://issues.apache.org/jira/browse/SPARK-13124
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
> Attachments: css_issue.png, js_issue.png
>
>
> With the addition of JQuery DataTables in SPARK-10873 all the old tables are 
> using the new DataTables css instead of the old css. Though we most likely 
> want to switch over completely to DataTables eventually, we should still keep 
> the old tables UI.
> Also, when you open up Web Inspector, all pages in the WebUI throw a 
> jsonFormatter.min.js.map not-found error. This file was not included in the 
> update and seems to be required to use Web Inspector on the new js file 
> (the error doesn't affect actual use).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js

2016-02-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13124:
--
Assignee: Alex Bozarth

> Adding JQuery DataTables messed up the Web UI css and js
> 
>
> Key: SPARK-13124
> URL: https://issues.apache.org/jira/browse/SPARK-13124
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
> Attachments: css_issue.png, js_issue.png
>
>
> With the addition of JQuery DataTables in SPARK-10873 all the old tables are 
> using the new DataTables css instead of the old css. Though we most likely 
> want to switch over completely to DataTables eventually, we should still keep 
> the old tables UI.
> Also, when you open up Web Inspector, all pages in the WebUI throw a 
> jsonFormatter.min.js.map not-found error. This file was not included in the 
> update and seems to be required to use Web Inspector on the new js file 
> (the error doesn't affect actual use).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13126) History Server page always has horizontal scrollbar

2016-02-10 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13126.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> History Server page always has horizontal scrollbar
> ---
>
> Key: SPARK-13126
> URL: https://issues.apache.org/jira/browse/SPARK-13126
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Assignee: Zhuo Liu
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: page_width.png
>
>
> The new History Server page table is always wider than the page no matter how 
> much larger you make the window. Most likely an odd CSS error; it doesn't seem 
> to be a simple fix when manipulating the css using the Web Inspector.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-13126) History Server page always has horizontal scrollbar

2016-02-10 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-13126:
--
Assignee: Zhuo Liu

> History Server page always has horizontal scrollbar
> ---
>
> Key: SPARK-13126
> URL: https://issues.apache.org/jira/browse/SPARK-13126
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Assignee: Zhuo Liu
>Priority: Minor
> Attachments: page_width.png
>
>
> The new History Server page table is always wider than the page no matter how 
> much larger you make the window. Most likely an odd CSS error; it doesn't seem 
> to be a simple fix when manipulating the css using the Web Inspector.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-13163) Column width on new History Server DataTables not getting set correctly

2016-02-10 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-13163.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Column width on new History Server DataTables not getting set correctly
> ---
>
> Key: SPARK-13163
> URL: https://issues.apache.org/jira/browse/SPARK-13163
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0
>Reporter: Alex Bozarth
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: page_width_fixed.png, width_long_name.png
>
>
> The column width on the DataTable UI for the History Server is being set for 
> all entries in the table, not just the current page. This means that if there 
> is even one app with a long name in your history, the table will look really 
> odd, as seen below.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11316) isEmpty before coalesce seems to cause huge performance issue in setupGroups

2016-02-09 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15139658#comment-15139658
 ] 

Thomas Graves commented on SPARK-11316:
---

So we ran into this again; here is the scenario and what is happening:

A UnionRDD is being coalesced.  The UnionRDD is made up of a MapPartitionsRDD 
with no preferred locations and a CheckpointedRDD with preferred locations.

It is coalescing to a greater number of partitions, but since it is not using a 
shuffle it ends up coalescing to the same number of partitions.  The UnionRDD 
has 2 RDDs, one with 1020 partitions (MapPartitionsRDD) and one with 960 
(CheckpointedRDD), so it is coalescing from 1980 to 1980.   setupGroups is 
called to set up 1980 groups, but since the MapPartitionsRDD doesn't have 
preferred locations there are only 960 actual preferred locations.  It goes 
through the first while loop and creates partition groups for each possible 
host until it hits the expectedCoupons2 number. In this case it hits 1661, so 
it created 1661 of the 1980 groups, and a bunch of those groups got partitions 
assigned (out of the 960).

It then enters the second while loop to create the remaining 1980-1661=319 
groups.  Here, though, for each of the 319 iterations it goes into the inner 
while loop, while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen), 
trying to add a partition to the group.  In this case, since there are fewer 
partitions than groups, it ends up walking through targetLen tries almost every 
time and never adds a partition to the group, because all the partitions are 
already assigned to groups (we only have 960 partitions to put into 1980 
groups).  The entire process of 319 * 1980 tries takes over 15 minutes (about 3 
seconds per iteration of the 319).
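
To make that cost concrete, here is a small standalone sketch (not the 
CoalescedRDD code itself) that just counts the work done by the second loop for 
the numbers above, where every remaining group scans up to targetLen candidates 
without ever succeeding.

{code}
// Rough cost model: 960 partitions already assigned, 1980 groups wanted,
// 1661 groups created in the first loop.
val targetLen = 1980
val createdInFirstLoop = 1661
val remainingGroups = targetLen - createdInFirstLoop // 319
val triesPerGroup = targetLen                        // worst-case inner loop
val totalTries = remainingGroups.toLong * triesPerGroup
println(s"$remainingGroups groups x $triesPerGroup tries = $totalTries attempts")
// ~631,620 attempts; at a couple of milliseconds per attempt that is on the
// order of the 15+ minutes observed above.
{code}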

> isEmpty before coalesce seems to cause huge performance issue in setupGroups
> 
>
> Key: SPARK-11316
> URL: https://issues.apache.org/jira/browse/SPARK-11316
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Critical
>
> So I haven't fully debugged this yet, but reporting what I'm seeing and what 
> I think might be going on.
> I have a graph processing job that is seeing a huge slowdown in setupGroups 
> in the location iterator where it gets the preferred locations for the 
> coalesce.  They are coalescing from 2400 down to 1200 and it was taking 17+ 
> hours to do the calculation.  Killed it at this point so don't know the total 
> time.
> It appears that the job is doing an isEmpty call, a bunch of other 
> transformations, then a coalesce (which is where it takes so long), other 
> transformations, then finally a count to trigger it.
> It appears that there is only one node that it finds in the setupGroups 
> call, and to get to that node it has to first go through the while loop:
> while (numCreated < targetLen && tries < expectedCoupons2) {
> where expectedCoupons2 is around 19000.  It finds very few or none in this 
> loop.
> Then it does the second loop:
> while (numCreated < targetLen) {  // if we don't have enough partition 
> groups, create duplicates
>   var (nxt_replica, nxt_part) = rotIt.next()
>   val pgroup = PartitionGroup(nxt_replica)
>   groupArr += pgroup
>   groupHash.getOrElseUpdate(nxt_replica, ArrayBuffer()) += pgroup
>   var tries = 0
>   while (!addPartToPGroup(nxt_part, pgroup) && tries < targetLen) { // 
> ensure at least one part
> nxt_part = rotIt.next()._2
> tries += 1
>   }
>   numCreated += 1
> }
> Where it has an inner while loop and both of those run 1200 times: 1200*1200 
> iterations.  This takes a very long time.
> The user can work around the issue by adding a count() call right after 
> the isEmpty call, before the coalesce is called.  I also tried putting 
> in a take(1) right before the isEmpty call and it seems to work around 
> the issue; it took 1 hour with the take vs a few minutes with the count().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-08 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137000#comment-15137000
 ] 

Thomas Graves commented on SPARK-12316:
---

You say "endless cycle call"; do you mean the application master hangs?  It 
seems like it should throw, and if the application is done it should just exit 
anyway since the AM is just calling stop on it.  I just want to clarify what is 
happening, because I assume that even if you wait a minute you could still hit 
the same condition once it is tearing down.

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM cleans the staging dir.
> But if the driver then triggers a delegation token update, it can't find the 
> right token file and ends up calling the method 'updateCredentialsIfRequired' 
> in an endless cycle.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12316) Stack overflow with endless call of `Delegation token thread` when application end.

2016-02-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-12316:
--
Assignee: SaintBacchus

> Stack overflow with endless call of `Delegation token thread` when 
> application end.
> ---
>
> Key: SPARK-12316
> URL: https://issues.apache.org/jira/browse/SPARK-12316
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: SaintBacchus
>Assignee: SaintBacchus
> Attachments: 20151210045149.jpg, 20151210045533.jpg
>
>
> When the application ends, the AM cleans the staging dir.
> But if the driver then triggers a delegation token update, it can't find the 
> right token file and ends up calling the method 'updateCredentialsIfRequired' 
> in an endless cycle.
> This leads to a StackOverflowError.
> !https://issues.apache.org/jira/secure/attachment/12779495/20151210045149.jpg!
> !https://issues.apache.org/jira/secure/attachment/12779496/20151210045533.jpg!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10873) can't sort columns on history page

2016-01-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-10873.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> can't sort columns on history page
> --
>
> Key: SPARK-10873
> URL: https://issues.apache.org/jira/browse/SPARK-10873
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
> Fix For: 2.0.0
>
>
> Starting with 1.5.1 the history server page isn't allowing sorting by column



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10873) Change history to use datatables to support sorting columns and searching

2016-01-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-10873:
--
Summary: Change history to use datatables to support sorting columns and 
searching  (was: Change history table to use datatables to support sorting 
columns and searching)

> Change history to use datatables to support sorting columns and searching
> -
>
> Key: SPARK-10873
> URL: https://issues.apache.org/jira/browse/SPARK-10873
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
> Fix For: 2.0.0
>
>
> Starting with 1.5.1 the history server page isn't allowing sorting by column



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10873) Change history table to use datatables to support sorting columns and searching

2016-01-29 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-10873:
--
Summary: Change history table to use datatables to support sorting columns 
and searching  (was: can't sort columns on history page)

> Change history table to use datatables to support sorting columns and 
> searching
> ---
>
> Key: SPARK-10873
> URL: https://issues.apache.org/jira/browse/SPARK-10873
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
> Fix For: 2.0.0
>
>
> Starting with 1.5.1 the history server page isn't allowing sorting by column



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-01-29 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123498#comment-15123498
 ] 

Thomas Graves commented on SPARK-3374:
--

The driver and AM configs should not be combined.   In client mode they are 
completely separate processes and need to be configurable separately.  If there 
are things we can do to make this more clear, I'm all for that.

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>
> The configs in yarn have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13064) api/v1/application/jobs/attempt lacks "attempId" field for spark-shell

2016-01-28 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122077#comment-15122077
 ] 

Thomas Graves commented on SPARK-13064:
---

Thanks [~vanzin].  We can just have the REST API report it as 1 if it doesn't 
exist, unless you know a reason we shouldn't do this?

From an API point of view I would prefer that it always return the attempt id, 
and for client mode it would just always be 1.
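
A minimal sketch of that defaulting behavior (hypothetical types, not the actual 
REST model classes):

{code}
// Illustration only: the real REST API has its own attempt model; this just
// shows falling back to "1" when an attempt carries no id (client mode).
case class AttemptInfo(attemptId: Option[String], sparkUser: String, completed: Boolean)

def attemptIdForRest(attempt: AttemptInfo): String =
  attempt.attemptId.getOrElse("1")
{code}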

> api/v1/application/jobs/attempt lacks "attempId" field for spark-shell
> --
>
> Key: SPARK-13064
> URL: https://issues.apache.org/jira/browse/SPARK-13064
> Project: Spark
>  Issue Type: Improvement
>Reporter: Zhuo Liu
>Priority: Minor
>
> Any application launched with spark-shell will not have an attemptId field in 
> its REST API output. From the REST API point of view, we might want to force 
> an id for it, i.e., "1".
> {code}
> {
>   "id" : "application_1453789230389_377545",
>   "name" : "PySparkShell",
>   "attempts" : [ {
> "startTime" : "2016-01-28T02:17:11.035GMT",
> "endTime" : "2016-01-28T02:30:01.355GMT",
> "lastUpdated" : "2016-01-28T02:30:01.516GMT",
> "duration" : 770320,
> "sparkUser" : "huyng",
> "completed" : true
>   } ]
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles

2016-01-28 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-1239:


Assignee: Thomas Graves

> Don't fetch all map output statuses at each reducer during shuffles
> ---
>
> Key: SPARK-1239
> URL: https://issues.apache.org/jira/browse/SPARK-1239
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Patrick Wendell
>Assignee: Thomas Graves
>
> Instead we should modify the way we fetch map output statuses to take both a 
> mapper and a reducer - or we should just piggyback the statuses on each task. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10911) Executors should System.exit on clean shutdown

2016-01-26 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-10911.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Executors should System.exit on clean shutdown
> --
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
> Fix For: 2.0.0
>
>
> Executors should call System.exit on clean shutdown to make sure all user 
> threads exit and the JVM shuts down.
> We ran into a case where an executor was left around for days trying to 
> shut down because the user code was using a non-daemon thread pool and one of 
> those threads wasn't exiting.  We should force the JVM to go away with 
> System.exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1832) Executor UI improvement suggestions

2016-01-25 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116057#comment-15116057
 ] 

Thomas Graves commented on SPARK-1832:
--

I think they meant the driver.  I'll mark this as done.

> Executor UI improvement suggestions
> ---
>
> Key: SPARK-1832
> URL: https://issues.apache.org/jira/browse/SPARK-1832
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
> Fix For: 2.0.0
>
>
> I received some suggestions from a user for the /executors UI page to make it 
> more helpful. This gets more important when you have a really large number of 
> executors.
>  Fill some of the cells with color in order to make it easier to absorb 
> the info, e.g.
> RED if Failed Tasks greater than 0 (maybe the more failed, the more intense 
> the red)
> GREEN if Active Tasks greater than 0 (maybe more intense the larger the 
> number)
> Possibly color code COMPLETE TASKS using various shades of blue (e.g., based 
> on the log(# completed)).
> - if dark blue then write the value in white (same for the RED and GREEN above)
> Maybe mark the MASTER task somehow
>  
> Report the TOTALS in each column (do this at the TOP so no need to scroll 
> to the bottom, or print both at top and bottom).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-1832) Executor UI improvement suggestions

2016-01-25 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-1832.
--
   Resolution: Fixed
 Assignee: Alex Bozarth
Fix Version/s: 2.0.0

> Executor UI improvement suggestions
> ---
>
> Key: SPARK-1832
> URL: https://issues.apache.org/jira/browse/SPARK-1832
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
>
> I received some suggestions from a user for the /executors UI page to make it 
> more helpful. This gets more important when you have a really large number of 
> executors.
>  Fill some of the cells with color in order to make it easier to absorb 
> the info, e.g.
> RED if Failed Tasks greater than 0 (maybe the more failed, the more intense 
> the red)
> GREEN if Active Tasks greater than 0 (maybe more intense the larger the 
> number)
> Possibly color code COMPLETE TASKS using various shades of blue (e.g., based 
> on the log(# completed)).
> - if dark blue then write the value in white (same for the RED and GREEN above)
> Maybe mark the MASTER task somehow
>  
> Report the TOTALS in each column (do this at the TOP so no need to scroll 
> to the bottom, or print both at top and bottom).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12149) Executor UI improvement suggestions - Color UI

2016-01-25 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-12149.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

> Executor UI improvement suggestions - Color UI
> --
>
> Key: SPARK-12149
> URL: https://issues.apache.org/jira/browse/SPARK-12149
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
>
> Splitting off the Color UI portion of the parent UI improvements task, 
> description copied below:
> Fill some of the cells with color in order to make it easier to absorb the 
> info, e.g.
> RED if Failed Tasks greater than 0 (maybe the more failed, the more intense 
> the red)
> GREEN if Active Tasks greater than 0 (maybe more intense the larger the 
> number)
> Possibly color code COMPLETE TASKS using various shades of blue (e.g., based 
> on the log(# completed)).
> if dark blue then write the value in white (same for the RED and GREEN above)
> Merging another idea from SPARK-2132: 
> Color GC time red when over a percentage of task time



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3611) Show number of cores for each executor in application web UI

2016-01-25 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115309#comment-15115309
 ] 

Thomas Graves commented on SPARK-3611:
--

I know the pull request was closed due to not being able to reliably get this 
information; it looks like it's now available through the ExecutorInfo structure.
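
If that field is indeed exposed, a listener along these lines could collect it; 
treat the exact event and field names as assumptions to verify against the 
Spark version in question.

{code}
import scala.collection.mutable
import org.apache.spark.scheduler.{SparkListener, SparkListenerExecutorAdded}

// Sketch: record each executor's core count as executors are added, so the
// web UI (or anything else) could surface it per executor.
class ExecutorCoresListener extends SparkListener {
  val coresByExecutor = mutable.Map.empty[String, Int]

  override def onExecutorAdded(event: SparkListenerExecutorAdded): Unit = {
    coresByExecutor(event.executorId) = event.executorInfo.totalCores
  }
}
{code}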

> Show number of cores for each executor in application web UI
> 
>
> Key: SPARK-3611
> URL: https://issues.apache.org/jira/browse/SPARK-3611
> Project: Spark
>  Issue Type: New Feature
>  Components: Web UI
>Affects Versions: 1.0.0
>Reporter: Matei Zaharia
>Priority: Minor
>  Labels: starter
>
> This number is not always fully known, because e.g. in Mesos your executors 
> can scale up and down in # of CPUs, but it would be nice to show at least the 
> number of cores the machine has in that case, or the # of cores the executor 
> has been configured with if known.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown

2016-01-25 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15115225#comment-15115225
 ] 

Thomas Graves commented on SPARK-10911:
---

see the pull request for comments and discussion 
https://github.com/apache/spark/pull/9946
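
For context, the hang this issue describes can be reproduced outside Spark with 
a few lines; the object name here is made up for the illustration.

{code}
import java.util.concurrent.Executors

// A non-daemon thread pool created by "user code" keeps the JVM alive even
// after main returns, which is why an explicit System.exit is wanted on a
// clean executor shutdown.
object NonDaemonHang {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(1) // threads are non-daemon by default
    pool.submit(new Runnable {
      override def run(): Unit = while (true) Thread.sleep(60000)
    })
    println("main returned, but the JVM stays up until something calls System.exit")
    // System.exit(0)  // what this issue adds for executors on clean shutdown
  }
}
{code}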

> Executors should System.exit on clean shutdown
> --
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
>
> Executors should call System.exit on clean shutdown to make sure all user 
> threads exit and the JVM shuts down.
> We ran into a case where an executor was left around for days trying to 
> shut down because the user code was using a non-daemon thread pool and one of 
> those threads wasn't exiting.  We should force the JVM to go away with 
> System.exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11806) Spark 2.0 deprecations and removals

2016-01-20 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108961#comment-15108961
 ] 

Thomas Graves commented on SPARK-11806:
---

I added a task to remove the deprecated yarn configs, especially the old env 
variables. 

Do we have anything around this for core in general?

> Spark 2.0 deprecations and removals
> ---
>
> Key: SPARK-11806
> URL: https://issues.apache.org/jira/browse/SPARK-11806
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>  Labels: releasenotes
>
> This is an umbrella ticket to track things we are deprecating and removing in 
> Spark 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-01-20 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-3374:
-
Parent Issue: SPARK-11806  (was: SPARK-3492)

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>
> The configs in yarn have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3374) Spark on Yarn remove deprecated configs for 2.0

2016-01-20 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-3374:
-
Summary: Spark on Yarn remove deprecated configs for 2.0  (was: Spark on 
Yarn config cleanup)

> Spark on Yarn remove deprecated configs for 2.0
> ---
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>
> The configs in yarn have gotten scattered and inconsistent between cluster 
> and client modes while supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause

2016-01-20 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108871#comment-15108871
 ] 

Thomas Graves commented on SPARK-12930:
---

Note that changing the query to remove the ['pos'] from info['pos'] in the 
select part of the command works around the issue.

Stack trace from exception:
 java.lang.NullPointerException
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1471)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)

> NullPointerException running hive query with array dereference in select and 
> where clause
> -
>
> Key: SPARK-12930
> URL: https://issues.apache.org/jira/browse/SPARK-12930
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Thomas Graves
>
> I had a user doing a hive query from Spark where they had an array 
> dereference in the select clause and in the where clause; it gave the user a 
> NullPointerException when the where clause should have filtered it out.  It's 
> as if Spark is evaluating the select part before running the where clause.  
> The info['pos'] below is what caused the issue:
> Query looked like:
> SELECT foo, 
> info['pos'] AS pos 
> FROM db.table
> WHERE date >= '$initialDate' AND
> date <= '$finalDate' AND
> info is not null AND
> info['pos'] is not null
> LIMIT 10 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause

2016-01-20 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15108871#comment-15108871
 ] 

Thomas Graves edited comment on SPARK-12930 at 1/20/16 4:41 PM:


Note that changing the query to remove the ['pos'] from info['pos'] in the 
select part of the command works around the issue.

Stack trace from exception:
 java.lang.NullPointerException
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1509)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1471)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1460)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1824)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1007)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.reduce(RDD.scala:989)
at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1370)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:310)
at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1357)
at 
org.apache.spark.sql.execution.TakeOrderedAndProject.collectData(basicOperators.scala:257)
at 
org.apache.spark.sql.execution.TakeOrderedAndProject.executeCollect(basicOperators.scala:263)
at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
at org.apache.spark.sql.DataFrame$$anonfun$collect$1.apply(DataFrame.scala:1385)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:1903)
at org.apache.spark.sql.DataFrame.collect(DataFrame.scala:1384)
at com.yahoo.corp.sparktests.SparkTests$.main(SparkTests.scala:80)
at com.yahoo.corp.sparktests.SparkTests.main(SparkTests.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:685)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


was (Author: tgraves):
Note that changing the query to remove the ['pos'] from info['pos'] in the 
select part of the command works around the issue.

Stack trace from exception:
 java.lang.NullPointerException
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1296)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1284)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1283)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1283)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$han

[jira] [Created] (SPARK-12930) NullPointerException running hive query with array dereference in select and where clause

2016-01-20 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-12930:
-

 Summary: NullPointerException running hive query with array 
dereference in select and where clause
 Key: SPARK-12930
 URL: https://issues.apache.org/jira/browse/SPARK-12930
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.2
Reporter: Thomas Graves


I had a user doing a hive query from Spark where they had an array dereference 
in the select clause and in the where clause; it gave the user a 
NullPointerException when the where clause should have filtered it out.  It's as 
if Spark is evaluating the select part before running the where clause.  The 
info['pos'] below is what caused the issue:

Query looked like:
SELECT foo, 
info['pos'] AS pos 
FROM db.table
WHERE date >= '$initialDate' AND
date <= '$finalDate' AND
info is not null AND
info['pos'] is not null
LIMIT 10 
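
A sketch of the workaround mentioned in the comments, assuming a Spark 1.5 
HiveContext named hiveContext; the table, columns, and date placeholders are the 
ones from the query above. The idea is to keep info['pos'] out of the SELECT 
list and dereference the map in a second step.

{code}
// Illustration only; whether this dodges the NPE depends on how the plan is
// evaluated, but it matches the workaround of removing ['pos'] from the SELECT.
val filtered = hiveContext.sql(
  """SELECT foo, info
    |FROM db.table
    |WHERE date >= '$initialDate' AND date <= '$finalDate'
    |  AND info IS NOT NULL AND info['pos'] IS NOT NULL
    |LIMIT 10""".stripMargin)
val withPos = filtered.selectExpr("foo", "info['pos'] AS pos")
{code}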



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6166) Add config to limit number of concurrent outbound connections for shuffle fetch

2016-01-19 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-6166:
-
Assignee: (was: Shixiong Zhu)

> Add config to limit number of concurrent outbound connections for shuffle 
> fetch
> ---
>
> Key: SPARK-6166
> URL: https://issues.apache.org/jira/browse/SPARK-6166
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.4.0
>Reporter: Mridul Muralidharan
>Priority: Minor
>
> spark.reducer.maxMbInFlight puts a bound on the in-flight data in terms of 
> size.
> But this is not always sufficient: when the number of hosts in the cluster 
> increases, this can lead to a very large number of inbound connections to one 
> or more nodes, causing workers to fail under the load.
> I propose we also add a spark.reducer.maxReqsInFlight, which puts a bound on 
> the number of outstanding outbound connections.
> This might still cause hotspots in the cluster, but in our tests this has 
> significantly reduced the occurrence of worker failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1747) check for Spark on Yarn ApplicationMaster split brain

2016-01-19 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106784#comment-15106784
 ] 

Thomas Graves commented on SPARK-1747:
--

This should stay open. 

> check for Spark on Yarn ApplicationMaster split brain
> -
>
> Key: SPARK-1747
> URL: https://issues.apache.org/jira/browse/SPARK-1747
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.0.0
>Reporter: Thomas Graves
>
> On yarn there is a possibility that applications end up with an issue 
> referred to as "split brain".  The problem is that you have one Application 
> Master running, and something happens, like a network split, so that the AM 
> can no longer talk to the ResourceManager. After some time the 
> ResourceManager will start a new application attempt, assuming the old one 
> failed, and you end up with 2 application masters.  Note the network split 
> could prevent it from talking to the RM while it is still running along, 
> contacting regular executors. 
> If the previous AM does not need any more resources from the RM it could try 
> to commit. This can cause lots of problems when the second AM finishes and 
> tries to commit too, and could potentially result in data corruption.
> I believe this same issue can happen in Spark since it is using the hadoop 
> output formats.  One instance that has this issue is the FileOutputCommitter. 
>  It first writes to a temporary directory (task commit) and then moves the 
> file to the final directory (job commit).  The first AM could finish the job 
> commit and tell the user it is done; the user starts another downstream job, 
> but then the second AM comes in to do the job commit, and the files the 
> downstream job is processing could disappear until the second AM finishes the 
> job commit. 
> This was fixed in MR by https://issues.apache.org/jira/browse/MAPREDUCE-4832



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3374) Spark on Yarn config cleanup

2016-01-19 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106766#comment-15106766
 ] 

Thomas Graves commented on SPARK-3374:
--

I think with 2.0 we should actually just remove a bunch of the old configs and 
that would clean it up quite a bit.

> Spark on Yarn config cleanup
> 
>
> Key: SPARK-3374
> URL: https://issues.apache.org/jira/browse/SPARK-3374
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>
> The configs in YARN have gotten scattered and inconsistent between cluster 
> and client modes, and from supporting backwards compatibility.  We should try to 
> clean this up, move things to common places, and support configs across both 
> cluster and client modes where we want to make them public.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-12149) Executor UI improvement suggestions - Color UI

2016-01-15 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-12149:
--
Assignee: Alex Bozarth

> Executor UI improvement suggestions - Color UI
> --
>
> Key: SPARK-12149
> URL: https://issues.apache.org/jira/browse/SPARK-12149
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
>
> Splitting off the Color UI portion of the parent UI improvements task; 
> description copied below:
> Fill some of the cells with color in order to make it easier to absorb the 
> info, e.g.:
> RED if Failed Tasks is greater than 0 (maybe the more failures, the more 
> intense the red)
> GREEN if Active Tasks is greater than 0 (maybe more intense the larger the 
> number)
> Possibly color code COMPLETE TASKS using various shades of blue (e.g., based 
> on log(# completed))
> If dark blue, then write the value in white (same for the RED and GREEN above).
> Merging another idea from SPARK-2132: 
> Color GC time red when over a percentage of task time.
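
A hedged sketch of the kind of cell coloring this describes; the helper below is 
illustrative and not the actual ExecutorsPage code, it just shows one way to map 
a failed-task count to a red background, switching to white text once the shade 
gets dark:

// Returns an inline CSS style for the "Failed Tasks" cell: empty when there are
// no failures, otherwise red whose intensity grows with the share of failed tasks.
def failedTasksStyle(failed: Int, total: Int): String = {
  if (failed <= 0) {
    ""
  } else {
    val intensity = math.min(1.0, failed.toDouble / math.max(total, 1))
    val textColor = if (intensity > 0.6) " color: white;" else ""
    s"background: rgba(255, 0, 0, $intensity);$textColor"
  }
}

The style could then be attached to the cell the page already renders, e.g. 
<td style={failedTasksStyle(failedTasks, totalTasks)}>.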



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12716) Executor UI improvement suggestions - Totals

2016-01-15 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-12716.
---
   Resolution: Fixed
 Assignee: Alex Bozarth
Fix Version/s: 2.0.0

> Executor UI improvement suggestions - Totals
> 
>
> Key: SPARK-12716
> URL: https://issues.apache.org/jira/browse/SPARK-12716
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Reporter: Alex Bozarth
>Assignee: Alex Bozarth
> Fix For: 2.0.0
>
>
> Splitting off the Totals portion of the parent UI improvements task, 
> description copied below:
> I received some suggestions from a user for the /executors UI page to make it 
> more helpful. This gets more important when you have a really large number of 
> executors.
> ...
> Report the TOTALS in each column (do this at the TOP so there's no need to scroll 
> to the bottom, or print them at both the top and the bottom).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12784) Spark UI IndexOutOfBoundsException with dynamic allocation

2016-01-12 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-12784:
-

 Summary: Spark UI IndexOutOfBoundsException with dynamic allocation
 Key: SPARK-12784
 URL: https://issues.apache.org/jira/browse/SPARK-12784
 Project: Spark
  Issue Type: Bug
  Components: Web UI, YARN
Affects Versions: 1.5.2
Reporter: Thomas Graves


Trying to load the web UI Executors page when using dynamic allocation running 
on YARN can lead to an IndexOutOfBoundsException.

I'm assuming the number of executors is changing while the page is being loaded, 
which is causing this, since during this time executors were being released.

HTTP ERROR 500

Problem accessing /executors/. Reason:

Server Error

Caused by:

java.lang.IndexOutOfBoundsException: 1058
at 
scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:52)
at scala.collection.immutable.Stream.apply(Stream.scala:185)
at 
org.apache.spark.ui.exec.ExecutorsPage$.getExecInfo(ExecutorsPage.scala:180)
at 
org.apache.spark.ui.exec.ExecutorsPage$$anonfun$11.apply(ExecutorsPage.scala:60)
at 
org.apache.spark.ui.exec.ExecutorsPage$$anonfun$11.apply(ExecutorsPage.scala:59)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.immutable.Range.foreach(Range.scala:141)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at org.apache.spark.ui.exec.ExecutorsPage.render(ExecutorsPage.scala:59)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:79)
at org.apache.spark.ui.JettyUtils$$anon$1.doGet(JettyUtils.scala:69)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at 
org.spark-project.jetty.servlet.ServletHolder.handle(ServletHolder.java:684)
at 
org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1496)
at 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:164)
at 
org.spark-project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1467)
at 
org.spark-project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
at 
org.spark-project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
at 
org.spark-project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
at 
org.spark-project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
at 
org.spark-project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.spark-project.jetty.server.handler.GzipHandler.handle(GzipHandler.java:264)
at 
org.spark-project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.spark-project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.spark-project.jetty.server.Server.handle(Server.java:370)
at 
org.spark-project.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:494)
at 
org.spark-project.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:971)
at 
org.spark-project.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1033)
at 
org.spark-project.jetty.http.HttpParser.parseNext(HttpParser.java:644)
at 
org.spark-project.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.spark-project.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at 
org.spark-project.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
at 
org.spark-project.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at 
org.spark-project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.spark-project.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)


Simply reloading eventually gets the UI to come up, so it's not a blocker, but it's 
not a very friendly experience either.
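
A hedged, self-contained illustration of the kind of race suggested above (all 
names are made up; this is not the actual ExecutorsPage code): sizing a loop from 
one read of a shrinking executor list and then indexing into a later read throws 
exactly this sort of IndexOutOfBoundsException, while rendering from a single 
snapshot does not.

object ExecutorListRace {
  // stands in for the listener-backed executor list that dynamic allocation shrinks
  @volatile var executors: Vector[String] = Vector.fill(1200)("executor")

  def racyRender(): Seq[String] = {
    val n = executors.size                              // first read of the list
    (0 until n).map(i => s"<td>${executors(i)}</td>")   // second read may be shorter -> IndexOutOfBoundsException
  }

  def safeRender(): Seq[String] = {
    val snapshot = executors                            // read the list exactly once
    snapshot.map(e => s"<td>$e</td>")                   // render only from the snapshot
  }
}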



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes

2016-01-11 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091965#comment-15091965
 ] 

Thomas Graves commented on SPARK-2930:
--

I think simply putting a webhdfs URL in the examples should be good here.

> clarify docs on using webhdfs with spark.yarn.access.namenodes
> --
>
> Key: SPARK-2930
> URL: https://issues.apache.org/jira/browse/SPARK-2930
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
>
> The documentation of spark.yarn.access.namenodes talks about putting 
> namenodes in it and gives an example with hdfs://.  
> It can also be used with webhdfs, so we should clarify how to use it.
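
A hedged example of the kind of line the clarified docs could add (the host names 
and ports below are illustrative):

  spark.yarn.access.namenodes hdfs://nn1.example.com:8020,webhdfs://nn2.example.com:50070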



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes

2016-01-11 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15091941#comment-15091941
 ] 

Thomas Graves commented on SPARK-2930:
--

I think we should still document this.  It's a one-line change; I'll try to get 
something up today.

> clarify docs on using webhdfs with spark.yarn.access.namenodes
> --
>
> Key: SPARK-2930
> URL: https://issues.apache.org/jira/browse/SPARK-2930
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
>
> The documentation of spark.yarn.access.namenodes talks about putting 
> namenodes in it and gives an example with hdfs://.  
> It can also be used with webhdfs, so we should clarify how to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-2930) clarify docs on using webhdfs with spark.yarn.access.namenodes

2016-01-11 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reopened SPARK-2930:
--

> clarify docs on using webhdfs with spark.yarn.access.namenodes
> --
>
> Key: SPARK-2930
> URL: https://issues.apache.org/jira/browse/SPARK-2930
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, YARN
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Minor
>
> The documentation of spark.yarn.access.namenodes talks about putting 
> namenodes in it and gives an example with hdfs://.  
> It can also be used with webhdfs, so we should clarify how to use it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-12654.
---
Resolution: Fixed
  Assignee: Thomas Graves  (was: Apache Spark)

> sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
> -
>
> Key: SPARK-12654
> URL: https://issues.apache.org/jira/browse/SPARK-12654
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 1.6.1, 2.0.0
>
>
> On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
> with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  
> Then try to use:
> val files =  sc.wholeTextFiles("dir") 
> files.collect()
> and it fails with:
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation 
> Token can be issued only with kerberos or web authentication
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
>  
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
> at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
> at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
> at 
> org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
> at 
> org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Closed] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves closed SPARK-12654.
-
   Resolution: Fixed
Fix Version/s: 2.0.0
   1.6.1

> sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
> -
>
> Key: SPARK-12654
> URL: https://issues.apache.org/jira/browse/SPARK-12654
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 1.6.1, 2.0.0
>
>
> On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
> with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  
> Then try to use:
> val files =  sc.wholeTextFiles("dir") 
> files.collect()
> and it fails with:
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation 
> Token can be issued only with kerberos or web authentication
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
>  
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
> at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
> at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
> at 
> org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
> at 
> org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Reopened] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-08 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reopened SPARK-12654:
---

> sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
> -
>
> Key: SPARK-12654
> URL: https://issues.apache.org/jira/browse/SPARK-12654
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Fix For: 1.6.1, 2.0.0
>
>
> On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
> with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  
> Then try to use:
> val files =  sc.wholeTextFiles("dir") 
> files.collect()
> and it fails with:
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation 
> Token can be issued only with kerberos or web authentication
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
>  
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
> at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
> at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
> at 
> org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
> at 
> org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12713) UI Executor page should keep links around to executors that died

2016-01-08 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089458#comment-15089458
 ] 

Thomas Graves commented on SPARK-12713:
---

I mean that while the job is still running and an executor dies, I want to be able 
to see the logs and stats for it immediately, while the job is still running.

I also don't believe the history server properly shows you all executors that 
have been run, unless that has changed recently.

> UI Executor page should keep links around to executors that died
> 
>
> Key: SPARK-12713
> URL: https://issues.apache.org/jira/browse/SPARK-12713
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 1.5.2
>Reporter: Thomas Graves
>
> When an executor dies, the web UI no longer shows it on the Executors page, 
> which makes getting to the logs to see what happened very difficult.  I'm 
> running on YARN, so I'm not sure if the behavior is different in standalone mode.
> We should figure out a way to keep links around to the executors that died so we 
> can show their stats and log links.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12713) UI Executor page should keep links around to executors that died

2016-01-08 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-12713:
-

 Summary: UI Executor page should keep links around to executors 
that died
 Key: SPARK-12713
 URL: https://issues.apache.org/jira/browse/SPARK-12713
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 1.5.2
Reporter: Thomas Graves


When an executor dies, the web UI no longer shows it on the Executors page, which 
makes getting to the logs to see what happened very difficult.  I'm running on 
YARN, so I'm not sure if the behavior is different in standalone mode.

We should figure out a way to keep links around to the executors that died so we 
can show their stats and log links.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083663#comment-15083663
 ] 

Thomas Graves commented on SPARK-12654:
---

It looks like the version of getConf in HadoopRDD already creates it as a 
JobConf versus a hadoop Configuration.  Not sure why NewHadoopRDD didn't do the 
same.  

[~joshrosen]  Do you know the history on that?

> sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
> -
>
> Key: SPARK-12654
> URL: https://issues.apache.org/jira/browse/SPARK-12654
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
> with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  
> Then try to use:
> val files =  sc.wholeTextFiles("dir") 
> files.collect()
> and it fails with:
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation 
> Token can be issued only with kerberos or web authentication
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
>  
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
> at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
> at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
> at 
> org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
> at 
> org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)



--
This message was sent by Atlassian JIRA

[jira] [Commented] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-05 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083643#comment-15083643
 ] 

Thomas Graves commented on SPARK-12654:
---

So the bug here is that WholeTextFileRDD.getPartitions has:
val conf = getConf
In getConf, if cloneConf=true, it creates a new Hadoop Configuration and then 
uses that to create a new job context via newJobContext.

newJobContext will copy credentials around, but credentials are only 
present in a JobConf, not in a plain Hadoop Configuration. So basically, when it 
clones the Hadoop configuration it changes it from a JobConf to a 
Configuration and drops the credentials that were there. NewHadoopRDD just 
uses the conf passed in for getPartitions (not getConf), which is why it 
works.  

Need to investigate whether wholeTextFiles should be using conf or whether getConf 
needs to change.
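
A hedged, minimal illustration of the credential-dropping behaviour described 
above, spark-shell style (this is just the Hadoop API behaviour the bug relies 
on, not the Spark source):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

val jobConf = new JobConf()
// A JobConf carries a Credentials object; delegation tokens fetched from the
// NameNode live in jobConf.getCredentials, not in the key/value entries.
println(jobConf.getCredentials.numberOfTokens)   // 0 here; non-zero on a kerberized run

// The Configuration copy-constructor copies key/value entries only.
val cloned = new Configuration(jobConf)
// `cloned` is a plain Configuration with no Credentials at all, so any tokens
// that were on the JobConf are lost in the clone -- which is why the cloned
// conf later fails with "Delegation Token can be issued only with kerberos".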

> sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
> -
>
> Key: SPARK-12654
> URL: https://issues.apache.org/jira/browse/SPARK-12654
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
> with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  
> Then try to use:
> val files =  sc.wholeTextFiles("dir") 
> files.collect()
> and it fails with:
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation 
> Token can be issued only with kerberos or web authentication
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
>  
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
> at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
> at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainToken

[jira] [Created] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-05 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-12654:
-

 Summary: sc.wholeTextFiles with spark.hadoop.cloneConf=true fails 
on secure Hadoop
 Key: SPARK-12654
 URL: https://issues.apache.org/jira/browse/SPARK-12654
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.5.0
Reporter: Thomas Graves


On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  Then 
try to use:
val files =  sc.wholeTextFiles("dir") 
files.collect()
and it fails with:

py4j.protocol.Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token 
can be issued only with kerberos or web authentication
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
 
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
at 
org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
at 
org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
at 
org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
at 
org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12654) sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop

2016-01-05 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-12654:
-

Assignee: Thomas Graves

> sc.wholeTextFiles with spark.hadoop.cloneConf=true fails on secure Hadoop
> -
>
> Key: SPARK-12654
> URL: https://issues.apache.org/jira/browse/SPARK-12654
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> On a secure hadoop cluster using pyspark or spark-shell in yarn client mode 
> with spark.hadoop.cloneConf=true, start it up and wait for over 1 minute.  
> Then try to use:
> val files =  sc.wholeTextFiles("dir") 
> files.collect()
> and it fails with:
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation 
> Token can be issued only with kerberos or web authentication
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:7365)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:528)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:963)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2096)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2092)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1694)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2090)
>  
> at org.apache.hadoop.ipc.Client.call(Client.java:1451)
> at org.apache.hadoop.ipc.Client.call(Client.java:1382)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
> at com.sun.proxy.$Proxy12.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:909)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.getDelegationToken(Unknown Source)
> at 
> org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1029)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1434)
> at 
> org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:529)
> at 
> org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:507)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2120)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
> at 
> org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
> at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
> at 
> org.apache.spark.input.WholeTextFileInputFormat.setMinPartitions(WholeTextFileInputFormat.scala:55)
> at 
> org.apache.spark.rdd.WholeTextFileRDD.getPartitions(NewHadoopRDD.scala:304)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-18 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064200#comment-15064200
 ] 

Thomas Graves commented on SPARK-11701:
---

I ran into another instance of this, and it's when the job has multiple stages: 
if it's not the last stage and both speculative copies of a task finish, they are 
both marked as success.  One of them gets ignored, which can leave the counts 
wrong and show that an executor still has a running task.

15/12/18 16:01:08 INFO scheduler.TaskSetManager: Ignoring task-finished event 
for 8.1 in stage 0.0 because task 8 has already completed successfully

In this case the task commit code and the DAG scheduler won't handle it; 
TaskSetManager.handleSuccessfulTask needs to handle it.
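
A hedged sketch of the accounting guard being described (all names below are 
illustrative, not the actual TaskSetManager code): a second "success" from a 
speculative copy should release its running slot but must not count as a new 
completion.

class SpeculativeAccounting(numTasks: Int) {
  private val finished = Array.fill(numTasks)(false)   // per-partition completion
  private var tasksSuccessful = 0
  private var runningAttempts = 0

  def attemptStarted(): Unit = runningAttempts += 1

  def attemptSucceeded(index: Int): Unit = {
    runningAttempts -= 1                               // the attempt itself is done either way
    if (finished(index)) {
      // duplicate success from a speculative attempt: ignore it for completion
      // accounting so stage counts and per-executor "active tasks" stay correct
      println(s"Ignoring duplicate success for task $index")
    } else {
      finished(index) = true
      tasksSuccessful += 1
    }
  }
}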

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage is all completed.  I think it's also 
> affecting dynamic allocation's ability to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12384) Allow -Xms to be set differently then -Xmx

2015-12-17 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062068#comment-15062068
 ] 

Thomas Graves commented on SPARK-12384:
---

Yes, and that is why this change is meant for the gateway side, when users are 
running spark-shell or anything in YARN client mode.

> Allow -Xms to be set differently then -Xmx
> --
>
> Key: SPARK-12384
> URL: https://issues.apache.org/jira/browse/SPARK-12384
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Submit, YARN
>Affects Versions: 1.6.0
>Reporter: Thomas Graves
>
> Currently Spark automatically sets the -Xms parameter to be the same as the 
> -Xmx parameter. We should allow the user to set these separately.
> The main use case here is if I'm running the spark-shell on a shared gateway. 
> Many users specify a larger memory size than needed and will never use that 
> much memory, so all it's doing is preventing other users from potentially 
> using that memory.  Allowing -Xms to be less is just more multi-tenant friendly.
> I think it makes sense to leave the current behavior for cluster mode, although 
> if a user really wants to override it I don't see why we shouldn't let them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-12384) Allow -Xms to be set differently then -Xmx

2015-12-16 Thread Thomas Graves (JIRA)
Thomas Graves created SPARK-12384:
-

 Summary: Allow -Xms to be set differently then -Xmx
 Key: SPARK-12384
 URL: https://issues.apache.org/jira/browse/SPARK-12384
 Project: Spark
  Issue Type: Improvement
  Components: Spark Submit, YARN
Affects Versions: 1.6.0
Reporter: Thomas Graves


Currently Spark automatically sets the -Xms parameter to be the same as the 
-Xmx parameter. We should allow the user to set these separately.

The main use case here is if I'm running the spark-shell on a shared gateway. 
Many users specify a larger memory size than needed and will never use that 
much memory, so all it's doing is preventing other users from potentially using 
that memory.  Allowing -Xms to be less is just more multi-tenant friendly.

I think it makes sense to leave the current behavior for cluster mode, although 
if a user really wants to override it I don't see why we shouldn't let them.
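
A hedged illustration of the desired effect (the 1g/4g values are illustrative): 
today a 4g request ends up with a launch command along the lines of

  java -Xms4g -Xmx4g ... org.apache.spark.deploy.SparkSubmit ...

whereas with a separate setting a gateway user could get

  java -Xms1g -Xmx4g ... org.apache.spark.deploy.SparkSubmit ...

so the shell can still grow to 4g if it needs it but doesn't reserve the full 
amount up front.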



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-03 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038603#comment-15038603
 ] 

Thomas Graves commented on SPARK-11701:
---


Note, after investigating this some more, part of the problem is that we get 
Success back from a speculative task even though the original task passed.  In 
this case the second one didn't commit:

15/12/03 18:49:13 INFO mapred.SparkHadoopMapRedUtil: No need to commit output 
of task because needsTaskCommit=false: attempt_201512031848_0009_m_30_316

Normally these speculative tasks fail with TaskCommitDenied.  I think it 
makes more sense to mark these as killed, but for this particular case, if 
instead of just logging we throw the TaskCommitDenied exception, then things 
just work (including not counting the task as a failure toward the maximum 
number of task failures).


> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage is all completed.  I think it's also 
> affecting dynamic allocation's ability to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10911) Executors should System.exit on clean shutdown

2015-12-03 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038418#comment-15038418
 ] 

Thomas Graves commented on SPARK-10911:
---

There are other cases where this can happen. We've seen it happen on a botched NM 
upgrade: someone removed the database for running containers during an NM rolling 
upgrade, so it didn't know the running containers existed.  I think this could 
potentially happen in many ways, whether someone does something bad or there are 
bugs in the RM, standalone mode, etc.  This should be put in place to make sure 
the executor exits when it should. 

> Executors should System.exit on clean shutdown
> --
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
>
> Executors should call System.exit on clean shutdown to make sure all user 
> threads exit and the JVM shuts down.
> We ran into a case where an executor was left around for days trying to 
> shut down because the user code was using a non-daemon thread pool and one of 
> those threads wasn't exiting.  We should force the JVM to go away with 
> System.exit.
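
A hedged, self-contained illustration of the failure mode in the description 
(all names are made up): a non-daemon user thread keeps the JVM alive after main 
returns, which is exactly what an explicit System.exit on clean shutdown would 
cut short.

import java.util.concurrent.Executors

object StuckJvmDemo {
  def main(args: Array[String]): Unit = {
    // Executors.newFixedThreadPool creates non-daemon threads by default
    val pool = Executors.newFixedThreadPool(1)
    pool.submit(new Runnable {
      override def run(): Unit = while (true) Thread.sleep(60000)
    })
    println("main is done, but the JVM keeps running because of the pool thread")
    // System.exit(0)   // forcing exit here is the behaviour this JIRA asks executors to adopt
  }
}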



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10873) can't sort columns on history page

2015-12-02 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-10873:
--
Assignee: Zhuo Liu

> can't sort columns on history page
> --
>
> Key: SPARK-10873
> URL: https://issues.apache.org/jira/browse/SPARK-10873
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>
> Starting with 1.5.1, the history server page doesn't allow sorting by column.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036395#comment-15036395
 ] 

Thomas Graves commented on SPARK-11701:
---

Also seems related to https://github.com/apache/spark/pull/9288

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor even though the job/stage is all completed.  I think it's also 
> affecting dynamic allocation's ability to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036366#comment-15036366
 ] 

Thomas Graves commented on SPARK-11701:
---

this looks like a dup of SPARK-9038

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor but the job/stage is all completed.  I think it's also 
> affecting the dynamic allocation being able to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-1239) Don't fetch all map output statuses at each reducer during shuffles

2015-12-02 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15036341#comment-15036341
 ] 

Thomas Graves commented on SPARK-1239:
--

I have another user hitting this also.  The above mentions other issues that 
need to be addressed in MapOutputStatusTracker; do you have links to those other 
issues?

> Don't fetch all map output statuses at each reducer during shuffles
> ---
>
> Key: SPARK-1239
> URL: https://issues.apache.org/jira/browse/SPARK-1239
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Patrick Wendell
>
> Instead we should modify the way we fetch map output statuses to take both a 
> mapper and a reducer - or we should just piggyback the statuses on each task. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-01 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034749#comment-15034749
 ] 

Thomas Graves commented on SPARK-11701:
---

The same issue exists with dynamic allocation in 1.6.  It looks like the 
ExecutorAllocationManager is also getting the onTaskEnd events and tracking the 
tasks running on the executor. Since the task-end event isn't sent out for these 
tasks, it still thinks it has tasks running.

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor but the job/stage is all completed.  I think it's also 
> affecting the dynamic allocation being able to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-01 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-11701:
-

Assignee: Thomas Graves

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor but the job/stage is all completed.  I think it's also 
> affecting the dynamic allocation being able to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-01 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034625#comment-15034625
 ] 

Thomas Graves commented on SPARK-11701:
---

So it looks like this is a race condition. If the task-end event (in this 
case probably because of speculation) comes in after the stage is finished, 
then DAGScheduler.handleTaskCompletion will skip the task completion event:

    if (!stageIdToStage.contains(task.stageId)) {
      // Skip all the actions if the stage has been cancelled.
      return
    }

Since it returns early here, it never sends out the SparkListenerTaskEnd event 
and the UI is never updated. I'm assuming this is also affecting the dynamic 
allocation logic (at least in 1.5). I still have to confirm whether that still 
exists in 1.6.
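
To make the downstream effect concrete, here is a tiny standalone sketch (not 
the real ExecutorAllocationManager or UI code; all names are illustrative) of a 
listener that counts running tasks per executor. If the SparkListenerTaskEnd for 
the speculative copy is never posted because of the early return above, the 
count never drops back to zero and the executor looks busy forever:

    import scala.collection.mutable

    object StaleTaskCountSketch {
      // Running-task count per executor, as a listener might track it.
      private val runningTasks = mutable.Map.empty[String, Int].withDefaultValue(0)

      def onTaskStart(executorId: String): Unit =
        runningTasks(executorId) += 1

      def onTaskEnd(executorId: String): Unit =
        runningTasks(executorId) = math.max(0, runningTasks(executorId) - 1)

      def main(args: Array[String]): Unit = {
        onTaskStart("exec-1") // original copy of the task
        onTaskStart("exec-1") // speculative copy of the same task
        onTaskEnd("exec-1")   // original copy finishes and is reported
        // The speculative copy ends after the stage is done, handleTaskCompletion
        // returns early, and no SparkListenerTaskEnd is posted, so onTaskEnd is
        // never called for it.
        println(runningTasks("exec-1")) // prints 1: the executor never looks idle
      }
    }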

> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor but the job/stage is all completed.  I think it's also 
> affecting the dynamic allocation being able to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-12-01 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034042#comment-15034042
 ] 

Thomas Graves commented on SPARK-11701:
---

Tested on the latest 1.6 branch and I am no longer seeing the 
TransportResponseHandler exception. I do still see the original issue.

Looking at the logs, it seems there is an info message printed near the end for 
tasks on the executors that still show active tasks. I'm guessing it is 
ignoring this event and not doing the accounting properly.

15/12/01 16:35:16 INFO TaskSetManager: Ignoring task-finished event for 25.1 in 
stage 0.0 because task 25 has already completed successfully


> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor but the job/stage is all completed.  I think it's also 
> affecting the dynamic allocation being able to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM

2015-11-30 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031843#comment-15031843
 ] 

Thomas Graves commented on SPARK-4117:
--

[~devaraj.k]  Thanks for explaining.  Sounds good on 
ApplicationMasterNotRegisteredException since the AMRMClientImpl is handling 
it.

For ApplicationAttemptNotFoundException, you hit one of the places where 
allocate is called, and that is on the first registration. There is another one 
in the launchReporterThread that gets called regularly after starting.  That is 
the one that catches exceptions and waits for a number of failures before 
finally exiting.  So if ApplicationAttemptNotFoundException is thrown anytime 
after the application is running, it will hit that logic.  I don't think it's 
that big of an issue since it will eventually exit; it could just take a little 
longer.  It looks like the only cases where that should be thrown are if we have 
already unregistered or something weird happened on the RM where it lost the 
application.
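
For reference, a bare-bones sketch of that reporter-loop behavior (this is not 
the actual launchReporterThread code; allocate, maxFailures and intervalMs are 
stand-ins): any exception, including ApplicationAttemptNotFoundException, just 
bumps a counter, so the AM only exits once the counter reaches the cap rather 
than immediately:

    object ReporterLoopSketch {
      // Keep heartbeating the RM; give up only after maxFailures consecutive errors.
      def reporterLoop(allocate: () => Unit, maxFailures: Int, intervalMs: Long): Unit = {
        var failures = 0
        while (failures < maxFailures) {
          try {
            allocate()
            failures = 0 // a successful heartbeat resets the count
          } catch {
            case _: Exception =>
              failures += 1 // an ApplicationAttemptNotFoundException lands here too
          }
          Thread.sleep(intervalMs)
        }
        // Only now does the AM report the failure and exit.
      }
    }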

> Spark on Yarn handle AM being told command from RM
> --
>
> Key: SPARK-4117
> URL: https://issues.apache.org/jira/browse/SPARK-4117
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> In the allocateResponse from the RM it can send commands that the AM should 
> follow, for instance AM_RESYNC and AM_SHUTDOWN.  We should add support for 
> those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10911) Executors should System.exit on clean shutdown

2015-11-24 Thread Thomas Graves (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-10911:
--
Assignee: Zhuo Liu

> Executors should System.exit on clean shutdown
> --
>
> Key: SPARK-10911
> URL: https://issues.apache.org/jira/browse/SPARK-10911
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Assignee: Zhuo Liu
>Priority: Minor
>
> Executors should call System.exit on clean shutdown to make sure all user 
> threads exit and the JVM shuts down.
> We ran into a case where an Executor was left around for days trying to 
> shut down because the user code was using a non-daemon thread pool and one of 
> those threads wasn't exiting.  We should force the JVM to go away with 
> System.exit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4117) Spark on Yarn handle AM being told command from RM

2015-11-24 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024603#comment-15024603
 ] 

Thomas Graves commented on SPARK-4117:
--

[~devaraj.k]  Where is Spark handling the 
ApplicationMasterNotRegisteredException and ApplicationAttemptNotFoundException 
exceptions?  Doing a quick look, I don't see it doing anything special with 
those.

We do catch exceptions for the allocate call, but we just increment the failure 
count and try again until we hit the max failure count.  Ideally, I think on an 
ApplicationMasterNotRegisteredException we would re-register.  And for 
ApplicationAttemptNotFoundException I would think we should just immediately 
shut down rather than trying again.
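
A rough sketch of the handling being suggested here, written against the YARN 
AMRMClient API (this is not Spark's actual allocator code; registerAm and 
shutdown are assumed helper callbacks):

    import org.apache.hadoop.yarn.api.protocolrecords.AllocateResponse
    import org.apache.hadoop.yarn.client.api.AMRMClient
    import org.apache.hadoop.yarn.exceptions.{ApplicationAttemptNotFoundException, ApplicationMasterNotRegisteredException}

    object AllocateHandlingSketch {
      def allocateOnce(amClient: AMRMClient[AMRMClient.ContainerRequest],
                       registerAm: () => Unit,
                       shutdown: () => Unit): Option[AllocateResponse] = {
        try {
          Some(amClient.allocate(0.1f))
        } catch {
          case _: ApplicationMasterNotRegisteredException =>
            // The RM no longer considers this AM registered: re-register and retry later.
            registerAm()
            None
          case _: ApplicationAttemptNotFoundException =>
            // The attempt no longer exists on the RM, so retrying is pointless.
            shutdown()
            None
        }
      }
    }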

> Spark on Yarn handle AM being told command from RM
> --
>
> Key: SPARK-4117
> URL: https://issues.apache.org/jira/browse/SPARK-4117
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Thomas Graves
>
> In the allocateResponse from the RM it can send commands that the AM should 
> follow, for instance AM_RESYNC and AM_SHUTDOWN.  We should add support for 
> those.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong

2015-11-13 Thread Thomas Graves (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003977#comment-15003977
 ] 

Thomas Graves commented on SPARK-11701:
---

[~jerryshao] Are you referring to my last post about 1.6 or the original 
description?  If it's a recent break I can see the 1.6 issue, but I'm guessing 
not the 1.5 issue with speculation and tasks staying around?



> YARN - dynamic allocation and speculation active task accounting wrong
> --
>
> Key: SPARK-11701
> URL: https://issues.apache.org/jira/browse/SPARK-11701
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.1
>Reporter: Thomas Graves
>Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues 
> with the active task accounting.  The Executor UI still shows active tasks on 
> an executor but the job/stage is all completed.  I think it's also 
> affecting the dynamic allocation being able to release containers because it 
> thinks there are still tasks.
> It's easily reproduced by using spark-shell: turn on dynamic allocation, then 
> run just a wordcount on a decent sized file and set the speculation parameters 
> low: 
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf 
> spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 
> --master yarn --deploy-mode client  --executor-memory 4g --driver-memory 4g



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


