What does a jstack thread dump of the stalled executor show?
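
In case it helps, here is a minimal sketch of how one might capture and summarize such a dump. It assumes the JDK's `jstack` tool is on the PATH of the worker machine and that you already know the executor's pid (for example from `jps`). The helper names `thread_dump` and `count_thread_states` are made up for illustration and are not part of Spark or the JDK.

```python
import subprocess
from collections import Counter


def thread_dump(pid: int) -> str:
    """Run `jstack <pid>` and return the dump as text.

    Assumes `jstack` is installed and that we run as the user owning the JVM.
    """
    result = subprocess.run(
        ["jstack", str(pid)],
        capture_output=True,
        text=True,
        check=True,  # raise if jstack exits with an error
    )
    return result.stdout


def count_thread_states(dump: str) -> Counter:
    """Count threads per state by scanning the 'java.lang.Thread.State:' lines."""
    states = Counter()
    for line in dump.splitlines():
        line = line.strip()
        if line.startswith("java.lang.Thread.State:"):
            # e.g. "java.lang.Thread.State: WAITING (parking)" -> "WAITING"
            state = line.split(":", 1)[1].split()[0]
            states[state] += 1
    return states
```

Calling `count_thread_states(thread_dump(pid))` gives a rough picture of whether the threads are blocked or waiting; the full stack traces in the dump are still what show where the fetch is actually stuck.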

Yours respectfully, Xuefeng Wu 吴雪峰

> On February 6, 2015, at 10:20 AM, Michael Albert <m_albert...@yahoo.com.INVALID> 
> wrote:
> 
> My apologies for following up my own post, but I thought this might be of 
> interest.
> 
> I terminated the Java process corresponding to the executor that had opened the 
> stderr file mentioned below (kill <pid>).
> Then my spark job completed without error (it was actually almost finished).
> 
> Now I am completely confused :-).
> 
> Thanks!
> -Mike
> 
> 
> From: Michael Albert <m_albert...@yahoo.com.INVALID>
> To: "user@spark.apache.org" <user@spark.apache.org> 
> Sent: Thursday, February 5, 2015 9:04 PM
> Subject: Spark stalls or hangs: is this a clue? remote fetches seem to never 
> return?
> 
> Greetings!
> 
> Again, thanks to all who have given suggestions.
> I am still trying to diagnose a problem where I have processes that run for 
> one or several hours but intermittently stall or hang.
> By "stall" I mean that there is no CPU usage on the workers or the driver, 
> nor network activity, nor do I see disk activity.
> It just hangs.
> 
> Using the Application Master to find which workers still had active tasks, I 
> then went to that machine and looked in the user logs.
> One of the user logs' "stderr" files ends with "Started 50 remote 
> fetches...."
> Should there be a message saying that the fetch was completed?
> Any suggestions as to how I might diagnose why the fetch was not completed?
> 
> Thanks!
> -Mike
> 
> Here is the last part of the log:
> 15/02/06 01:33:46 INFO storage.MemoryStore: ensureFreeSpace(5368) called with 
> curMem=875861, maxMem=2315649024
> 15/02/06 01:33:46 INFO storage.MemoryStore: Block broadcast_10 stored as 
> values in memory (estimated size 5.2 KB, free 2.2 GB)
> 15/02/06 01:33:46 INFO spark.MapOutputTrackerWorker: Don't have map outputs 
> for shuffle 5, fetching them
> 15/02/06 01:33:46 INFO spark.MapOutputTrackerWorker: Doing the fetch; tracker 
> actor = 
> Actor[akka.tcp://sparkDriver@ip-10-171-0-208.ec2.internal:44124/user/MapOutputTracker#-878402310]
> 15/02/06 01:33:46 INFO spark.MapOutputTrackerWorker: Don't have map outputs 
> for shuffle 5, fetching them
> 15/02/06 01:33:46 INFO spark.MapOutputTrackerWorker: Got the output locations
> 15/02/06 01:33:46 INFO storage.ShuffleBlockFetcherIterator: Getting 300 
> non-empty blocks out of 300 blocks
> 15/02/06 01:33:46 INFO storage.ShuffleBlockFetcherIterator: Getting 300 
> non-empty blocks out of 300 blocks
> 15/02/06 01:33:46 INFO storage.ShuffleBlockFetcherIterator: Started 50 remote 
> fetches in 47 ms
> 15/02/06 01:33:46 INFO storage.ShuffleBlockFetcherIterator: Started 50 remote 
> fetches in 48 ms
> It's been like that for half an hour.
> 
> Thanks!
> -Mike
> 
> 
> 
> 
