Controlling data placement / locality

2016-11-29 Thread Michael Johnson
I'm reading in data from a single file. I do some computations on the data to get good groupings of the data. Future computations in my program operate on a single group at once. (E.g., I might do frequent itemset mining of members within each group.) How do I tell Spark that all members of a
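
A minimal sketch of one approach sometimes used for this kind of scenario: key each record by its group and partition by that key, so that each group's members sit in one partition. This is not an answer taken from the thread. It assumes PySpark, integer group ids in the range 0 to num_groups - 1, and an RDD of (group_id, member) pairs; the function names and the `mine` callback are hypothetical.

```python
def group_to_partition(group_id):
    """Identity mapping: integer group id g is sent to partition g.

    PySpark applies `partitionFunc(key) % numPartitions`, so with
    numPartitions == num_groups each group gets its own partition.
    """
    return int(group_id)


def place_by_group(records, num_groups):
    """Co-locate all members of a group in a single partition.

    `records` is assumed to be an RDD of (group_id, member) pairs.
    """
    return records.partitionBy(num_groups, group_to_partition)


def mine_each_group(partitioned, mine):
    """Run `mine` (a hypothetical per-group computation) once per partition.

    Each partition holds exactly one group under the assumptions above;
    a group with no members yields an empty list for `mine` to handle.
    """
    return partitioned.mapPartitions(
        lambda pairs: [mine([member for _, member in pairs])]
    )
```

Partitioning by key controls which records share a partition; it does not pin a partition to a particular machine.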

Re: Very long pause/hang at end of execution

2016-11-16 Thread Michael Johnson
On Wed, Nov 16, 2016 at 10:44 AM Aniket Bhatnagar wrote: Thanks for sharing the thread dump. I had a look at them and couldn't find anything unusual. Is there anything in the logs (driver + executor) that suggests what's going on? Also, what does the spark job do

Re: Very long pause/hang at end of execution

2016-11-06 Thread Michael Johnson
dumps (using jstack or jcmd on the local spark JVM process) with 1 second delay between each dump and attach them? I can take a look. Thanks, Aniket On Sun, Nov 6, 2016 at 2:21 PM Michael Johnson <mjjohnson@yahoo.com> wrote: Thanks; I tried looking at the thread dumps for the driver and the
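
The request above (several thread dumps taken with jstack or jcmd, one second apart) could be scripted roughly as follows. This is a sketch under assumptions: the JDK tools are on the PATH, the process id is known, and the function name, default count of 5, output directory, and file naming are all hypothetical choices, not something specified in the thread.

```python
import subprocess
import time
from pathlib import Path


def capture_thread_dumps(pid, count=5, delay_s=1.0, out_dir="dumps",
                         use_jcmd=False, run=subprocess.run):
    """Write `count` thread dumps of JVM process `pid`, `delay_s` seconds apart.

    `run` is injectable so the loop can be exercised without a real JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # `jstack <pid>` and `jcmd <pid> Thread.print` both print a thread dump.
    if use_jcmd:
        cmd = ["jcmd", str(pid), "Thread.print"]
    else:
        cmd = ["jstack", str(pid)]

    paths = []
    for i in range(count):
        result = run(cmd, capture_output=True, text=True, check=True)
        path = out / f"threaddump_{i:02d}.txt"
        path.write_text(result.stdout)
        paths.append(path)
        if i < count - 1:
            time.sleep(delay_s)  # pause between dumps, not after the last one
    return paths
```

For example, `capture_thread_dumps(12345)` would write five files under `dumps/`, where 12345 stands in for the driver's process id (which `jps` can list).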

Re: Very long pause/hang at end of execution

2016-11-06 Thread Michael Johnson
thread dump analysis tool. Thanks, Aniket On Sun, Nov 6, 2016 at 1:31 PM Michael Johnson <mjjohnson@yahoo.com.invalid> wrote: I'm doing some processing and then clustering of a small dataset (~150 MB). Everything seems to work fine, until the end; the last few lines of my program are lo

Very long pause/hang at end of execution

2016-11-06 Thread Michael Johnson
I'm doing some processing and then clustering of a small dataset (~150 MB). Everything seems to work fine, until the end; the last few lines of my program are log statements, but after printing those, nothing seems to happen for a long time...many minutes; I'm not usually patient enough to let