By the way, how do I know whether my map task is single-threaded (i.e., one thread executing for each record)? And how can I change it to be multi-threaded?
Thank you,
Maha

On Mar 12, 2011, at 9:11 PM, Harsh J wrote:

> Hello,
>
> On Sat, Mar 12, 2011 at 3:51 PM, Jun Young Kim <juneng...@gmail.com> wrote:
>> hi,
>>
>> is a single thread allocated to a single output file when a job is trying to
>> write multiple output files?
>
> At the lower levels, a data streaming thread is indeed run for every
> OutputStream created for writing on the DFS.
>
> The map task is generally single threaded unless you multi-thread the
> calls (in which case the record writers are still got in a
> synchronized fashion).
>
>> if counts of output files are 10,000, does a hadoop try to create threads
>> for each output file?
>
> Yes, there should be 10,000 threads 'started' for streaming writes
> (but not all really working at the same time, as per the record writer
> access methods in tasks).
>
> Please correct me if I'm wrong.
>
> --
> Harsh J
> www.harshj.com
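
On the multi-threading question, a rough illustration of what "multi-thread the calls" with record writers "got in a synchronized fashion" could look like. This is not Hadoop code: it is a plain-Java sketch using only the standard library, and the names `SharedWriter`, `map`, and `runMapTask` are hypothetical stand-ins for the record writer, the map function, and the task loop. As far as I know, Hadoop's own helper for this is `org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper` in the new API; please check the Javadoc for your version before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MultiThreadedMapSketch {

    // Hypothetical stand-in for a record writer: a single shared sink
    // whose write method is synchronized, so only one thread writes at a time.
    static class SharedWriter {
        private final List<String> out = new ArrayList<>();

        synchronized void write(String key, int value) {
            out.add(key + "\t" + value);
        }

        synchronized List<String> snapshot() {
            return new ArrayList<>(out);
        }
    }

    // Hypothetical map function: emits each record with its length.
    static void map(String record, SharedWriter writer) {
        writer.write(record, record.length());
    }

    // With numThreads == 1 the records are processed one after another,
    // which mirrors the default single-threaded behaviour described above.
    // With numThreads > 1 the map calls run concurrently, while writes
    // are still serialized through the synchronized writer.
    static List<String> runMapTask(List<String> records, int numThreads) {
        SharedWriter writer = new SharedWriter();
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        for (String record : records) {
            pool.submit(() -> map(record, writer));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("map threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for map threads", e);
        }
        return writer.snapshot();
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            records.add("record-" + i);
        }
        // Same number of output records either way; only the ordering may differ.
        System.out.println(runMapTask(records, 1).size());
        System.out.println(runMapTask(records, 4).size());
    }
}
```

Note that with more than one thread the output order is no longer the input order, and the map function itself has to be thread-safe. Multi-threading the map calls mostly helps when each call is slow for reasons other than CPU (for example, waiting on an external service).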