Re: Is Hadoop's ToolRunner thread-safe?
Yes, this is the best way to go.

On March 22, 2014, at 3:03, Something Something wrote:
> I will be happy to follow all these steps if someone confirms that this is
> the best way to handle it. Seems harmless to me, but just wondering. Thanks.
Re: Is Hadoop's ToolRunner thread-safe?
I will be happy to follow all these steps if someone confirms that this is the best way to handle it. Seems harmless to me, but just wondering. Thanks.

On Fri, Mar 21, 2014 at 1:26 AM, Bertrand Dechoux wrote:
> JIRA, test, patch and review? I am sure the community would welcome it.
> And if you don't, well, it is unlikely to appear in Hadoop trunk soon.
>
> Bertrand
Re: Is Hadoop's ToolRunner thread-safe?
JIRA, test, patch and review? I am sure the community would welcome it. And if you don't, well, it is unlikely to appear in Hadoop trunk soon.

Bertrand

On Fri, Mar 21, 2014 at 12:49 AM, Something Something <mailinglist...@gmail.com> wrote:
> Confirmed that ToolRunner is NOT thread-safe:
>
> *Original code (which runs into problems):*
>
>     public static int run(Configuration conf, Tool tool, String[] args)
>         throws Exception {
>       if (conf == null) {
>         conf = new Configuration();
>       }
>       GenericOptionsParser parser = new GenericOptionsParser(conf, args);
>       // set the configuration back, so that Tool can configure itself
>       tool.setConf(conf);
>
>       // get the args w/o generic hadoop args
>       String[] toolArgs = parser.getRemainingArgs();
>       return tool.run(toolArgs);
>     }
>
> *New code (which works):*
>
>     public static int run(Configuration conf, Tool tool, String[] args)
>         throws Exception {
>       if (conf == null) {
>         conf = new Configuration();
>       }
>       GenericOptionsParser parser = getParser(conf, args);
>
>       tool.setConf(conf);
>
>       // get the args w/o generic hadoop args
>       String[] toolArgs = parser.getRemainingArgs();
>       return tool.run(toolArgs);
>     }
>
>     private static synchronized GenericOptionsParser getParser(
>         Configuration conf, String[] args) throws Exception {
>       return new GenericOptionsParser(conf, args);
>     }
Re: Is Hadoop's ToolRunner thread-safe?
Confirmed that ToolRunner is NOT thread-safe:

*Original code (which runs into problems):*

    public static int run(Configuration conf, Tool tool, String[] args)
        throws Exception {
      if (conf == null) {
        conf = new Configuration();
      }
      GenericOptionsParser parser = new GenericOptionsParser(conf, args);
      // set the configuration back, so that Tool can configure itself
      tool.setConf(conf);

      // get the args w/o generic hadoop args
      String[] toolArgs = parser.getRemainingArgs();
      return tool.run(toolArgs);
    }

*New code (which works):*

    public static int run(Configuration conf, Tool tool, String[] args)
        throws Exception {
      if (conf == null) {
        conf = new Configuration();
      }
      // build the parser through the synchronized helper below
      GenericOptionsParser parser = getParser(conf, args);

      tool.setConf(conf);

      // get the args w/o generic hadoop args
      String[] toolArgs = parser.getRemainingArgs();
      return tool.run(toolArgs);
    }

    // synchronized, so only one thread at a time constructs a parser
    private static synchronized GenericOptionsParser getParser(
        Configuration conf, String[] args) throws Exception {
      return new GenericOptionsParser(conf, args);
    }

On Wed, Mar 19, 2014 at 10:15 AM, Something Something <mailinglist...@gmail.com> wrote:
> I would like to trigger a few Hadoop jobs simultaneously. I've created a
> pool of threads using Executors.newFixedThreadPool. The idea is that if the
> pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time
> using 'ToolRunner.run'. In my testing, I noticed that these 2 threads
> keep stepping on each other.
>
> When I looked under the hood, I noticed that ToolRunner creates a
> GenericOptionsParser, which in turn calls a static method,
> 'buildGeneralOptions'. This method uses 'OptionBuilder.withArgName',
> which uses an instance variable called 'argName'. This doesn't look
> thread-safe to me, and I believe it is the root cause of the issues I am
> running into.
>
> Any thoughts?
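
One way to read the patch above is that only the parsing step is serialized, while the jobs themselves still run side by side. The following is a minimal, self-contained sketch of that shape. It deliberately does not use Hadoop: `MiniTool`, `parseArgs`, `launchAll`, and `runDemo` are hypothetical stand-ins invented for illustration, and `parseArgs` only marks the place where a non-thread-safe parser would be constructed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ParallelLaunchSketch {
    /** Hypothetical stand-in for a Hadoop Tool; not the real interface. */
    interface MiniTool {
        int run(String[] args) throws Exception;
    }

    private static final Object PARSER_LOCK = new Object();

    /** Placeholder for the non-thread-safe parsing step; only this part holds the lock. */
    static String[] parseArgs(String[] args) {
        synchronized (PARSER_LOCK) {
            return args.clone();
        }
    }

    /** Runs every tool on its own pool thread and returns the exit codes in order. */
    static List<Integer> launchAll(List<MiniTool> tools, String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, tools.size()));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (MiniTool tool : tools) {
                // Parsing is serialized inside parseArgs; tool.run is not.
                tasks.add(() -> tool.run(parseArgs(args)));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                exitCodes.add(result.get());
            }
            return exitCodes;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Two jobs that each wait for the other, so they only finish if they overlap. */
    static List<Integer> runDemo() {
        CyclicBarrier bothRunning = new CyclicBarrier(2);
        MiniTool job = a -> {
            bothRunning.await(5, TimeUnit.SECONDS);
            return 0;
        };
        return launchAll(Arrays.asList(job, job), new String[] {"in", "out"});
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

The barrier in `runDemo` is only there to show that the lock does not serialize the jobs: if the two `run` calls could not overlap, the barrier would time out instead of returning.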
Is Hadoop's ToolRunner thread-safe?
I would like to trigger a few Hadoop jobs simultaneously. I've created a pool of threads using Executors.newFixedThreadPool. The idea is that if the pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time using 'ToolRunner.run'. In my testing, I noticed that these 2 threads keep stepping on each other.

When I looked under the hood, I noticed that ToolRunner creates a GenericOptionsParser, which in turn calls a static method, 'buildGeneralOptions'. This method uses 'OptionBuilder.withArgName', which uses an instance variable called 'argName'. This doesn't look thread-safe to me, and I believe it is the root cause of the issues I am running into.

Any thoughts?
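
The kind of interference described above can be reproduced without Hadoop. The sketch below is an illustration under stated assumptions, not Commons CLI's or Hadoop's actual code: `NaiveBuilder` is a hypothetical builder that parks its pending argument name in a shared static field between `withArgName` and `create`, which is the pattern the question suspects. `buildUnsafe`, `buildSafe`, and `countMismatches` are likewise made-up names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class StaticBuilderRace {
    /** Hypothetical builder that keeps its pending state in a shared static field. */
    static final class NaiveBuilder {
        private static String argName;

        static void withArgName(String name) {
            argName = name;
        }

        static String create() {
            String built = argName;
            argName = null; // reset for the next caller, as a static builder would
            return built;
        }
    }

    private static final Object LOCK = new Object();

    /** Two-step build with no locking; another thread can overwrite argName in between. */
    static String buildUnsafe(String name) {
        NaiveBuilder.withArgName(name);
        Thread.yield(); // widen the window so the race is easier to observe
        return NaiveBuilder.create();
    }

    /** Same two steps, but executed as one critical section. */
    static String buildSafe(String name) {
        synchronized (LOCK) {
            return buildUnsafe(name);
        }
    }

    /** Counts how often a thread gets back a name other than the one it set. */
    static int countMismatches(boolean locked, int threads, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger mismatches = new AtomicInteger();
        for (int t = 0; t < threads; t++) {
            final String name = "arg-" + t;
            pool.execute(() -> {
                for (int i = 0; i < iterations; i++) {
                    String built = locked ? buildSafe(name) : buildUnsafe(name);
                    if (!name.equals(built)) {
                        mismatches.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("unlocked mismatches: " + countMismatches(false, 2, 100_000));
        System.out.println("locked mismatches:   " + countMismatches(true, 2, 100_000));
    }
}
```

The unlocked count varies from run to run and may occasionally be zero, since a data race is not guaranteed to show up. The locked count is always zero, because every read and write of the shared field happens while holding the same lock.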
Re: Is Hadoop's ToolRunner thread-safe?
Any thoughts on this? Can someone confirm or deny that it's an issue, maybe?

On Mon, Mar 17, 2014 at 11:43 AM, Something Something <mailinglist...@gmail.com> wrote:
> I would like to trigger a few Hadoop jobs simultaneously. I've created a
> pool of threads using Executors.newFixedThreadPool. The idea is that if the
> pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time
> using 'ToolRunner.run'. In my testing, I noticed that these 2 threads
> keep stepping on each other.
>
> When I looked under the hood, I noticed that ToolRunner creates a
> GenericOptionsParser, which in turn calls a static method,
> 'buildGeneralOptions'. This method uses 'OptionBuilder.withArgName',
> which uses an instance variable called 'argName'. This doesn't look
> thread-safe to me, and I believe it is the root cause of the issues I am
> running into.
>
> Any thoughts?