Re: Is Hadoop's ToolRunner thread-safe?

2014-03-21 Thread Bertrand Dechoux
JIRA, test, patch and review? I am sure the community would welcome it. And
if you don't, well, it is unlikely to appear in Hadoop trunk soon.

Bertrand



Re: Is Hadoop's ToolRunner thread-safe?

2014-03-21 Thread Something Something
I will be happy to follow all these steps if someone confirms that this is
the best way to handle it.  Seems harmless to me, but just wondering.
Thanks.



Re: Is Hadoop's ToolRunner thread-safe?

2014-03-21 Thread Azuryy
Yes, this is the best way to go.



Re: Is Hadoop's ToolRunner thread-safe?

2014-03-20 Thread Something Something
Confirmed that ToolRunner is NOT thread-safe:

*Original code (which runs into problems):*

  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);

    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

*New code (which works):*

  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }
    // construct the parser under a class-level lock (see getParser below)
    GenericOptionsParser parser = getParser(conf, args);

    tool.setConf(conf);

    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

  // synchronized so that only one thread at a time constructs a
  // GenericOptionsParser through ToolRunner
  private static synchronized GenericOptionsParser getParser(
      Configuration conf, String[] args) throws Exception {
    return new GenericOptionsParser(conf, args);
  }
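
A caller who cannot patch Hadoop could apply the same idea at the call site.
The following is only a minimal sketch under assumptions: buildParser is a
hypothetical placeholder for the step that is not thread-safe (in the code
above, new GenericOptionsParser(conf, args)), and runJob is a hypothetical
placeholder for tool.run. Only the setup step is serialized on a shared lock;
the jobs themselves still overlap on the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SerializedSetupDemo {
    // Single lock shared by every worker thread.
    private static final Object PARSER_LOCK = new Object();

    // Hypothetical placeholder for the non-thread-safe step
    // (new GenericOptionsParser(conf, args) in the code above).
    static String[] buildParser(String[] args) {
        return args.clone();
    }

    // Hypothetical placeholder for tool.run(toolArgs); returns an exit code.
    static int runJob(String[] toolArgs) {
        return toolArgs.length;
    }

    static List<Integer> runAll(List<String[]> jobs, int poolSize) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String[] jobArgs : jobs) {
                futures.add(pool.submit(() -> {
                    String[] toolArgs;
                    synchronized (PARSER_LOCK) {  // serialize only the unsafe setup
                        toolArgs = buildParser(jobArgs);
                    }
                    return runJob(toolArgs);      // jobs overlap outside the lock
                }));
            }
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                exitCodes.add(f.get());           // rethrows any job failure
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String[]> jobs = new ArrayList<>();
        jobs.add(new String[] { "in1", "out1" });
        jobs.add(new String[] { "in2", "out2", "extra" });
        System.out.println(runAll(jobs, 2));
    }
}
```

With the two sample jobs in main, this prints [2, 3]. Holding the lock only
around the setup step keeps the serialized section short, which is the same
trade-off the patched ToolRunner.run makes.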







Re: Is Hadoop's ToolRunner thread-safe?

2014-03-19 Thread Something Something
Any thoughts on this?  Could someone confirm or deny that it's an issue?



Is Hadoop's ToolRunner thread-safe?

2014-03-19 Thread Something Something
I would like to trigger a few Hadoop jobs simultaneously.  I've created a
pool of threads using Executors.newFixedThreadPool.  The idea is that if the
pool size is 2, my code will trigger 2 Hadoop jobs at exactly the same time
using 'ToolRunner.run'.  In my testing, I noticed that these 2 threads keep
stepping on each other.

When I looked under the hood, I noticed that ToolRunner creates a
GenericOptionsParser, which in turn calls a static method,
'buildGeneralOptions'.  This method uses 'OptionBuilder.withArgName', which
stores its value in a static field called 'argName'.  This doesn't look
thread-safe to me, and I believe it is the root cause of the issues I am
running into.

Any thoughts?
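
The hazard described above can be illustrated without Hadoop. The sketch below
uses a hypothetical StaticBuilder class (not part of Hadoop or commons-cli)
that keeps its pending argName in a field shared by all callers. The calls of
two imagined threads, A and B, are interleaved by hand in a single thread so
that the outcome is deterministic; with real threads the same interleaving
would show up only intermittently.

```java
class StaticBuilderDemo {
    // Hypothetical stand-in for a builder that keeps pending state in a
    // static field. It is NOT the commons-cli OptionBuilder; it only mimics
    // the pattern discussed above.
    static class StaticBuilder {
        private static String argName; // shared by every caller, in every thread

        static void withArgName(String name) {
            argName = name;
        }

        // Returns the pending name and resets the shared state.
        static String create() {
            String result = argName;
            argName = null;
            return result;
        }
    }

    // Replays, in one thread, an interleaving that two threads could produce.
    static String[] interleaved() {
        StaticBuilder.withArgName("conf");        // thread A sets its name
        StaticBuilder.withArgName("fs");          // thread B overwrites it
        String seenByA = StaticBuilder.create();  // thread A reads B's value
        String seenByB = StaticBuilder.create();  // thread B finds the state reset
        return new String[] { seenByA, seenByB };
    }

    public static void main(String[] args) {
        String[] seen = interleaved();
        System.out.println("A expected conf, got " + seen[0]);
        System.out.println("B expected fs, got " + seen[1]);
    }
}
```

In this replay A ends up with "fs" and B with null, which is the kind of
cross-talk that would make two concurrent jobs step on each other.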