Re: Exec hadoop from Java, reuse JVM (client-side)?

2012-08-01 Thread Keith Wiley
It's just easier that way.  I don't have to link in any hadoop libraries or 
bring in any other hadoop-related code.  It keeps the two environments 
fundamentally separated.  I suppose I could wrap hadoop into the exterior code, 
but I do kinda like the idea of keeping my various worlds separate.  I'll 
consider it, but I don't really like the idea.  I don't want the program to be 
very dependent on hadoop.  Simply removing a call that execs it is a lot easier 
than gutting hadoop code and linked .jars.

I'll take a look at it, maybe there's a way to do that with relative ease.

On Aug 1, 2012, at 17:57 , Jim Donofrio wrote:

> Why would you call the hadoop script? Why not just call the part of the 
> hadoop shell API you are trying to use directly from Java?
> 
> 
> On 08/01/2012 07:37 PM, Keith Wiley wrote:
>> Hmmm, at first glance that does appear to be similar to my situation.  I'll 
>> have to delve through it in detail to see if it squarely addresses (and 
>> fixes) my problem.  Mine is sporadic and I suspect dependent on the current 
>> memory situation (it isn't a deterministic and guaranteed failure).  I am 
>> not sure if that is true of the stackoverflow question you referenced...but 
>> it is certainly worth reading over.
>> 
>> Thanks.
>> 
>> On Aug 1, 2012, at 15:34 , Dhruv wrote:
>> 
>>> Is this related?
>>> 
>>> http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run



Keith Wiley   kwi...@keithwiley.com   keithwiley.com   music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Issue with Hadoop Streaming

2012-08-01 Thread Devi Kumarappan
I am trying to run Hadoop streaming with a perl script as the mapper and no 
reducer. My requirement is for the mapper to run on one file at a time, since 
I have to do pattern processing on the entire contents of one file at a time 
and the file size is small.

The Hadoop streaming manual suggests the following solution:
* Generate a file containing the full HDFS paths of the input files. Each map 
task would get one file name as input.
* Create a mapper script which, given a filename, will get the file to local 
disk, gzip the file and put it back in the desired output directory.
I am running the following command:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -input /user/devi/file.txt -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

 
/user/devi/file.txt contains the following two lines.
/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers (one for a.txt and one for b.txt) 
as per the document, only one mapper is spawned and the perl script gets both 
/user/devi/s_input/a.txt and /user/devi/s_input/b.txt as input.
 
How can I make the mapper perl script run on only one file at a time?
 
Appreciate your help, Thanks, Devi
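
One approach that is sometimes used for exactly this (a sketch only, not 
something confirmed in this thread, and untested here) is NLineInputFormat, 
which turns each line of the listing file into its own split, so every mapper 
gets a single filename:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -D mapred.line.input.format.linespermap=1 \
  -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
  -input /user/devi/file.txt -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

Depending on the streaming version, the mapper may see each line prefixed by its 
byte-offset key on stdin, so the perl script may need to strip everything up to 
the tab before treating the rest as the HDFS path.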

Re: Exec hadoop from Java, reuse JVM (client-side)?

2012-08-01 Thread Jim Donofrio
Why would you call the hadoop script? Why not just call the part of the 
hadoop shell API you are trying to use directly from Java?
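
A minimal sketch of what an in-process call could look like (an illustration 
under assumptions, not part of the original message; it presumes the Hadoop 
client classes are already on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class InProcessHdfsCommand {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Runs the equivalent of "hadoop fs -ls /user" inside the current JVM,
        // so no second JVM has to be forked.
        int rc = ToolRunner.run(conf, new FsShell(conf), new String[] {"-ls", "/user"});
        System.exit(rc);
    }
}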



On 08/01/2012 07:37 PM, Keith Wiley wrote:

Hmmm, at first glance that does appear to be similar to my situation.  I'll 
have to delve through it in detail to see if it squarely addresses (and fixes) 
my problem.  Mine is sporadic and I suspect dependent on the current memory 
situation (it isn't a deterministic and guaranteed failure).  I am not sure if 
that is true of the stackoverflow question you referenced...but it is certainly 
worth reading over.

Thanks.

On Aug 1, 2012, at 15:34 , Dhruv wrote:


Is this related?

http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run



Keith Wiley   kwi...@keithwiley.com   keithwiley.com   music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
--  Keith Wiley







Re: Exec hadoop from Java, reuse JVM (client-side)?

2012-08-01 Thread Keith Wiley
Hmmm, at first glance that does appear to be similar to my situation.  I'll 
have to delve through it in detail to see if it squarely addresses (and fixes) 
my problem.  Mine is sporadic and I suspect dependent on the current memory 
situation (it isn't a deterministic and guaranteed failure).  I am not sure if 
that is true of the stackoverflow question you referenced...but it is certainly 
worth reading over.

Thanks.

On Aug 1, 2012, at 15:34 , Dhruv wrote:

> Is this related?
> 
> http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run



Keith Wiley   kwi...@keithwiley.com   keithwiley.com   music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
   --  Keith Wiley




Re: Exec hadoop from Java, reuse JVM (client-side)?

2012-08-01 Thread Dhruv
Is this related?

http://stackoverflow.com/questions/1124771/how-to-solve-java-io-ioexception-error-12-cannot-allocate-memory-calling-run

On Wed, Aug 1, 2012 at 1:33 PM, Keith Wiley  wrote:

> I know there is a lot of discussion about JVM reuse in Hadoop, but that
> usually refers to mappers running on the cluster nodes.  I have a much
> different question.  I am running a Java program which at one point execs
> hadoop and that call sometimes fails in the fashion shown below.  Thus,
> this issue occurs entirely within the client machine (of course, I am
> currently running in pseudo-distributed mode which convolutes that point
> somewhat).  In other words, I successfully ran a Java program, but it
> failed to subsequently run *another* Java program (hadoop).  My
> interpretation of the hadoop startup scripts (the hadoop command itself for
> example) is that they run a second JVM in my scenario, and that they fail
> to allocate enough memory.
>
> Is there any way to run hadoop from within a JVM such that it reuses the
> local JVM?
>
> EXCEPTION: java.io.IOException: Cannot run program "hadoop":
> java.io.IOException: error=12, Cannot allocate memory
> java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> java.lang.Runtime.exec(Runtime.java:593)
> java.lang.Runtime.exec(Runtime.java:466)
> com.util.Shell.run(Shell.java:44)
> com.exe.Foo.bar(Foo.java:107)
> com.exe.Foo.run(Foo.java:205)
> com.exe.Foo.main(Foo.java:227)
> Exception in thread "main" java.io.IOException: Cannot run program
> "hadoop": java.io.IOException: error=12, Cannot allocate memory
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
> at java.lang.Runtime.exec(Runtime.java:593)
> at java.lang.Runtime.exec(Runtime.java:466)
> at com.util.Shell.run(Shell.java:44)
> at com.exe.Foo.bar(Foo.java:107)
> at com.exe.Foo.run(Foo.java:205)
> at com.exe.Foo.main(Foo.java:227)
> Caused by: java.io.IOException: java.io.IOException: error=12, Cannot
> allocate memory
> at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
> at java.lang.ProcessImpl.start(ProcessImpl.java:65)
> at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
> ... 6 more
>
>
> 
> Keith Wiley kwi...@keithwiley.com keithwiley.com
> music.keithwiley.com
>
> "You can scratch an itch, but you can't itch a scratch. Furthermore, an
> itch can
> itch but a scratch can't scratch. Finally, a scratch can itch, but an itch
> can't
> scratch. All together this implies: He scratched the itch from the scratch
> that
> itched but would never itch the scratch from the itch that scratched."
>--  Keith Wiley
>
> 
>
>


Exec hadoop from Java, reuse JVM (client-side)?

2012-08-01 Thread Keith Wiley
I know there is a lot of discussion about JVM reuse in Hadoop, but that usually 
refers to mappers running on the cluster nodes.  I have a much different 
question.  I am running a Java program which at one point execs hadoop and that 
call sometimes fails in the fashion shown below.  Thus, this issue occurs 
entirely within the client machine (of course, I am currently running in 
pseudo-distributed mode which convolutes that point somewhat).  In other words, 
I successfully ran a Java program, but it failed to subsequently run *another* 
Java program (hadoop).  My interpretation of the hadoop startup scripts (the 
hadoop command itself for example) is that they run a second JVM in my 
scenario, and that they fail to allocate enough memory.

Is there any way to run hadoop from within a JVM such that it reuses the local 
JVM?

EXCEPTION: java.io.IOException: Cannot run program "hadoop": 
java.io.IOException: error=12, Cannot allocate memory
java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
java.lang.Runtime.exec(Runtime.java:593)
java.lang.Runtime.exec(Runtime.java:466)
com.util.Shell.run(Shell.java:44)
com.exe.Foo.bar(Foo.java:107)
com.exe.Foo.run(Foo.java:205)
com.exe.Foo.main(Foo.java:227)
Exception in thread "main" java.io.IOException: Cannot run program "hadoop": 
java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at java.lang.Runtime.exec(Runtime.java:593)
at java.lang.Runtime.exec(Runtime.java:466)
at com.util.Shell.run(Shell.java:44)
at com.exe.Foo.bar(Foo.java:107)
at com.exe.Foo.run(Foo.java:205)
at com.exe.Foo.main(Foo.java:227)
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate 
memory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 6 more
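
A rough sketch of what reusing the local JVM could look like (a hedged sketch, 
assuming the job driver implements org.apache.hadoop.util.Tool and the Hadoop 
jars are on the client's classpath; MyDriver is a hypothetical stand-in for 
whatever "hadoop jar ..." would normally launch, not a class from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class InProcessLauncher {
    public static void main(String[] args) throws Exception {
        // MyDriver (hypothetical) is the Tool that "hadoop jar my.jar MyDriver <args>"
        // would have run in a forked JVM; here it runs inside the current JVM instead.
        int rc = ToolRunner.run(new Configuration(), new MyDriver(), args);
        System.exit(rc);
    }
}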


Keith Wiley   kwi...@keithwiley.com   keithwiley.com   music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
   --  Keith Wiley




mapreduce.tasktracker.map.tasks.maximum property not working with YARN

2012-08-01 Thread anil gupta
Hi All,

I have a hadoop-2.0.0-alpha hadoop/hbase cluster running on CentOS 6.0. The
cluster has 4 admin nodes and 8 data nodes. I would like to run only one
mapper/reducer at a time on each NodeManager. To do that, I set the
following properties in my mapred-site.xml. However, setting these properties
has no impact on YARN; it is running 8 map tasks simultaneously on one
NodeManager.


<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>
<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

Is there some other property I need to set for YARN?
-- 
Thanks & Regards,
Anil Gupta
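
A hedged note on how this is usually handled in MRv2 (an assumption, not 
something stated in this thread): YARN has no fixed map/reduce slots, so the old 
tasktracker properties are ignored, and per-node concurrency typically falls out 
of container memory. Capping each NodeManager at one task means making each task 
request roughly the whole node allocation, along these lines (values purely 
illustrative):

<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2048</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>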


Re: Multinode cluster only recognizes 1 node

2012-08-01 Thread Raj Vishwanathan
Sean,

Can you paste the namenode and jobtracker logs?

It could be something as simple as disabling the firewall.

Raj



>
> From: "Barry, Sean F" 
>To: "common-user@hadoop.apache.org"  
>Sent: Tuesday, July 31, 2012 1:23 PM
>Subject: RE: Multinode cluster only recognizes 1 node
> 
>After a good number of hours of tinkering with my setup, I am still only 
>seeing results for 1 node, not 2.
>
>This is the tut I followed
>http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
>After executing ./start-all.sh
>
>starting namenode, logging to 
>/usr/local/hadoop/logs/hadoop-hduser-namenode-master.out
>slave: starting datanode, logging to 
>/usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-slave.out
>master: starting datanode, logging to 
>/usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-master.out
>master: starting secondarynamenode, logging to 
>/usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-master.out
>starting jobtracker, logging to 
>/usr/local/hadoop/logs/hadoop-hduser-jobtracker-master.out
>slave: starting tasktracker, logging to 
>/usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-slave.out
>master: starting tasktracker, logging to 
>/usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-master.out
>
>
>
>
>
>-Original Message-
>From: syed kather [mailto:in.ab...@gmail.com] 
>Sent: Thursday, July 26, 2012 6:06 PM
>To: common-user@hadoop.apache.org
>Subject: Re: Multinode cluster only recognizes 1 node
>
>Can you paste the output when you execute start-all.sh in the terminal?
>Also, when you ssh to the slave, does it work fine?
>On Jul 27, 2012 4:50 AM, "Barry, Sean F"  wrote:
>
>> Hi,
>>
>> I just set up a 2 node POC cluster and I am currently having an issue 
>> with it. I ran a wordcount MR test on my cluster to see if it was 
>> working and noticed that the Web ui at localhost:50030 showed that I 
>> only have 1 live node. I followed the tutorial step by step and I 
>> cannot seem to figure out my problem. When I ran start-all.sh all of 
>> the daemons on my master node and my slave node start up perfectly 
>> fine. If you have any suggestions please let me know.
>>
>> -Sean
>>
>
>
>

Re: task jvm bootstrapping via distributed cache

2012-08-01 Thread Stan Rosenberg
On Tue, Jul 31, 2012 at 7:26 PM, Michael Segel wrote:
> Hi Stan,
>
> If I understood your question... you want to ship a jar to the nodes where 
> the task will run prior to the start of the task?
>
> Not sure what it is you're trying to do...
> Your example isn't  really clear.

Correct.  I want to ship a jar to the task, but I need to know its
absolute path before the task JVM is launched.
As an example, the -javaagent JVM option expects a jar path.

>
> See: 
> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/filecache/DistributedCache.html
>
> When you pull stuff out of the cache you get the path to the jar.
> Or you should be able to get it.
>

It would be too late at that point; the task tracker controls the
launching of the JVM.  The path of the shipped jar needs to be
available before the task is launched.

> Can you give a better example, there may be a different way to handle this...
>
Does the example above make sense?
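
One pattern that is sometimes used for this (a sketch under assumptions, not 
something confirmed in this thread: it assumes the distributed-cache symlink is 
created in the task's working directory before the child JVM is launched, and 
the HDFS path of the agent jar below is hypothetical) is to name the symlink via 
a URI fragment and point -javaagent at that relative path:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class AgentJarSetup {
    public static Configuration configure() throws Exception {
        Configuration conf = new Configuration();
        // The "#agent.jar" fragment names the symlink created in the task's
        // working directory when the cached file is localized.
        DistributedCache.addCacheFile(new URI("hdfs:///libs/agent.jar#agent.jar"), conf);
        DistributedCache.createSymlink(conf);
        // A relative path resolves against the task's working directory, so the
        // exact absolute path never needs to be known before launch.
        conf.set("mapred.child.java.opts", "-javaagent:agent.jar");
        return conf;
    }
}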