Streaming doesn't update status/counters

2013-11-13 Thread Keith Wiley
I'm running Hive queries that use python scripts for the reducer.  This method 
seems to act like streaming in all conventional ways (namely, output must be 
sent to stdout and ancillary output can be sent to stderr).

I am not seeing status updates or counter updates however.  I'm doing it the 
usual way, by sending messages of the following forms to stderr:

reporter:counter:foo:bar:123
reporter:status:hello

I'm also attempting to send keep alives:
report:status:keep_alive

I can see all of these messages appearing in stderr, but they don't show up in 
the expected places in the job tracker...and I have no idea if the keep alives 
are "working" either.

With regard to counters, I'm not sure if I need to initialize or notify Hadoop 
of the group/counter ids in advance, but with regard to status it should be 
very straightforward, and I'm not seeing anything.  The status just says 
"reduce > reduce", as always.

Thanks for any help on this.

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
   --  Homer Simpson




DistributedCache is empty

2014-01-16 Thread Keith Wiley
My driver is implemented around Tool and so should be wrapping 
GenericOptionsParser internally.  Nevertheless, neither -files nor the 
DistributedCache methods seem to work.  Usage on the command line is 
straightforward: I simply add "-files foo.py,bar.py" right after the class name 
(where those files are in the current directory I'm running hadoop from, i.e., 
the local, non-HDFS filesystem).  The mapper then inspects the file list via 
DistributedCache.getLocalCacheFiles(context.getConfiguration()) and doesn't see 
the files; there's nothing there.  Likewise, if I attempt to run those python 
scripts from the mapper using hadoop.util.Shell, the files obviously can't be 
found.

That should have worked, so I shouldn't have to rely on the DistributedCache 
methods, but I tried them anyway: in the driver I create a new Configuration, 
then call DistributedCache.addCacheFile(new URI("./foo.py"), conf), thus 
referencing the local, non-HDFS file in the current working directory.  I then 
pass conf to the Job ctor -- seems straightforward.  Still no dice; the mapper 
can't see the files, they simply aren't there.

What on Earth am I doing wrong here?

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Re: DistributedCache is empty

2014-01-17 Thread Keith Wiley
2.0.0

The problem was that I was creating a new Configuration and giving it to the Job 
ctor (which I believe is demonstrated in some tutorials), whereas the correct 
approach was to retrieve the preexisting Configuration (getConf()) and use that 
instead.  This may be a distinction between writing a bare driver and one that 
extends Configured and implements Tool.
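
For reference, a minimal sketch of that pattern (the class, job, and path names 
here are placeholders, not taken from the original code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Driver sketch: the Job is built from getConf(), i.e. the Configuration that
// ToolRunner/GenericOptionsParser already populated from "-files foo.py,bar.py".
// Creating a fresh "new Configuration()" here would silently discard that work.
public class CacheDemoDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();               // NOT new Configuration()
        Job job = new Job(conf, "cache-demo");
        job.setJarByClass(CacheDemoDriver.class);

        // Programmatic alternative to -files: add an HDFS (or otherwise
        // cluster-visible) URI to the cache, e.g.
        //   DistributedCache.addCacheFile(new URI("/user/me/foo.py"), job.getConfiguration());
        // A path on the client's local disk is not visible to the task nodes.

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new CacheDemoDriver(), args));
    }
}

Files shipped with -files are usually also symlinked into each task's working 
directory under their base names, so the mapper can typically open "foo.py" 
directly in addition to walking DistributedCache.getLocalCacheFiles().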

On Jan 17, 2014, at 09:46 , Vinod Kumar Vavilapalli wrote:

> What is the version of Hadoop that you are using?
> 
> +Vinod
> 
> On Jan 16, 2014, at 2:41 PM, Keith Wiley  wrote:
> 
>> My driver is implemented around Tool and so should be wrapping 
>> GenericOptionsParser internally.  Nevertheless, neither -files nor 
>> DistributedCache methods seem to work.  Usage on the command line is 
>> straight forward, I simply add "-files foo.py,bar.py" right after the class 
>> name (where those files are in the current directory I'm running hadoop 
>> from, i.e., the local nonHDFS filesystem).  The mapper then inspects the 
>> file list via 
>> DistributedCache.getLocalCacheFiles(context.getConfiguration()) and doesn't 
>> see the files, there's nothing there.  Likewise, if I attempt to run those 
>> python scripts from the mapper using hadoop.util.Shell, the files obviously 
>> can't be found.
>> 
>> That should have worked, so I shouldn't have to rely on the DC methods, but 
>> nevertheless, I tried anyway, so in the driver I create a new Configuration, 
>> then call DistributedCache.addCacheFile(new URI("./foo.py"), conf), thus 
>> referencing the local nonHDFS file in the current working directory.  I then 
>> add conf to the job ctor, seems straight forward.  Still no dice, the mapper 
>> can't see the files, they simply aren't there.
>> 
>> What on Earth am I doing wrong here?
>> 
>> 
>> Keith Wiley kwi...@keithwiley.com keithwiley.com
>> music.keithwiley.com
>> 
>> "Luminous beings are we, not this crude matter."
>>  --  Yoda
>> 
>> 
> 
> 


________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
   --  Keith Wiley




EmptyInputFormat for Hadoop 2?

2014-01-17 Thread Keith Wiley
The version of EmptyInputFormat available in the tarball (I downloaded CDH4, if 
that matters) uses mapred, not mapreduce, and therefore is not compatible with 
calls to setInputFormatClass(), so I attempted to extrapolate the pattern of 
the old code to an updated version.  The class I created can be passed to 
setInputFormatClass() without a compile error, and the Hadoop job runs...but 
the job uses 0 mappers!  The map class isn't called at all; a map slot isn't 
even allocated to the job.  Clearly, this was not my intent.  Any help?  Here's 
what I put together:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

/**
 * InputFormat which simulates the absence of input data by returning zero splits.
 * @param <K> the key type (no records are ever produced)
 * @param <V> the value type (no records are ever produced)
 */
public class EmptyInputFormat<K, V> extends InputFormat<K, V> {

    @Override
    public List<InputSplit> getSplits(JobContext context) throws IOException,
            InterruptedException {
        // Zero splits: the framework will schedule zero map tasks for this job.
        return new ArrayList<InputSplit>();
    }

    @Override
    public RecordReader<K, V> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException {
        return new RecordReader<K, V>() {

            @Override
            public void initialize(InputSplit split, TaskAttemptContext context)
                    throws IOException, InterruptedException {
            }

            @Override
            public boolean nextKeyValue() throws IOException, InterruptedException {
                return false;    // never any records
            }

            @Override
            public K getCurrentKey() throws IOException, InterruptedException {
                return null;
            }

            @Override
            public V getCurrentValue() throws IOException, InterruptedException {
                return null;
            }

            @Override
            public float getProgress() throws IOException, InterruptedException {
                return 0;
            }

            @Override
            public void close() throws IOException {
            }
        };
    }
}
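
For what it's worth, that behavior follows directly from the empty split list: in 
the mapreduce API the number of map tasks equals the number of splits returned by 
getSplits(), so zero splits means zero mappers by construction; running N mappers 
with no real input requires an input format that hands back N (dummy) splits 
instead.  A minimal driver sketch wiring the class above into a job (the mapper 
and output path are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: a job wired to EmptyInputFormat.  Because the format returns no
// splits, the framework schedules no map tasks and NoInputMapper never runs.
public class EmptyInputDriver {

    // Placeholder mapper; it is never invoked when the split list is empty.
    public static class NoInputMapper
            extends Mapper<NullWritable, NullWritable, NullWritable, NullWritable> {
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "no-input-demo");
        job.setJarByClass(EmptyInputDriver.class);
        job.setInputFormatClass(EmptyInputFormat.class);
        job.setMapperClass(NoInputMapper.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(NullWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}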

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
       --  Keith Wiley




Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread Keith Wiley
I am running a job that takes no input from the mapper-input key/value 
interface.  Each map task reads the same small file from the distributed cache 
and processes it independently (to generate Monte Carlo sampling of the problem 
space).  I am using MR purely to parallelize the otherwise redundant and 
separate sampling runs.  To maximize parallelism, I want to set the number of 
mappers explicitly, such that 10 samples run in exactly 1X time by perfectly 
distributing over 10 mappers.  I am accomplishing this by generating a dummy MR 
input file of throwaway data.  Each row is identical, so I know the exact 
length of every row.  I then simply set the split size to the row length with 
the intention that Hadoop assign exactly the intended number of mappers.  
This approach mostly works.  However, I get a few extraneous empty mappers.  
Since they get no input, they do no work and exit almost immediately, so they 
aren't a serious drain on cluster resources, but I'm confused why I get extra 
mappers in the first place.

My working theory was that the end-lines of the input file must be accounted 
for when calculating split sizes (so my splits were too small and I got a few 
extra splits hanging off the end of the input file).  I attempted to fix this 
by adding one to the calculated split size (one greater than the actual row 
length now).  This works perfectly, generating exactly the intended number of 
mappers, exactly the same number as there are rows in the input file.  However, 
the labor distribution is not perfect.  Almost every single run produces one 
mapper which receives no input (and ends immediately) and another mapper which 
receives two inputs, thus triggering two "processing sessions" on that 
particular mapper such that it takes twice as long to complete as the other 
mappers.  Obviously, this wrecks the potential parallelism by literally 
doubling the overall job time.

Which split size is correct: row length without end-line or row length with 
end-line?  The former yields extra empty mappers while the latter yields 
exactly the right number.  However, if the latter is correct, why is the task 
distribution uneven (albeit NEARLY even) and what (if anything) can be done 
about it?

Thanks.

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
   --  Mark Twain




Re: Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread Keith Wiley
I'll look it up.  Thanks.

On Jan 21, 2014, at 11:43 , java8964 wrote:

> You cannot use hadoop "NLineInputFormat"?
> 
> If you generate 100 lines of text file, by default, one line will trigger one 
> mapper task.
> 
> As long as you have 100 task slot available, you will get 100 mapper running 
> concurrently.
> 
> You want perfect control over mapper num? NLineInputFormat is designed for 
> your purpose.
> 
> Yong

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
   --  Keith Wiley




Re: Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread Keith Wiley
Seems to work well.  Thank you very much!

On Jan 21, 2014, at 12:42 , Keith Wiley wrote:

> I'll look it up.  Thanks.
> 
> On Jan 21, 2014, at 11:43 , java8964 wrote:
> 
>> You cannot use hadoop "NLineInputFormat"?
>> 
>> If you generate 100 lines of text file, by default, one line will trigger 
>> one mapper task.
>> 
>> As long as you have 100 task slot available, you will get 100 mapper running 
>> concurrently.
>> 
>> You want perfect control over mapper num? NLineInputFormat is designed for 
>> your purpose.
>> 
>> Yong
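
For reference, a minimal driver sketch of the NLineInputFormat approach (the job 
name and paths are placeholders; the dummy input file has one line per desired 
map task):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch: one mapper per line of a dummy input file, via NLineInputFormat.
public class OneMapperPerLineDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "one-mapper-per-line");
        job.setJarByClass(OneMapperPerLineDriver.class);

        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.addInputPath(job, new Path(args[0]));   // dummy file: one row per task
        NLineInputFormat.setNumLinesPerSplit(job, 1);            // 1 line -> 1 split -> 1 mapper

        // job.setMapperClass(...);    // the Monte Carlo mapper would be set here
        job.setNumReduceTasks(0);                                // map-only sampling run
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Because NLineInputFormat splits on whole lines rather than byte offsets, each 
dummy row lands on its own mapper and the split-size arithmetic goes away.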



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
   --  Keith Wiley




Force one mapper per machine (not core)?

2014-01-28 Thread Keith Wiley
I'm running a program whose streaming layer automatically multithreads, 
detecting the number of cores on the machine on its own.  I realize this model 
is somewhat in conflict with Hadoop, but nonetheless, that's what I'm doing.  
Thus, for even resource utilization, it would be nice to assign not merely one 
mapper per core, but only one mapper per machine.  I realize that if I saturate 
the cluster none of this really matters, but consider the following example for 
clarity: 4-core nodes, 10-node cluster, thus 40 slots, fully configured across 
mappers and reducers (40 of each).  Say I run this program with just two 
mappers.  It would run much more efficiently (in essentially half the time) if 
I could force the two mappers to go to slots on two separate machines instead 
of running the risk that Hadoop may assign them both to the same machine.

Can this be done?

Thanks.

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
   --  Edwin A. Abbott, Flatland




Re: Force one mapper per machine (not core)?

2014-01-28 Thread Keith Wiley
Yeah, it isn't, not even remotely, but thanks.

On Jan 28, 2014, at 14:06 , Bryan Beaudreault wrote:

> If this cluster is being used exclusively for this goal, you could just set 
> the mapred.tasktracker.map.tasks.maximum to 1.
> 
> 
> On Tue, Jan 28, 2014 at 5:00 PM, Keith Wiley  wrote:
> I'm running a program which in the streaming layer automatically multithreads 
> and does so by automatically detecting the number of cores on the machine.  I 
> realize this model is somewhat in conflict with Hadoop, but nonetheless, 
> that's what I'm doing.  Thus, for even resource utilization, it would be nice 
> to not only assign one mapper per core, but only one mapper per machine.  I 
> realize that if I saturate the cluster none of this really matters, but 
> consider the following example for clarity: 4-core nodes, 10-node cluster, 
> thus 40 slots, fully configured across mappers and reducers (40 slots of 
> each).  Say I run this program with just two mappers.  It would run much more 
> efficiently (in essentially half the time) if I could force the two mappers 
> to go to slots on two separate machines instead of running the risk that 
> Hadoop may assign them both to the same machine.
> 
> Can this be done?
> 
> Thanks.

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Re: Force one mapper per machine (not core)?

2014-01-31 Thread Keith Wiley
Hmmm, okay.  I know it's running CDH4 4.4.0, but as for whether it was 
specifically configured with MR1 or MR2 (is there a distinction between MR2 and 
Yarn?) I'm not absolutely certain.  I know that the cluster "behaves" like the 
MR1 clusters I've worked with for years (I interact with the job tracker in a 
classical way, for example).  Can I tell whether it's MR1 or MR2 from the job 
tracker or namenode web UIs?

Thanks.

On Jan 29, 2014, at 00:52 , Harsh J wrote:

> Is your cluster running MR1 or MR2? On MR1, the CapacityScheduler
> would allow you to do this if you used appropriate memory based
> requests (see http://search-hadoop.com/m/gnFs91yIg1e), and on MR2
> (depending on the YARN scheduler resource request limits config) you
> can request your job be run with the maximum-most requests that would
> soak up all provided resources (of CPU and Memory) of a node such that
> only one container runs on a host at any given time.
> 
> On Wed, Jan 29, 2014 at 3:30 AM, Keith Wiley  wrote:
>> I'm running a program which in the streaming layer automatically 
>> multithreads and does so by automatically detecting the number of cores on 
>> the machine.  I realize this model is somewhat in conflict with Hadoop, but 
>> nonetheless, that's what I'm doing.  Thus, for even resource utilization, it 
>> would be nice to not only assign one mapper per core, but only one mapper 
>> per machine.  I realize that if I saturate the cluster none of this really 
>> matters, but consider the following example for clarity: 4-core nodes, 
>> 10-node cluster, thus 40 slots, fully configured across mappers and reducers 
>> (40 slots of each).  Say I run this program with just two mappers.  It would 
>> run much more efficiently (in essentially half the time) if I could force 
>> the two mappers to go to slots on two separate machines instead of running 
>> the risk that Hadoop may assign them both to the same machine.
>> 
>> Can this be done?
>> 
>> Thanks.
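
For reference, a sketch of the MR2/YARN route Harsh describes: request map 
containers so large that only one fits on a node at a time.  The figures below 
are purely illustrative and must be sized to the NodeManager's advertised 
resources (yarn.nodemanager.resource.memory-mb and .cpu-vcores) on the actual 
cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch (MR2/YARN): request whole-node-sized map containers so the scheduler
// can place only one per host.  Note that vcore requests only influence
// placement on versions/schedulers configured to account for CPU as well as
// memory; the memory request alone is often enough to get the effect.
public class OneMapperPerNode {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.map.memory.mb", 15360);     // ~ an entire node's memory
        conf.set("mapreduce.map.java.opts", "-Xmx14g");    // heap kept inside that container
        conf.setInt("mapreduce.map.cpu.vcores", 4);        // an entire node's cores
        return new Job(conf, "one-mapper-per-node");
    }
}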



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
   --  Abe (Grandpa) Simpson




Re: Force one mapper per machine (not core)?

2014-01-31 Thread Keith Wiley
Hmmm, okay.  I thought that logic all acted at the level of "slots".  I didn't 
realize it could make "node" distinctions.  Thanks for the tip.

On Jan 29, 2014, at 05:18 , java8964 wrote:

> Or you can implement your own InputSplit and InputFormat, which you can 
> control how to send tasks to which node, and how many per node.
> 
> Some detailed examples you can get from the book "Professional Hadoop Solution", 
> Chapter 4.
> 
> Yong


____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
   --  Mark Twain




.deflate trouble

2013-02-14 Thread Keith Wiley
I just got hadoop running on EC2 (0.19 just because that's the AMI the scripts 
seemed to go for).  The PI example worked and I believe the wordcount example 
worked too.  However, the output file is in .deflate format.  "hadoop fs -text" 
fails to decompress the file -- it produces the same binary output as "hadoop 
fs -cat", which I find counterintuitive; isn't -text specifically supposed to 
handle this situation?

I copied the file to local and tried manually decompressing it with gunzip and 
lzop (by appending appropriate suffixes), but both tools failed to recognize 
the file.  To add to the confusion, I see this in the default configuration 
offered by the EC2 scripts:

  <property>
    <name>mapred.output.compress</name>
    <value>false</value>
    <description>Should the job outputs be compressed?</description>
  </property>

...so I don't understand why the output was compressed in the first place.

At this point, I'm kind of stuck.  The output shouldn't be compressed to begin 
with, and all attempts to decompress it have failed.

Any ideas?

Thanks.

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
   --  Homer Simpson




Re: .deflate trouble

2013-02-14 Thread Keith Wiley
I'll look into the job.xml issue, thanks for the suggestion.  In the meantime, 
who is in charge of maintaining the "official" AWS Hadoop AMIs?  The following 
are the contents of the hadoop-images/ S3 bucket.  As you can see, it tops out 
at 0.19:

IMAGE   ami-65987c0c   hadoop-images/hadoop-0.17.1-i386.manifest.xml
        914733919441   available   public   i386     machine   aki-a71cf9ce
        ari-a51cf9cc   instance-store   paravirtual   xen
IMAGE   ami-4b987c22   hadoop-images/hadoop-0.17.1-x86_64.manifest.xml
        914733919441   available   public   x86_64   machine   aki-b51cf9dc
        ari-b31cf9da   instance-store   paravirtual   xen
IMAGE   ami-b0fe1ad9   hadoop-images/hadoop-0.18.0-i386.manifest.xml
        914733919441   available   public   i386     machine   aki-a71cf9ce
        ari-a51cf9cc   instance-store   paravirtual   xen
IMAGE   ami-90fe1af9   hadoop-images/hadoop-0.18.0-x86_64.manifest.xml
        914733919441   available   public   x86_64   machine   aki-b51cf9dc
        ari-b31cf9da   instance-store   paravirtual   xen
IMAGE   ami-ea36d283   hadoop-images/hadoop-0.18.1-i386.manifest.xml
        914733919441   available   public   i386     machine   aki-a71cf9ce
        ari-a51cf9cc   instance-store   paravirtual   xen
IMAGE   ami-fe37d397   hadoop-images/hadoop-0.18.1-x86_64.manifest.xml
        914733919441   available   public   x86_64   machine   aki-b51cf9dc
        ari-b31cf9da   instance-store   paravirtual   xen
IMAGE   ami-fa6a8e93   hadoop-images/hadoop-0.19.0-i386.manifest.xml
        914733919441   available   public   i386     machine   aki-a71cf9ce
        ari-a51cf9cc   instance-store   paravirtual   xen
IMAGE   ami-cd6a8ea4   hadoop-images/hadoop-0.19.0-x86_64.manifest.xml
        914733919441   available   public   x86_64   machine   aki-b51cf9dc
        ari-b31cf9da   instance-store   paravirtual   xen
IMAGE   ami-15e80f7c   hadoop-images/hadoop-base-20090210-i386.manifest.xml
        914733919441   available   public   i386     machine   aki-a71cf9ce
        ari-a51cf9cc   instance-store   paravirtual   xen
IMAGE   ami-1ee80f77   hadoop-images/hadoop-base-20090210-x86_64.manifest.xml
        914733919441   available   public   x86_64   machine   aki-b51cf9dc
        ari-b31cf9da   instance-store   paravirtual   xen


On Feb 14, 2013, at 15:02 , Harsh J wrote:

> 0.19 is really old and thats probably why the Text utility (fs -text)
> doesn't support automatic decompression based on extensions (or
> specifically, of .deflate).
> 
> Did the job.xml of the job that produced this output also carry
> mapred.output.compress=false in it? The file should be viewable on the
> JT UI page for the job. Unless explicitly turned out, even 0.19
> wouldn't have enabled compression on its own.
> 
> On Fri, Feb 15, 2013 at 3:50 AM, Keith Wiley  wrote:
>> I just got hadoop running on EC2 (0.19 just because that's the AMI the 
>> scripts seemed to go for).  The PI example worked and I believe the 
>> wordcount example worked too.  However, the output file is in .deflate 
>> format.  "hadoop fs -text" fails to decompress the file -- it produces the 
>> same binary output as "hadoop fs -cat", which I find counterintuitive; isn't 
>> -text specifically supposed to handle this situation?
>> 
>> I copied the file to local and tried manually decompressing it with gunzip 
>> and lzop (by appending appropriate suffixes), but both tools failed to 
>> recognize the file.  To add to the confusion, I see this in the default 
>> configuration offered by the EC2 scripts:
>> 
>>  mapred.output.compress
>>  false
>>  Should the job outputs be compressed?
>>  
>> 
>> ...so I don't understand why the output was compressed in the first place.
>> 
>> At this point, I'm kind of stuck.  The output shouldn't be compressed to 
>> begin with, and all attempts to decompress it have failed.
>> 
>> Any ideas?
>> 
>> Thanks.



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
   --  Abe (Grandpa) Simpson




Re: .deflate trouble

2013-02-14 Thread Keith Wiley
Good call.  We can't use the conventional web-based JT due to corporate access 
issues, but I looked at the job_XXX.xml file directly, and sure enough, it set 
mapred.output.compress to true.  Now I just need to remember how that occurs.  
I simply ran the wordcount example straight off the command line, I didn't 
specify any overridden conf settings for the job.

Ultimately, the solution (or part of it) is to get away from .19 to a more 
up-to-date version of Hadoop.  I would prefer 2.0 over 1.0 in fact, but due to 
a remarkable lack of concise EC2/Hadoop documentation (and the fact that what 
docs I did find were very old and therefore conformed to .19 style Hadoop), I 
have fallen back on old versions of Hadoop for my initial tests.  In the long 
run, I will need to get a more modern version of Hadoop to successfully deploy 
on EC2.

Thanks.

On Feb 14, 2013, at 15:02 , Harsh J wrote:

> Did the job.xml of the job that produced this output also carry
> mapred.output.compress=false in it? The file should be viewable on the
> JT UI page for the job. Unless explicitly turned out, even 0.19
> wouldn't have enabled compression on its own.


________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
   --  Mark Twain




Re: .deflate trouble

2013-02-15 Thread Keith Wiley
I might contact them, but we are specifically avoiding EMR for this project.  We 
have already successfully deployed EMR, but we want more precise control over 
the cluster, namely the ability to persist and reawaken it on demand.  We 
really want a direct Hadoop installation instead of an EMR-based installation.  
But I might contact them anyway to see what they recommend.  Thanks for the refs.

On Feb 14, 2013, at 19:09 , Marcos Ortiz Valmaseda wrote:

> Regards, Keith. For EMR issues and stuff, you can contact directly to Jeff 
> Barr(Chief Evangelist for AWS) or to Saurabh Baji (Product Manager for AWS 
> EMR).
> Best wishes.
> 
> From: "Keith Wiley" 
> To: user@hadoop.apache.org
> Sent: Thursday, February 14, 2013 15:46:05
> Subject: Re: .deflate trouble
> 
> Good call.  We can't use the conventional web-based JT due to corporate 
> access issues, but I looked at the job_XXX.xml file directly, and sure 
> enough, it set mapred.output.compress to true.  Now I just need to remember 
> how that occurs.  I simply ran the wordcount example straight off the command 
> line, I didn't specify any overridden conf settings for the job.
> 
> Ultimately, the solution (or part of it) is to get away from .19 to a more 
> up-to-date version of Hadoop.  I would prefer 2.0 over 1.0 in fact, but due 
> to a remarkable lack of concise EC2/Hadoop documentation (and the fact that 
> what docs I did find were very old and therefore conformed to .19 style 
> Hadoop), I have fallen back on old versions of Hadoop for my initial tests.  
> In the long run, I will need to get a more modern version of Hadoop to 
> successfully deploy on EC2.
> 
> Thanks.
> 
> On Feb 14, 2013, at 15:02 , Harsh J wrote:
> 
> > Did the job.xml of the job that produced this output also carry
> > mapred.output.compress=false in it? The file should be viewable on the
> > JT UI page for the job. Unless explicitly turned out, even 0.19
> > wouldn't have enabled compression on its own.



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
   --  Keith Wiley




Namenode formatting problem

2013-02-18 Thread Keith Wiley
This is Hadoop 2.0.  Formatting the namenode produces no errors in the shell, 
but the log shows this:

2013-02-18 22:19:46,961 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: 
Exception in namenode join
java.net.BindException: Problem binding to [ip-13-0-177-110:9212] 
java.net.BindException: Cannot assign requested address; For more details see:  
http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:710)
        at org.apache.hadoop.ipc.Server.bind(Server.java:356)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:454)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:1833)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:866)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:375)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:350)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:695)
        at org.apache.hadoop.ipc.RPC.getServer(RPC.java:684)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.<init>(NameNodeRpcServer.java:238)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:452)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
2013-02-18 22:19:46,988 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1
2013-02-18 22:19:46,990 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-13-0-177-11/127.0.0.1
************************************************************/

No java processes begin (although I wouldn't expect formatting the namenode to 
start any processes, only starting the namenode or datanode should do that), 
and "hadoop fs -ls /" gives me this:

ls: Call From [CLIENT_HOST]/127.0.0.1 to [MASTER_HOST]:9000 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused

My /etc/hosts looks like this:
127.0.0.1   localhost localhost.localdomain CLIENT_HOST
MASTER_IP MASTER_HOST master
SLAVE_IP SLAVE_HOST slave01

This is on EC2.  All of the nodes are in the same security group and the 
security group has full inbound access.  I can ssh between all three machines 
(client/master/slave) without a password via authorized_keys.  I can ping the 
master node from the client machine (although I don't know how to probe a 
specific port, such as the hdfs port, 9000).  Telnet doesn't behave on EC2, 
which makes port testing a little difficult.

Any ideas?

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
   --  Mark Twain




Re: Namenode formatting problem

2013-02-19 Thread Keith Wiley
Hmmm, okay.  Thanks.  Umm, is this a Yarn thing?  Because I also tried it with 
Hadoop 2.0 MR1 (which I think should behave almost exactly like older versions 
of Hadoop) and it had the exact same problem.  Does H2.0 MR1 use journal nodes?  
I'll try to read up more on this later today.  Thanks for the tip.

On Feb 18, 2013, at 16:32 , Azuryy Yu wrote:

> Because journal nodes are also be formated during NN format, so you need to 
> start all JN daemons firstly.
> 
> On Feb 19, 2013 7:01 AM, "Keith Wiley"  wrote:
> This is Hadoop 2.0.  Formatting the namenode produces no errors in the shell, 
> but the log shows this:
> 
> 2013-02-18 22:19:46,961 FATAL 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
> java.net.BindException: Problem binding to [ip-13-0-177-110:9212] 
> java.net.BindException: Cannot assign requested address; For more details 
> see:  http://wiki.apache.org/hadoop/BindException
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:710)
> at org.apache.hadoop.ipc.Server.bind(Server.java:356)
> at org.apache.hadoop.ipc.Server$Listener.(Server.java:454)
> at org.apache.hadoop.ipc.Server.(Server.java:1833)
> at org.apache.hadoop.ipc.RPC$Server.(RPC.java:866)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server.(ProtobufRpcEngine.java:375)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:350)
> at org.apache.hadoop.ipc.RPC.getServer(RPC.java:695)
> at org.apache.hadoop.ipc.RPC.getServer(RPC.java:684)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.(NameNodeRpcServer.java:238)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createRpcServer(NameNode.java:452)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:608)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:589)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> 2013-02-18 22:19:46,988 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
> status 1
> 2013-02-18 22:19:46,990 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
> SHUTDOWN_MSG:
> /
> SHUTDOWN_MSG: Shutting down NameNode at ip-13-0-177-11/127.0.0.1
> /
> 
> No java processes begin (although I wouldn't expect formatting the namenode 
> to start any processes, only starting the namenode or datanode should do 
> that), and "hadoop fs -ls /" gives me this:
> 
> ls: Call From [CLIENT_HOST]/127.0.0.1 to [MASTER_HOST]:9000 failed on 
> connection exception: java.net.ConnectException: Connection refused; For more 
> details see:  http://wiki.apache.org/hadoop/ConnectionRefused
> 
> My /etc/hosts looks like this:
> 127.0.0.1   localhost localhost.localdomain CLIENT_HOST
> MASTER_IP MASTER_HOST master
> SLAVE_IP SLAVE_HOST slave01
> 
> This is on EC2.  All of the nodes are in the same security group and the 
> security group has full inbound access.  I can ssh between all three machines 
> (client/master/slave) without a password ala authorized_keys.  I can ping the 
> master node from the client machine (although I don't know how to ping a 
> specific port, such as the hdfs port (9000)).  Telnet doesn't behave on EC2 
> which makes port testing a little difficult.
> 
> Any ideas?


Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"What I primarily learned in grad school is how much I *don't* know.
Consequently, I left grad school with a higher ignorance to knowledge ratio than
when I entered."
   --  Keith Wiley




webapps/ CLASSPATH err

2013-02-19 Thread Keith Wiley
java:608)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
2013-02-19 19:15:20,447 INFO org.apache.hadoop.util.ExitUtil: Exiting with 
status 1
2013-02-19 19:15:20,474 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ip-13-0-177-11/127.0.0.1
************************************************************/

This is particularly confusing because, while the hadoop-2.0.0-mr1-cdh4.1.3/ 
dir does have a webapps/ dir, there is no "hdfs" file or dir in that webapps/.  
It contains job/, static/, and task/.

If I start over from a freshly formatted namenode and take a slightly different 
approach -- if I try to start the datanode immediately after starting the 
namenode -- once again it fails, and in a very similar way.  This time the 
command to start the datanode has two effects: the namenode log still can't 
find webapps/hdfs, just as shown above, and also, there is now a datanode log 
file, and it likewise can't find webapps/datanode 
("java.io.FileNotFoundException: webapps/datanode not found in CLASSPATH") so I 
get two very similar errors at once, one on the namenode and one on the 
datanode.

This webapps/ dir business makes no sense since the files (or directories) the 
logs claim to be looking for inside webapps/ ("hdfs" and "datanode") don't 
exist!

Thoughts?


Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
   --  Keith Wiley




Re: webapps/ CLASSPATH err

2013-02-19 Thread Keith Wiley

On Feb 19, 2013, at 11:43 , Harsh J wrote:

> Hi Keith,
> 
> The webapps/hdfs bundle is present at
> $HADOOP_PREFIX/share/hadoop/hdfs/ directory of the Hadoop 2.x release
> tarball. This should get on the classpath automatically as well.

Hadoop 2.0 Yarn does indeed have a share/ dir, but Hadoop 2.0 MR1 doesn't have a 
share/ dir at all.  Is MR1 not usable?  I was hoping to use it as a stepping 
stone between older versions of Hadoop (for which I have found some EC2 
support, not the least being an actual ec2/ dir and associated scripts in 
src/contrib/ec2) and Yarn, for which I have found no such support, provided 
scripts, or online walkthroughs yet.  However, I am discovering that H2 MR1 is 
sufficiently different from older versions of Hadoop that it does not easily 
extrapolate from those previous successes (the bin/ directory is quite 
different, for one thing).  At the same time, H2 MR1 is also sufficiently 
different from Yarn that I can't easily extend Yarn advice onto it (as noted, I 
don't even see a share/ directory in H2 MR1, so I'm not sure how to apply the 
response above).

> What "bin/hadoop-daemon.sh" script are you using, the one from the MR1
> "aside" tarball or the chief hadoop-2 one?

I figured, as long as I'm trying to use MR1, I would use it exclusively and not 
touch the Yarn installation at all, so I'm relying entirely on the conf/ and 
bin/ dirs under MR1 (note that MR1's sbin/ dir only contains a nonexecutable 
"task-controller", not all the other stuff that Yarn's sbin/ dir contains)...so 
I'm using MR1's bin/hadoop and bin/hadoop-daemon.sh, nothing else.

> On my tarball setups, I 'start-dfs.sh' via the regular tarball, and it
> works fine.

MR1's bin/ dir has no such executable, nor does it have the conventional 
start-all.sh I'm used to.  I recognize those script names from older versions 
of Hadoop, but H2 MR1 doesn't provide them.  I'm using 
hadoop-2.0.0-mr1-cdh4.1.3.

> Another simple check you could do is to try to start with
> "$HADOOP_PREFIX/bin/hdfs namenode" to see if it at least starts well
> this way and brings up the NN as a foreground process.

H2 MR1's bin/ dir doesn't have an hdfs executable in it.  Admittedly, H2 Yarn's 
bin/ dir does.  The following are my H2 MR1 bin/ options:
~/hadoop-2.0.0-mr1-cdh4.1.3/ $ ls bin/
total 60
 4 drwxr-xr-x  2 ec2-user ec2-user  4096 Feb 18 23:45 ./
 4 drwxr-xr-x 17 ec2-user ec2-user  4096 Feb 19 00:08 ../
20 -rwxr-xr-x  1 ec2-user ec2-user 17405 Jan 27 01:07 hadoop*
 8 -rwxr-xr-x  1 ec2-user ec2-user  4356 Jan 27 01:07 hadoop-config.sh*
 4 -rwxr-xr-x  1 ec2-user ec2-user  3988 Jan 27 01:07 hadoop-daemon.sh*
 4 -rwxr-xr-x  1 ec2-user ec2-user  1227 Jan 27 01:07 hadoop-daemons.sh*
 4 -rwxr-xr-x  1 ec2-user ec2-user  2710 Jan 27 01:07 rcc*
 4 -rwxr-xr-x  1 ec2-user ec2-user  2043 Jan 27 01:07 slaves.sh*
 4 -rwxr-xr-x  1 ec2-user ec2-user  1159 Jan 27 01:07 start-mapred.sh*
 4 -rwxr-xr-x  1 ec2-user ec2-user  1068 Jan 27 01:07 stop-mapred.sh*


Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
   --  Keith Wiley




Article: 'How to Deploy Hadoop 2 (Yarn) on EC2'

2013-04-17 Thread Keith Wiley
I've posted an article on my website that details precisely how to deploy 
Hadoop 2.0 with Yarn on AWS (or at least how I did it, whether or not such an 
approach will translate to others' circumstances).  I had been disappointed 
that most articles of this type described the process with much older versions 
of Hadoop or relied on tools like Whirr, and I wanted to document and publish 
my method.  Perhaps others will find it useful.

I'm sure others more expert than myself will see opportunities to streamline 
or otherwise improve the process.  I don't claim that my method is the best, 
merely that I actually got it to work!

http://www.keithwiley.com/writing/HowToDeployHadoopYarnOnEC2.shtml

Cheers!

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Re: Learning hadoop

2012-08-23 Thread Keith Wiley
Tom White's book (O'Reilly).

Sent from my phone, please excuse my brevity.
Keith Wiley, kwi...@keithwiley.com, http://keithwiley.com


Pravin Sinha  wrote:

Hi,

I am new to Hadoop. What would be the best way to learn  hadoop and eco system 
around it?

Thanks,
Pravin




Adding additional storage

2012-08-27 Thread Keith Wiley
I'm running a pseudo-distributed cluster on a single machine and I would like 
to use a larger disk (mounted and ready to go, of course).  I don't mind 
transferring to the new disk (as opposed to using both disks for the hdfs, 
which seems much hairier), but I'm not sure how to transfer a hadoop cluster to 
a new disk...or if it's even possible.  Even if I simply copy the entire 
directory where the hdfs is emulated, wouldn't I still need to somehow point 
the namenode at the new disk?

Is there any way to do this that beats manually "getting" the data, throwing 
away the old cluster, making a new cluster from scratch, and reuploading the 
data to hdfs?...or is that really the only feasible way to migrate a 
pseudo-distributed cluster to a second larger storage?

Thanks.

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
       --  Keith Wiley




Re: Adding additional storage

2012-08-27 Thread Keith Wiley
That appears to have worked.  Thanks.

On Aug 27, 2012, at 11:52 , Harsh J wrote:

> Hey Keith,
> 
> Pseudo-distributed isn't any different from fully-distributed,
> operationally, except for nodes = 1 - so don't let it limit your
> thoughts :)
> 
> Stop the HDFS cluster, mv your existing dfs.name.dir and dfs.data.dir
> dir contents onto the new storage mount. Reconfigure dfs.data.dir and
> dfs.name.dir to point to these new locations and start it back up. All
> should be well.
> 
> On Tue, Aug 28, 2012 at 12:15 AM, Keith Wiley  wrote:
>> I'm running a pseudo-distributed cluster on a single machine and I would 
>> like to use a larger disk (mounted and ready to go of course).  I don't mind 
>> transferring to the new disk (as opposed to using both disks for the hdfs 
>> which seems much hairier), but I'm not sure how to transfer a hadoop cluster 
>> to a new disk...or if its even possible.  Even if I simply copy the entire 
>> directory where the hdfs is emulated, I still need to somehow switch the 
>> namenode to know about the new disk?
>> 
>> Is there any way to do this that beats manually "getting" the data, throwing 
>> away the old cluster, making a new cluster from scratch, and reuploading the 
>> data to hdfs?...or is that really the only feasible way to migrate a 
>> pseudo-distributed cluster to a second larger storage?
>> 
>> Thanks.
>> 
>> 
>> Keith Wiley kwi...@keithwiley.com keithwiley.com
>> music.keithwiley.com
>> 
>> "You can scratch an itch, but you can't itch a scratch. Furthermore, an itch 
>> can
>> itch but a scratch can't scratch. Finally, a scratch can itch, but an itch 
>> can't
>> scratch. All together this implies: He scratched the itch from the scratch 
>> that
>> itched but would never itch the scratch from the itch that scratched."
>>       --  Keith Wiley
>> 
>> 
> 
> 
> 
> -- 
> Harsh J



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
I've been running up against the good old fashioned "replicated to 0 nodes" 
gremlin quite a bit recently.  My system (a set of processes interacting with 
hadoop, and of course hadoop itself) runs for a while (a day or so) and then I 
get plagued with these errors.  This is a very simple system, a single node 
running pseudo-distributed.  Obviously, the replication factor is implicitly 1 
and the datanode is the same machine as the namenode.  None of the typical 
culprits seem to explain the situation and I'm not sure what to do.  I'm also 
not sure how I'm getting around it so far.  I fiddle desperately for a few 
hours and things start running again, but that's not really a solution...I've 
tried stopping and restarting hdfs, but that doesn't seem to improve things.

So, to go through the common suspects one by one, as quoted on 
http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo:

• No DataNode instances being up and running. Action: look at the servers, see 
if the processes are running.

I can interact with hdfs through the command line (doing directory listings for 
example).  Furthermore, I can see that the relevant java processes are all 
running (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker).

• The DataNode instances cannot talk to the server, through networking or 
Hadoop configuration problems. Action: look at the logs of one of the DataNodes.

Obviously irrelevant in a single-node scenario.  Anyway, like I said, I can 
perform basic hdfs listings, I just can't upload new data.

• Your DataNode instances have no hard disk space in their configured data 
directories. Action: look at the dfs.data.dir list in the node configurations, 
verify that at least one of the directories exists, and is writeable by the 
user running the Hadoop processes. Then look at the logs.

There's plenty of space, at least 50GB.

• Your DataNode instances have run out of space. Look at the disk capacity via 
the Namenode web pages. Delete old files. Compress under-used files. Buy more 
disks for existing servers (if there is room), upgrade the existing servers to 
bigger drives, or add some more servers.

Nope, 50GBs free, I'm only uploading a few KB at a time, maybe a few MB.

• The reserved space for a DN (as set in dfs.datanode.du.reserved) is greater 
than the remaining free space, so the DN thinks it has no free space

I grepped all the files in the conf directory and couldn't find this parameter, 
so I don't really know anything about it.  At any rate, it seems rather 
esoteric; I doubt it is related to my problem.  Any thoughts on this?

• You may also get this message due to permissions, eg if JT can not create 
jobtracker.info on startup.

Meh, like I said, the system basically works...and then stops working.  The 
only explanation that would really make sense in that context is running out of 
space...which isn't happening.  If this were a permission error, or a 
configuration error, or anything weird like that, then the whole system would 
never have gotten up and running in the first place.

Why would a properly running hadoop system start exhibiting this error without 
running out of disk space?  THAT's the real question on the table here.

Any ideas?

____
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
   --  Edwin A. Abbott, Flatland




Re: could only be replicated to 0: TL;DR

2012-09-04 Thread Keith Wiley
If the datanode is definitely not running out of space, and the overall system 
has basically been working leading up to the "replicated to 0 nodes" error 
(which proves the configuration and permissions are all basically correct), 
then what other explanations are there for why hdfs would suddenly start 
exhibiting this error out of the blue?

Thanks.

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Luminous beings are we, not this crude matter."
   --  Yoda




Can't get out of safemode

2012-09-04 Thread Keith Wiley
Observe:

~/ $ hd fs -put test /test
put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create 
file/test. Name node is in safe mode.
~/ $ hadoop dfsadmin -safemode leave
Safe mode is OFF
~/ $ hadoop dfsadmin -safemode get
Safe mode is ON
~/ $ hadoop dfsadmin -safemode leave
Safe mode is OFF
~/ $ hadoop dfsadmin -safemode get
Safe mode is ON
~/ $ hd fs -put test /test
put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create 
file/test. Name node is in safe mode.
~/ $

Grrr!


Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
   --  Galileo Galilei




Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
On Sep 4, 2012, at 10:05 , Suresh Srinivas wrote:

> When these errors are thrown, please send the namenode web UI information. It 
> has storage related information in the cluster summary. That will help debug.

Sure thing.  Thanks.  Here's what I currently see.  It looks like the problem 
isn't the datanode, but rather the namenode.  Would you agree with that 
assessment?

NameNode 'localhost:9000'

Started:  Tue Sep 04 10:06:52 PDT 2012
Version:  0.20.2-cdh3u3, 03b655719d13929bd68bb2c2f9cee615b389cea9
Compiled: Thu Jan 26 11:55:16 PST 2012 by root from Unknown
Upgrades: There are no upgrades in progress.

Browse the filesystem
Namenode Logs
Cluster Summary

Safe mode is ON. Resources are low on NN. Safe mode must be turned off manually.
1639 files and directories, 585 blocks = 2224 total. Heap Size is 39.55 MB / 
888.94 MB (4%) 
Configured Capacity  :   49.21 GB
DFS Used :   9.9 MB
Non DFS Used :   2.68 GB
DFS Remaining:   46.53 GB
DFS Used%:   0.02 %
DFS Remaining%   :   94.54 %
Live Nodes   :   1
Dead Nodes   :   0
Decommissioning Nodes:   0
Number of Under-Replicated Blocks:   5

NameNode Storage:

Storage Directory                           Type              State
/var/lib/hadoop-0.20/cache/hadoop/dfs/name  IMAGE_AND_EDITS   Active

Cloudera's Distribution including Apache Hadoop, 2012.

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"And what if we picked the wrong religion?  Every week, we're just making God
madder and madder!"
   --  Homer Simpson




Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
I had moved the data directory to the larger disk but left the namenode 
directory on the smaller disk figuring it didn't need much room.  Moving that 
to the larger disk seems to have improved the situation...although I'm still 
surprised the NN needed so much room.

Problem is solved for now.


Thanks.
________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
   --  Abe (Grandpa) Simpson




Re: could only be replicated to 0 nodes, instead of 1

2012-09-04 Thread Keith Wiley
Good to know.  The bottom line is I was really short-roping everything on 
resources.  I just need to jack the machine up some.

Thanks.

On Sep 4, 2012, at 19:41 , Harsh J wrote:

> Keith,
> 
> The NameNode has a resource-checker thread in it by design to help
> prevent cases of on-disk metadata corruption in event of filled up
> dfs.namenode.name.dir disks, etc.. By default, an NN will lock itself
> up if the free disk space (among its configured metadata mounts)
> reaches a value < 100 MB, controlled by
> dfs.namenode.resource.du.reserved. You can probably set that to 0 if
> you do not want such an automatic preventive measure. Its not exactly
> a need, just a check to help avoid accidental data loss due to
> non-monitoring of disk space.
> 
> On Tue, Sep 4, 2012 at 11:33 PM, Keith Wiley  wrote:
>> I had moved the data directory to the larger disk but left the namenode 
>> directory on the smaller disk figuring it didn't need much room.  Moving 
>> that to the larger disk seems to have improved the situation...although I'm 
>> still surprised the NN needed so much room.
>> 
>> Problem is solved for now.
>> 
>> 
>> Thanks.
>> 
>> Keith Wiley kwi...@keithwiley.com keithwiley.com
>> music.keithwiley.com
>> 
>> "I used to be with it, but then they changed what it was.  Now, what I'm with
>> isn't it, and what's it seems weird and scary to me."
>>   --  Abe (Grandpa) Simpson
>> ________
>> 
> 
> 
> 
> -- 
> Harsh J



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
   --  Keith Wiley




Can't get away from "replicated to 0 nodes, instead of 1"

2012-09-06 Thread Keith Wiley
ache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

at org.apache.hadoop.ipc.Client.call(Client.java:1107)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy0.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3553)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3421)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2100(DFSClient.java:2627)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2822)

________
Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
   --  Abe (Grandpa) Simpson




Re: Can't get out of safemode

2012-09-09 Thread Keith Wiley
Yep, moving the NN to the secondary larger disk seems to have improved it.  
Thanks again for your great help Harsh (and others).

Cheers!

On Sep 7, 2012, at 08:31 , Harsh J wrote:

> The issue was that Keith's NN was out of resources (low disk space)
> and that forced the NN into safemode. This was detected on another
> thread from him.
> 
> On Fri, Sep 7, 2012 at 1:12 PM, Dino Kečo  wrote:
>> Hi Keith,
>> 
>> I was having same problem on my small cluster and some blocks where missing
>> on DFS. I have used this command
>> 
>> hadoop dfsadmin -safemode leave
>> 
>> and after that i have run dfs disk check to see which files are corrupted.
>> 
>> Hope this helps,
>> Dino Kečo
>> msn: xdi...@hotmail.com
>> mail: dino.k...@gmail.com
>> skype: dino.keco
>> phone: +387 61 507 851
>> 
>> 
>> On Fri, Sep 7, 2012 at 12:13 AM, Adam Brown  wrote:
>>> 
>>> sorry
>>> 
>>> @Keith
>>> 
>>> are you sure your datanodes have reported in?
>>> 
>>> On Thu, Sep 6, 2012 at 3:12 PM, Adam Brown  wrote:
>>>> Hi Serge,
>>>> 
>>>> are you sure your datanodes have reported in ?
>>>> 
>>>> 
>>>> 
>>>> On Tue, Sep 4, 2012 at 10:10 AM, Serge Blazhiyevskyy
>>>>  wrote:
>>>>> Can look in name node logs and post last few lines?
>>>>> 
>>>>> 
>>>>> 
>>>>> On 9/4/12 10:07 AM, "Keith Wiley"  wrote:
>>>>> 
>>>>>> Observe:
>>>>>> 
>>>>>> ~/ $ hd fs -put test /test
>>>>>> put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot
>>>>>> create file/test. Name node is in safe mode.
>>>>>> ~/ $ hadoop dfsadmin -safemode leave
>>>>>> Safe mode is OFF
>>>>>> ~/ $ hadoop dfsadmin -safemode get
>>>>>> Safe mode is ON
>>>>>> ~/ $ hadoop dfsadmin -safemode leave
>>>>>> Safe mode is OFF
>>>>>> ~/ $ hadoop dfsadmin -safemode get
>>>>>> Safe mode is ON
>>>>>> ~/ $ hd fs -put test /test
>>>>>> put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot
>>>>>> create file/test. Name node is in safe mode.
>>>>>> ~/ $
>>>>>> 
>>>>>> Grrr!



Keith Wiley kwi...@keithwiley.com keithwiley.com    music.keithwiley.com

"Yet mark his perfect self-contentment, and hence learn his lesson, that to be
self-contented is to be vile and ignorant, and that to aspire is better than to
be blindly and impotently happy."
   --  Edwin A. Abbott, Flatland