help!! Where can I get a benchmark for text analysis using Hadoop?

2010-12-13 Thread Jander g
Hi, all

I have recently been researching text analysis using Hadoop, but I don't have a
benchmark, so I can't evaluate the results of my application.

Any help will be appreciated.

-- 
Thanks,
Jander


Re: exceptions copying files into HDFS

2010-12-13 Thread Sanford Rockowitz
Since I posted the problem about getting HDFS to work in pseudo-distributed 
mode, I should post the solution as well.  Apparently, the Java 
environment (JAVA_HOME, etc.) was not set up properly for the daemons, 
which in hindsight explains the exceptions in the Java NIO socket code.  
I moved the definitions of JAVA_HOME, HADOOP_INSTALL, and PATH from 
.profile to .bashrc to ensure they get set for every shell, and the 
problem was resolved.
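
For illustration, the relevant .bashrc entries would look roughly like the 
following; the JAVA_HOME path is an assumption, not the actual value from 
this installation:

export JAVA_HOME=/usr/java/jdk1.6.0_23                  # assumed JDK location
export HADOOP_INSTALL=$HOME/programs/hadoop-0.20.2+737  # matches the console prompt below
export PATH=$PATH:$HADOOP_INSTALL/bin:$JAVA_HOME/bin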


Sanford


On 12/11/2010 10:41 PM, Sanford Rockowitz wrote:

Folks,

I'm a Hadoop newbie, and I hope this is an appropriate place to post 
this question.


I'm trying to work through the initial examples.  When I try to copy 
files into HDFS, hadoop throws exceptions.   I imagine it's something 
in my configuration, but I'm at a loss to figure out what.


I'm running on openSuSE 11.3, using Oracle Java 1.6.0_23.  The problem 
occurs whether I use 32 bit or 64 bit Java.   The problem occurs in 
both vanilla Apache hadoop 0.20.2 and Cloudera's 0.20.2+737.


Following are the console output, the datanode log file, and the 
relevant configuration files.


Thanks in advance for any pointers.

Sanford

=== CONSOLE ===

r...@ritter:~/programs/hadoop-0.20.2+737> hadoop fs -put conf input
10/12/11 21:04:41 INFO hdfs.DFSClient: Exception in 
createBlockOutputStream java.io.EOFException
10/12/11 21:04:41 INFO hdfs.DFSClient: Abandoning block 
blk_1699203955671139323_1010

10/12/11 21:04:41 INFO hdfs.DFSClient: Excluding datanode 127.0.0.1:50010
10/12/11 21:04:41 WARN hdfs.DFSClient: DataStreamer Exception: 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/user/rock/input/fair-scheduler.xml could only be replicated to 0 
nodes, instead of 1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1415)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:588)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)

at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)

at org.apache.hadoop.ipc.Client.call(Client.java:1054)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

at $Proxy0.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3166)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3036)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2288)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2483)


10/12/11 21:04:41 WARN hdfs.DFSClient: Error Recovery for block 
blk_1699203955671139323_1010 bad datanode[0] nodes == null
10/12/11 21:04:41 WARN hdfs.DFSClient: Could not get block locations. 
Source file "/user/rock/input/fair-scheduler.xml" - Aborting...
put: java.io.IOException: File /user/rock/input/fair-scheduler.xml 
could only be replicated to 0 nodes, instead of 1
10/12/11 21:04:41 ERROR hdfs.DFSClient: Exception closing file 
/user/rock/input/fair-scheduler.xml : 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/user/rock/input/fair-scheduler.xml could only be replicated to 0 
nodes, instead of 1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1415)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:588)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.M

Re: Task fails: starts over with first input key?

2010-12-13 Thread Eric Sammer
What you are seeing is correct and the intended behavior. The unit of work
in a MR job is the task. If something causes the task to fail, it starts
again from the beginning. Any output from the failed task attempt is thrown
away; the reducers will not see the output of failed map attempts at all.
There is no way (within Hadoop proper) to make a task stateful, nor should
you try: you would lose a lot of flexibility with respect to features like
speculative execution and the ability to deal with machine failures (unless
you maintained task state in HDFS or another external system). It's just
not worth it.
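
For illustration only (a sketch, not from this thread, assuming the new 0.20 
mapreduce API): a task cannot recover the discarded output of earlier 
attempts, but it can at least detect that it is a re-run:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RestartAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void setup(Context context) {
    // Attempt ids count up from 0; anything above 0 means the task was
    // rescheduled and is starting over from its first input record.
    int attempt = context.getTaskAttemptID().getId();
    if (attempt > 0) {
      System.err.println("Task " + context.getTaskAttemptID().getTaskID()
          + " re-running as attempt " + attempt
          + "; output from the failed attempt(s) was discarded.");
    }
  }
}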

On Mon, Dec 13, 2010 at 7:51 PM, Keith Wiley  wrote:

> I think I am seeing a behavior in which if a mapper task fails (crashes) on
> one input key/value, the entire task is rescheduled and rerun, starting over
> again from the first input key/value even if all of the inputs preceding the
> troublesome input were processed successfully.
>
> Am I correct about this or am I seeing something that isn't there?
>
> If I am correct, what happens to the outputs of the successful duplicate
> map() calls?  Which output key/value is the one that is sent to shuffle (and
> a reducer): Is it the result of the first attempt on the input in question
> or the result of the last attempt?
>
> Is there any way to prevent it from recalculating those duplicate inputs
> other than something manual on the side like keeping a job-log of the map
> attempts and scanning the log at the beginning of each map() call?
>
> Thanks.
>
>
> 
> Keith Wiley   kwi...@keithwiley.com
> www.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>  -- Abe (Grandpa) Simpson
>
> 
>
>
>
>


-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com


Re: how to run jobs every 30 minutes?

2010-12-13 Thread Ted Dunning
Or even simpler, try Azkaban: http://sna-projects.com/azkaban/

On Mon, Dec 13, 2010 at 9:26 PM, edward choi  wrote:

> Thanks for the tip. I took a look at it.
> Looks similar to Cascading I guess...?
> Anyway thanks for the info!!
>
> Ed
>
> 2010/12/8 Alejandro Abdelnur 
>
> > Or, if you want to do it in a reliable way you could use an Oozie
> > coordinator job.
> >
> > On Wed, Dec 8, 2010 at 1:53 PM, edward choi  wrote:
> > > My mistake. Come to think about it, you are right, I can just make an
> > > infinite loop inside the Hadoop application.
> > > Thanks for the reply.
> > >
> > > 2010/12/7 Harsh J 
> > >
> > >> Hi,
> > >>
> > >> On Tue, Dec 7, 2010 at 2:25 PM, edward choi  wrote:
> > >> > Hi,
> > >> >
> > >> > I'm planning to crawl a certain web site every 30 minutes.
> > >> > How would I get it done in Hadoop?
> > >> >
> > >> > In pure Java, I used Thread.sleep() method, but I guess this won't
> > work
> > >> in
> > >> > Hadoop.
> > >>
> > >> Why wouldn't it? You need to manage your post-job logic mostly, but
> > >> sleep and resubmission should work just fine.
> > >>
> > >> > Or if it could work, could anyone show me an example?
> > >> >
> > >> > Ed.
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Harsh J
> > >> www.harshj.com
> > >>
> > >
> >
>
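
For reference, a minimal sketch of the sleep-and-resubmit loop Harsh 
describes in the quoted thread; buildCrawlJob() is a hypothetical stand-in 
for the real job setup, and Oozie or Azkaban remain the more robust options 
suggested above:

import org.apache.hadoop.mapreduce.Job;

public class PeriodicCrawl {
  public static void main(String[] args) throws Exception {
    while (true) {
      Job job = buildCrawlJob();          // hypothetical: set paths, mapper, reducer, etc.
      job.waitForCompletion(true);        // submit and block until this run finishes
      Thread.sleep(30L * 60L * 1000L);    // wait 30 minutes before the next crawl
    }
  }

  private static Job buildCrawlJob() throws Exception {
    Job job = new Job();                  // placeholder configuration only
    return job;
  }
}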


Re: Question about AvatarNode

2010-12-13 Thread Ted Yu
Check out the code on GitHub; you can find
contrib/highavailability/src/java/org/apache/hadoop/hdfs/AvatarZooKeeperClient.java there.

On Sun, Dec 12, 2010 at 11:54 PM, ChingShen  wrote:

> Hi all,
>
>   I read the "Looking at the code behind our three uses of Apache Hadoop"
> article (http://www.facebook.com/note.php?note_id=468211193919&comments),
> which mentions that the primary AvatarNode and the standby AvatarNode are
> coordinated via Apache ZooKeeper. Why can't I find any code about ZooKeeper
> in HDFS-976?
>
> Thanks.
>
> Shen
>


Re: how to run jobs every 30 minutes?

2010-12-13 Thread edward choi
Thanks for the tip. I took a look at it.
Looks similar to Cascading I guess...?
Anyway thanks for the info!!

Ed

2010/12/8 Alejandro Abdelnur 

> Or, if you want to do it in a reliable way you could use an Oozie
> coordinator job.
>
> On Wed, Dec 8, 2010 at 1:53 PM, edward choi  wrote:
> > My mistake. Come to think about it, you are right, I can just make an
> > infinite loop inside the Hadoop application.
> > Thanks for the reply.
> >
> > 2010/12/7 Harsh J 
> >
> >> Hi,
> >>
> >> On Tue, Dec 7, 2010 at 2:25 PM, edward choi  wrote:
> >> > Hi,
> >> >
> >> > I'm planning to crawl a certain web site every 30 minutes.
> >> > How would I get it done in Hadoop?
> >> >
> >> > In pure Java, I used Thread.sleep() method, but I guess this won't
> work
> >> in
> >> > Hadoop.
> >>
> >> Why wouldn't it? You need to manage your post-job logic mostly, but
> >> sleep and resubmission should work just fine.
> >>
> >> > Or if it could work, could anyone show me an example?
> >> >
> >> > Ed.
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >> www.harshj.com
> >>
> >
>


Re: Hadoop 0.20.2 with eclipse in windows

2010-12-13 Thread Matthew John
I tried installing using this link, but when, as in the tutorial, I try to run
bin/hadoop namenode -format
it gives the following error:

bin/hadoop: line 2 : $'\r' : command not found
and many similar statements.

I've given the local JDK folder as JAVA_HOME.
Not sure why this is showing up. I've not used Cygwin until now.

Matthew

On Tue, Dec 14, 2010 at 9:38 AM, Harsh J  wrote:

> Hi,
>
> On Tue, Dec 14, 2010 at 9:22 AM, Matthew John
>  wrote:
> > Hi all,
> >
> >  I have been working with Hadoop0.20.2 in linux nodes. Now I want to try
> the
> > same version with eclipse on a windows xp machine. Could someone provide
> a
> > tutorial/guidelines on how to install this setup.
>
> This page's instruction still works for running a Hadoop cluster on
> Windows + the Plugin w/ Cygwin:
> http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html
>
> >
> > thanks,
> > Matthew
> >
>
>
>
> --
> Harsh J
> www.harshj.com
>


Re: Task fails: starts over with first input key?

2010-12-13 Thread 蔡超
I have met this problem. I think the behavior (whether it starts from the very
beginning, and whether it overrides duplicate keys) depends on the input format
and output format. When I use DBInputFormat and DBOutputFormat, the failed task
restarts rather than starting over from the very beginning.

Hope this helps. I want to understand the mechanism more clearly, too.


Cai Chao

On Tue, Dec 14, 2010 at 8:51 AM, Keith Wiley  wrote:

> I think I am seeing a behavior in which if a mapper task fails (crashes) on
> one input key/value, the entire task is rescheduled and rerun, starting over
> again from the first input key/value even if all of the inputs preceding the
> troublesome input were processed successfully.
>
> Am I correct about this or am I seeing something that isn't there?
>
> If I am correct, what happens to the outputs of the successful duplicate
> map() calls?  Which output key/value is the one that is sent to shuffle (and
> a reducer): Is it the result of the first attempt on the input in question
> or the result of the last attempt?
>
> Is there any way to prevent it from recalculating those duplicate inputs
> other than something manual on the side like keeping a job-log of the map
> attempts and scanning the log at the beginning of each map() call?
>
> Thanks.
>
>
> 
> Keith Wiley   kwi...@keithwiley.com
> www.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>  -- Abe (Grandpa) Simpson
>
> 
>
>
>
>


Re: Hadoop 0.20.2 with eclipse in windows

2010-12-13 Thread Harsh J
Hi,

On Tue, Dec 14, 2010 at 9:22 AM, Matthew John
 wrote:
> Hi all,
>
>  I have been working with Hadoop0.20.2 in linux nodes. Now I want to try the
> same version with eclipse on a windows xp machine. Could someone provide a
> tutorial/guidelines on how to install this setup.

This page's instructions still work for running a Hadoop cluster on
Windows + the Eclipse plugin with Cygwin:
http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html

>
> thanks,
> Matthew
>



-- 
Harsh J
www.harshj.com


Hadoop 0.20.2 with eclipse in windows

2010-12-13 Thread Matthew John
Hi all,

 I have been working with Hadoop 0.20.2 on Linux nodes. Now I want to try the
same version with Eclipse on a Windows XP machine. Could someone provide a
tutorial or guidelines on how to install this setup?

thanks,
Matthew


Re: Task fails: starts over with first input key?

2010-12-13 Thread li ping
I think *org.apache.hadoop.mapred.SkipBadRecords* is what you are looking
for.
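
A rough sketch of how that class is typically wired up with the old mapred 
API (the thresholds below are illustrative assumptions):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipModeSetup {
  public static void enableSkipping(JobConf conf) {
    // Enter skip mode after this many failed attempts of the same task.
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
    // Narrow the skipped range down to individual bad records/groups.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
    SkipBadRecords.setReducerMaxSkipGroups(conf, 1);
  }
}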



On Tue, Dec 14, 2010 at 8:51 AM, Keith Wiley  wrote:

> I think I am seeing a behavior in which if a mapper task fails (crashes) on
> one input key/value, the entire task is rescheduled and rerun, starting over
> again from the first input key/value even if all of the inputs preceding the
> troublesome input were processed successfully.
>
> Am I correct about this or am I seeing something that isn't there?
>
> If I am correct, what happens to the outputs of the successful duplicate
> map() calls?  Which output key/value is the one that is sent to shuffle (and
> a reducer): Is it the result of the first attempt on the input in question
> or the result of the last attempt?
>
> Is there any way to prevent it from recalculating those duplicate inputs
> other than something manual on the side like keeping a job-log of the map
> attempts and scanning the log at the beginning of each map() call?
>
> Thanks.
>
>
> 
> Keith Wiley   kwi...@keithwiley.com
> www.keithwiley.com
>
> "I used to be with it, but then they changed what it was.  Now, what I'm
> with
> isn't it, and what's it seems weird and scary to me."
>  -- Abe (Grandpa) Simpson
>
> 
>
>
>
>


-- 
-李平


Task fails: starts over with first input key?

2010-12-13 Thread Keith Wiley
I think I am seeing a behavior in which if a mapper task fails (crashes) on one 
input key/value, the entire task is rescheduled and rerun, starting over again 
from the first input key/value even if all of the inputs preceding the 
troublesome input were processed successfully.

Am I correct about this or am I seeing something that isn't there?

If I am correct, what happens to the outputs of the successful duplicate map() 
calls?  Which output key/value is the one that is sent to shuffle (and a 
reducer): Is it the result of the first attempt on the input in question or the 
result of the last attempt?

Is there any way to prevent it from recalculating those duplicate inputs other 
than something manual on the side like keeping a job-log of the map attempts 
and scanning the log at the beginning of each map() call?

Thanks.


Keith Wiley   kwi...@keithwiley.com   www.keithwiley.com

"I used to be with it, but then they changed what it was.  Now, what I'm with
isn't it, and what's it seems weird and scary to me."
  -- Abe (Grandpa) Simpson






Re: files that don't really exist?

2010-12-13 Thread Seth Lepzelter
Alright, a little further investigation along that line (thanks for the 
hint, can't believe I didn't think of that) shows that there's actually a 
carriage return character (%0D, aka \r) at the end of the filename.

Using the HDFS web UI, I browsed to his directory, and the UI wants to send 
me to /user/ken/testoutput5%0D

which also kind of matches the output of hadoop fs -lsr:

drwx--   - ken users0 2010-08-26 19:48 /user/ken/testoutput5
/_logs   - ken users  0 2010-08-26 19:48 /user/ken/testoutput5
/_logs/history ken users  0 2010-08-26 19:48 /user/ken/testoutput5


hadoop fs -ls /user/ken/test*

results in:
ls: Cannot access /user/ken/test*: No such file or directory.


I guess *, in Hadoop's parlance, doesn't include a \r.

I got a \r into the command line, -rmr'ed that, and it's now fixed.
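
For the record, one way to type that in bash (an assumption about the exact 
invocation; $'\r' is bash's ANSI-C quoting for a carriage return):

hadoop fs -rmr "/user/ken/testoutput5"$'\r'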


Thanks!
-Seth

On Mon, Dec 13, 2010 at 08:57:35PM +, Allen Wittenauer wrote:
> 
> On Dec 13, 2010, at 8:51 AM, Seth Lepzelter wrote:
> 
> > I've got a smallish cluster of 12 nodes up from 6, that we're using to dip 
> > our feet into hadoop.  One of my users has a few directories in his HDFS 
> > home which he was using to test, and which exist, according to 
> > 
> > hadoop fs -ls 
> > 
> > ie:
> > 
> > ...
> > /user/ken/testoutput4
> > /user/ken/testoutput5
> > ...
> > 
> > but if you do:
> > 
> > hadoop fs -ls /user/ken/testoutput5
> > 
> > you get:
> > 
> > ls: Cannot access /user/ken/testoutput5: No such file or directory.
> 
> 
> There is likely one or more spaces after the testoutput5 .  Try using hadoop 
> fs -ls /user/ken/*/* .
> 


Re: Multicore Nodes

2010-12-13 Thread Allen Wittenauer

On Dec 11, 2010, at 3:09 AM, Rob Stewart wrote:
> Or - is there support in Hadoop for multi-core nodes? 

Be aware that writing a job that specifically uses multi-threaded 
tasks usually means that a) you probably aren't really doing map/reduce anymore 
and b) the job will likely tickle bugs in the system, as some APIs are more 
thread-safe than others.  (See HDFS-1526, for example.)
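
The usual way to use the extra cores is simply to run more single-threaded 
task slots per node. A sketch of the relevant mapred-site.xml settings (the 
values are illustrative assumptions, to be tuned to cores and memory):

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>   <!-- map slots per TaskTracker -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>   <!-- reduce slots per TaskTracker -->
</property>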

Re: files that don't really exist?

2010-12-13 Thread Allen Wittenauer

On Dec 13, 2010, at 8:51 AM, Seth Lepzelter wrote:

> I've got a smallish cluster of 12 nodes up from 6, that we're using to dip 
> our feet into hadoop.  One of my users has a few directories in his HDFS 
> home which he was using to test, and which exist, according to 
> 
> hadoop fs -ls 
> 
> ie:
> 
> ...
> /user/ken/testoutput4
> /user/ken/testoutput5
> ...
> 
> but if you do:
> 
> hadoop fs -ls /user/ken/testoutput5
> 
> you get:
> 
> ls: Cannot access /user/ken/testoutput5: No such file or directory.


There is likely one or more spaces after the testoutput5 .  Try using hadoop fs 
-ls /user/ken/*/* .



How do I log from my map/reduce application?

2010-12-13 Thread W.P. McNeill
I would like to use Hadoop's Log4j infrastructure to do logging from my
map/reduce application.  I think I've got everything set up correctly, but I
am still unable to specify the logging level I want.

By default Hadoop is set up to log at level INFO.  The first line of its
log4j.properties file looks like this:

hadoop.root.logger=INFO,console


I have an application whose reducer looks like this:

package com.me;

public class MyReducer<...> extends Reducer<...> {
    private static Logger logger =
        Logger.getLogger(MyReducer.class.getName());

    ...
    protected void reduce(...) {
        logger.debug("My message");
        ...
    }
}


I've added the following line to the Hadoop log4j.properties file:

log4j.logger.com.me.MyReducer=DEBUG


I expect the Hadoop system to log at level INFO, but my application to log
at level DEBUG, so that I see "My message" in the logs for the reducer task.
However, my application does not produce any log4j output.  If I change the
line in my reducer to read logger.info("My message") the message does get
logged, so somehow I'm failing to specify that log level for this class.

I've also tried changing the log4j line for my app to
read log4j.logger.com.me.MyReducer=DEBUG,console and get the same result.

I've been through the Hadoop and log4j documentation and I can't figure out
what I'm doing wrong.  Any suggestions?

Thanks.


files that don't really exist?

2010-12-13 Thread Seth Lepzelter
I've got a smallish cluster of 12 nodes up from 6, that we're using to dip 
our feet into hadoop.  One of my users has a few directories in his HDFS 
home which he was using to test, and which exist, according to 

hadoop fs -ls 

ie:

...
/user/ken/testoutput4
/user/ken/testoutput5
...

but if you do:

hadoop fs -ls /user/ken/testoutput5

you get:

ls: Cannot access /user/ken/testoutput5: No such file or directory.


I can even hadoop fs -mkdir /user/ken/testoutput5, and then -rmr it, and it 
works fine, but then -ls still shows it.

same for -rmr.  There's nothing important in the directories, so I'd just 
remove them, but it won't let me.  I've tried fsck'ing, no luck there.  
Anyone have an idea how I might clean this up, and how it might have 
happened in the first place?

I'm pretty sure the directories were copied over from another cluster in 
their current state, if that helps shine a light.


Any help is much appreciated.

Thanks,
Seth


Re: How can I realize the “count(distinct)” function in Hive?

2010-12-13 Thread Harsh J
You don't really need to store all the incoming keys. If the input comes
sorted, you can rely on comparing each value with the previous one and
incrementing the count only when they differ. (If you do it on the reduce
side, the input comes sorted by key, so a non-distinct key simply has more
than one value; thus all you need to do is count the reduce calls, as the
grouping does the rest.) Just a suggestion to avoid possible memory issues.
Correct me if I am wrong, please.
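
A minimal sketch of that reduce-side counting as a plain MapReduce job (new 
0.20 API; it assumes each input line is the value being counted and that a 
single reducer, or a trivial second pass, produces the global total):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class DistinctCount {
  public static class EmitValueAsKey
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
        throws IOException, InterruptedException {
      context.write(line, NullWritable.get());     // one record per input value
    }
  }

  public static class CountGroups
      extends Reducer<Text, NullWritable, Text, LongWritable> {
    private long distinct = 0;

    @Override
    protected void reduce(Text value, Iterable<NullWritable> ignored, Context context) {
      distinct++;                                   // one reduce() call per distinct value
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      context.write(new Text("distinct"), new LongWritable(distinct));
    }
  }
}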

On Mon, Dec 13, 2010 at 5:36 PM, 1983 ddi  wrote:
> But I am confused about how to write the UDAF class; is there anybody who
> can give me a hand? Thanks a lot if there is an example.

About UDFs, read this developer article at Bizo that covers it well
enough: http://dev.bizo.com/2009/06/custom-udfs-and-hive.html

-- 
Harsh J
www.harshj.com


How can I realize the “count(distinct)” function in Hive?

2010-12-13 Thread 1983 ddi
Hi all:
 I am trying to develop a function like "count(distinct)" in Hive, so I am
trying to write a UDAF which uses a HashMap container to store all the keys.

In the end, I expect to get the number of distinct keys by calling map.size()
in my UDAF class, which will give me the same result as "count(distinct)".

But I am confused about how to write the UDAF class; is there anybody who
can give me a hand? Thanks a lot if there is an example.
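
A rough sketch of the kind of UDAF being described, using the classic 
UDAF/UDAFEvaluator bridge; treat it as an illustration rather than a tested 
implementation, and note that the partial-result type may need adjusting to 
what your Hive version can serialize:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

public class UDAFCountDistinct extends UDAF {

  public static class Evaluator implements UDAFEvaluator {
    private Set<String> seen;

    public Evaluator() {
      init();
    }

    // Reset the aggregation state.
    public void init() {
      seen = new HashSet<String>();
    }

    // Called once per row; remember the key.
    public boolean iterate(String value) {
      if (value != null) {
        seen.add(value);
      }
      return true;
    }

    // Partial result handed from the map side to the reduce side.
    public ArrayList<String> terminatePartial() {
      return new ArrayList<String>(seen);
    }

    // Fold a partial result from another task into this one.
    public boolean merge(ArrayList<String> other) {
      if (other != null) {
        seen.addAll(other);
      }
      return true;
    }

    // Final answer: the number of distinct keys, i.e. count(distinct).
    public int terminate() {
      return seen.size();
    }
  }
}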


Re: exceptions copying files into HDFS

2010-12-13 Thread Adarsh Sharma

Sanford Rockowitz wrote:

Folks,

I'm a Hadoop newbie, and I hope this is an appropriate place to post 
this question.


I'm trying to work through the initial examples.  When I try to copy 
files into HDFS, hadoop throws exceptions.   I imagine it's something 
in my configuration, but I'm at a loss to figure out what.


I'm running on openSuSE 11.3, using Oracle Java 1.6.0_23.  The problem 
occurs whether I use 32 bit or 64 bit Java.   The problem occurs in 
both vanilla Apache hadoop 0.20.2 and Cloudera's 0.20.2+737.


Following are the console output, the datanode log file, and the 
relevant configuration files.


Thanks in advance for any pointers.

Sanford

=== CONSOLE ===

r...@ritter:~/programs/hadoop-0.20.2+737> hadoop fs -put conf input
10/12/11 21:04:41 INFO hdfs.DFSClient: Exception in 
createBlockOutputStream java.io.EOFException
10/12/11 21:04:41 INFO hdfs.DFSClient: Abandoning block 
blk_1699203955671139323_1010

10/12/11 21:04:41 INFO hdfs.DFSClient: Excluding datanode 127.0.0.1:50010
10/12/11 21:04:41 WARN hdfs.DFSClient: DataStreamer Exception: 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/user/rock/input/fair-scheduler.xml could only be replicated to 0 
nodes, instead of 1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1415) 

at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:588) 


at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063) 


at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1313)

at org.apache.hadoop.ipc.Client.call(Client.java:1054)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82) 

at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59) 


at $Proxy0.addBlock(Unknown Source)
at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3166) 

at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3036) 

at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2288) 

at 
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2483) 



10/12/11 21:04:41 WARN hdfs.DFSClient: Error Recovery for block 
blk_1699203955671139323_1010 bad datanode[0] nodes == null
10/12/11 21:04:41 WARN hdfs.DFSClient: Could not get block locations. 
Source file "/user/rock/input/fair-scheduler.xml" - Aborting...
put: java.io.IOException: File /user/rock/input/fair-scheduler.xml 
could only be replicated to 0 nodes, instead of 1
10/12/11 21:04:41 ERROR hdfs.DFSClient: Exception closing file 
/user/rock/input/fair-scheduler.xml : 
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File 
/user/rock/input/fair-scheduler.xml could only be replicated to 0 
nodes, instead of 1
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1415) 

at 
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:588) 


at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:528)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1319)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1315)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGro