Re: Socket closed Exception

2009-04-01 Thread lohit

Thanks Koji, Raghu.
This seemed to solve our problem; we haven't seen this happen in the past 2 days.
What is the typical value of ipc.client.idlethreshold on big clusters?
Does the default value of 4000 suffice?

Lohit



- Original Message 
From: Koji Noguchi 
To: core-user@hadoop.apache.org
Sent: Monday, March 30, 2009 9:30:04 AM
Subject: RE: Socket closed Exception

Lohit, 

You're right. We saw " java.net.SocketTimeoutException: timed out
waiting for rpc response" and not Socket closed exception.

If you're getting "closed exception", then I don't remember seeing that
problem on our clusters.

Our users often report a "Socket closed exception" as a problem, but in
most cases those failures are due to jobs failing for completely
different reasons and a race condition between 1) the JobTracker removing
the directory/killing tasks and 2) tasks failing with a closed exception
before they get killed.

Koji



-Original Message-
From: lohit [mailto:lohit...@yahoo.com] 
Sent: Monday, March 30, 2009 8:51 AM
To: core-user@hadoop.apache.org
Subject: Re: Socket closed Exception


Thanks Koji. 
If I look at the code, the NameNode (RPC Server) seems to tear down idle
connections. Did you see a 'Socket closed' exception instead of 'timed out
waiting for socket'?
We seem to hit the 'Socket closed' exception where clients do not
time out but get back a socket closed exception when they make an RPC for
create/open/getFileInfo.

I will give this a try. Thanks again,
Lohit



- Original Message 
From: Koji Noguchi 
To: core-user@hadoop.apache.org
Sent: Sunday, March 29, 2009 11:44:29 PM
Subject: RE: Socket closed Exception

Hi Lohit,

My initial guess would be
https://issues.apache.org/jira/browse/HADOOP-4040

When this happened on our 0.17 cluster, all of our (task) clients were
using the max idle time of 1 hour due to this bug instead of the
configured value of a few seconds.
Thus each client kept the connection up much longer than we expected.
(Not sure if this applies to your 0.15 cluster, but it sounds similar to
what we observed.)

This worked until the namenode started hitting the max limit of
'ipc.client.idlethreshold'.

<property>
  <name>ipc.client.idlethreshold</name>
  <value>4000</value>
  <description>Defines the threshold number of connections after which
  connections will be inspected for idleness.
  </description>
</property>

When inspecting for idleness, namenode uses

<property>
  <name>ipc.client.maxidletime</name>
  <value>12</value>
  <description>Defines the maximum idle time for a connected client
  after which it may be disconnected.
  </description>
</property>

As a result, many connections got disconnected at once.
Clients only see the timeouts when they try to re-use those sockets the
next time and wait for 1 minute.  That's why the failures are not exactly
at the same time, but *almost* the same time.
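
For illustration, a toy sketch of the sweep behaviour described above -- this is
not the actual org.apache.hadoop.ipc.Server code, and the connection counts and
idle times below are made up:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy model only: "lastContact" stands in for the server's per-connection
// bookkeeping; the real server closes sockets where this removes list entries.
public class IdleSweepSketch {
  public static void main(String[] args) {
    int idleThreshold = 4000;       // ipc.client.idlethreshold
    long maxIdleMs = 10000L;        // stand-in value for ipc.client.maxidletime
    List<Long> lastContact = new ArrayList<Long>();
    long now = System.currentTimeMillis();
    // 5000 open connections, half of them idle for about an hour.
    for (int i = 0; i < 5000; i++) {
      lastContact.add(now - (i % 2 == 0 ? 0L : 3600000L));
    }
    // Nothing is inspected until the open-connection count crosses the
    // threshold; then every connection idle longer than maxIdleTime is dropped
    // in one sweep, which is why many clients fail at almost the same time.
    if (lastContact.size() > idleThreshold) {
      for (Iterator<Long> it = lastContact.iterator(); it.hasNext();) {
        if (now - it.next() > maxIdleMs) {
          it.remove();
        }
      }
    }
    System.out.println("connections still open: " + lastContact.size());
  }
}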


# If this solves your problem, Raghu should get the credit. 
  He spent so many hours to solve this mystery for us. :)


Koji


-Original Message-
From: lohit [mailto:lohit...@yahoo.com] 
Sent: Sunday, March 29, 2009 11:56 AM
To: core-user@hadoop.apache.org
Subject: Socket closed Exception


Recently we are seeing a lot of Socket closed exceptions in our cluster.
Many tasks' open/create/getFileInfo calls get back a 'SocketException'
with the message 'Socket closed'. We see many tasks fail with the same
error around the same time. There are no warning or info messages in the
NameNode/TaskTracker/Task logs. (This is on HDFS 0.15.) Are there cases
where the NameNode closes sockets due to heavy load or contention for
resources of any kind?

Thanks,
Lohit



Re: Socket closed Exception

2009-03-30 Thread lohit

Thanks Raghu, is that log message at DEBUG level? I do not see any socket close
exception in the NameNode logs at WARN/INFO level.
Lohit



- Original Message 
From: Raghu Angadi 
To: core-user@hadoop.apache.org
Sent: Monday, March 30, 2009 12:08:19 PM
Subject: Re: Socket closed Exception


If it is the NameNode closing the socket, then there is probably a log message
about it around that time.

Raghu.

lohit wrote:
> Recently we are seeing a lot of Socket closed exceptions in our cluster. Many
> tasks' open/create/getFileInfo calls get back a 'SocketException' with the
> message 'Socket closed'. We see many tasks fail with the same error around the
> same time. There are no warning or info messages in the NameNode/TaskTracker/Task
> logs. (This is on HDFS 0.15.) Are there cases where the NameNode closes sockets
> due to heavy load or contention for resources of any kind?
> 
> Thanks,
> Lohit
> 


Re: Socket closed Exception

2009-03-30 Thread lohit

Thanks Koji. 
If I look at the code, the NameNode (RPC Server) seems to tear down idle
connections. Did you see a 'Socket closed' exception instead of 'timed out
waiting for socket'?
We seem to hit the 'Socket closed' exception where clients do not time out but
get back a socket closed exception when they make an RPC for create/open/getFileInfo.

I will give this a try. Thanks again,
Lohit



- Original Message 
From: Koji Noguchi 
To: core-user@hadoop.apache.org
Sent: Sunday, March 29, 2009 11:44:29 PM
Subject: RE: Socket closed Exception

Hi Lohit,

My initial guess would be
https://issues.apache.org/jira/browse/HADOOP-4040

When this happened on our 0.17 cluster, all of our (task) clients were
using the max idle time of 1 hour due to this bug instead of the
configured value of a few seconds.
Thus each client kept the connection up much longer than we expected.
(Not sure if this applies to your 0.15 cluster, but it sounds similar to
what we observed.)

This worked until the namenode started hitting the max limit of
'ipc.client.idlethreshold'.

<property>
  <name>ipc.client.idlethreshold</name>
  <value>4000</value>
  <description>Defines the threshold number of connections after which
  connections will be inspected for idleness.
  </description>
</property>

When inspecting for idleness, namenode uses

<property>
  <name>ipc.client.maxidletime</name>
  <value>12</value>
  <description>Defines the maximum idle time for a connected client
  after which it may be disconnected.
  </description>
</property>

As a result, many connections got disconnected at once.
Clients only see the timeouts when they try to re-use those sockets the
next time and wait for 1 minute.  That's why the failures are not exactly at
the same time, but *almost* the same time.


# If this solves your problem, Raghu should get the credit. 
  He spent so many hours to solve this mystery for us. :)


Koji


-Original Message-
From: lohit [mailto:lohit...@yahoo.com] 
Sent: Sunday, March 29, 2009 11:56 AM
To: core-user@hadoop.apache.org
Subject: Socket closed Exception


Recently we are seeing a lot of Socket closed exceptions in our cluster.
Many tasks' open/create/getFileInfo calls get back a 'SocketException'
with the message 'Socket closed'. We see many tasks fail with the same
error around the same time. There are no warning or info messages in the
NameNode/TaskTracker/Task logs. (This is on HDFS 0.15.) Are there cases
where the NameNode closes sockets due to heavy load or contention for
resources of any kind?

Thanks,
Lohit


Socket closed Exception

2009-03-29 Thread lohit

Recently we are seeing a lot of Socket closed exceptions in our cluster. Many
tasks' open/create/getFileInfo calls get back a 'SocketException' with the
message 'Socket closed'. We see many tasks fail with the same error around the
same time. There are no warning or info messages in the NameNode/TaskTracker/Task
logs. (This is on HDFS 0.15.) Are there cases where the NameNode closes sockets
due to heavy load or contention for resources of any kind?

Thanks,
Lohit



Re: How to read output files over HDFS

2009-03-11 Thread lohit

http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample

Lohit
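
For convenience, a minimal sketch along the lines of that wiki example; it
assumes fs.default.name in the configuration on your classpath points at your
HDFS instance, and the path argument below is just a placeholder:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadHdfsFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up hadoop-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);      // HDFS if fs.default.name points at it
    Path p = new Path(args[0]);                // e.g. /user/foo/output0/part-00000 (placeholder)
    BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(p)));
    String line;
    while ((line = in.readLine()) != null) {
      System.out.println(line);                // process each line here
    }
    in.close();
  }
}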



- Original Message 
From: Amandeep Khurana 
To: core-user@hadoop.apache.org
Sent: Wednesday, March 11, 2009 9:46:09 PM
Subject: Re: How to read output files over HDFS

2 ways that I can think of:
1. Write another MR job without a reducer. The mapper can be made to do
whatever logic you want to do.
OR
2. Take an instance of DistributedFileSystem class in your java code and use
it to read the file from HDFS.




Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Mar 11, 2009 at 9:23 PM, Muhammad Arshad  wrote:

> Hi,
>
> I am running multiple MapReduce jobs which generate their output in
> directories named output0, output1, output2, ...etc. Once these jobs
> complete I want to read the output stored in these files (line by line) from
> Java code automatically.
>
> Kindly tell me how I can do this.
>
> I do not want to use the 'hadoop dfs -get ... ...' command to first bring the
> output files to a local directory. I would be grateful if somebody could write
> me a snippet of code for doing this task.
>
> thanks,
> --umer
>
>
>
>



Re: What happens when you do a ctrl-c on a big dfs -rmr

2009-03-11 Thread lohit

When you issue -rmr on a directory, the namenode gets the directory name and
starts deleting files recursively. It adds the blocks belonging to those files
to an invalidate list. The NameNode then deletes those blocks lazily. So, yes,
it will issue commands to the datanodes to delete those blocks; just give it
some time. You do not need to reformat HDFS.
Lohit
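
For reference, a small sketch of the equivalent client-side call (not the
FsShell code itself); note that, unlike the shell with trash enabled,
FileSystem.delete() does not move anything to the trash. The path argument is a
placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecursiveDelete {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // A single recursive delete request to the NameNode; the blocks themselves
    // are invalidated and removed from the datanodes lazily afterwards.
    boolean deleted = fs.delete(new Path(args[0]), true);   // true = recursive
    System.out.println("delete returned " + deleted);
  }
}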



- Original Message 
From: bzheng 
To: core-user@hadoop.apache.org
Sent: Wednesday, March 11, 2009 7:48:41 PM
Subject: What happens when you do a ctrl-c on a big dfs -rmr


I did a ctrl-c immediately after issuing a hadoop dfs -rmr command.  The rmr
target is no longer visible from the dfs -ls command.  The number of files
deleted is huge and I don't think it can possibly delete them all between
the time the command is issued and ctrl-c.  Does this mean it leaves behind
unreachable files on the slave nodes and making them dead weights?  We can
always reformat hdfs to be sure.  But is there a way to check?  Thanks.
-- 
View this message in context: 
http://www.nabble.com/What-happens-when-you-do-a-ctrl-c-on-a-big-dfs--rmr-tp22468909p22468909.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: HDFS is corrupt, need to salvage the data.

2009-03-09 Thread lohit

How many datanodes do you have?
From the output it looks like at the point when you ran fsck, you had only one
datanode connected to your NameNode. Did you have others?
Also, I see that your default replication is set to 1. Can you check whether
your datanodes are up and running?
Lohit



- Original Message 
From: Mayuran Yogarajah 
To: core-user@hadoop.apache.org
Sent: Monday, March 9, 2009 5:20:37 PM
Subject: HDFS is corrupt, need to salvage the data.

Hello, it seems the HDFS in my cluster is corrupt.  This is the output from 
hadoop fsck:
Total size:    9196815693 B
Total dirs:    17
Total files:   157
Total blocks:  157 (avg. block size 58578443 B)

CORRUPT FILES:  157
MISSING BLOCKS: 157
MISSING SIZE:   9196815693 B

Minimally replicated blocks:   0 (0.0 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:1
Average block replication: 0.0
Missing replicas:  0
Number of data-nodes:  1
Number of racks:   1

It seems to say that there is 1 block missing from every file that was in the 
cluster..

I'm not sure how to proceed so any guidance would be much appreciated.  My 
primary
concern is recovering the data.

thanks



Re: using HDFS for a distributed storage system

2009-02-09 Thread lohit
> I am planning to add the individual files initially, and after a while (lets
> say 2 days after insertion) will make a SequenceFile out of each directory
> (I am currently looking into SequenceFile) and delete the previous files of
> that directory from HDFS. That way in future, I can access any file given
> its directory without much effort.

Have you considered Hadoop archive? 
http://hadoop.apache.org/core/docs/current/hadoop_archives.html
Depending on your access pattern, you could store the files in an archive in
the first place.



- Original Message 
From: Brian Bockelman 
To: core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 4:00:42 PM
Subject: Re: using HDFS for a distributed storage system

Hey Amit,

That plan sounds much better.  I think you will find the system much more 
scalable.

From our experience, it takes a while to get the right amount of monitoring and
infrastructure in place to have a very dependable system with 2 replicas.  I
would recommend using 3 replicas until you feel you've mastered the setup.

Brian

On Feb 9, 2009, at 4:27 PM, Amit Chandel wrote:

> Thanks Brian for your inputs.
> 
> I am eventually targeting to store 200k directories each containing  75
> files on avg, with average size of directory being 300MB (ranging from 50MB
> to 650MB) in this storage system.
> 
> It will mostly be an archival storage from where I should be able to access
> any of the old files easily. But the recent directories would be accessed
> frequently for a day or 2 as they are being added. They are added in batches
> of 500-1000 per week, and there can be rare bursts of adding 50k directories
> once in 3 months. One such burst is about to come in a month, and I want to
> test the current test setup against that burst. We have upgraded our test
> hardware a little bit from what I last mentioned. The test setup will have 3
> DataNodes with 15TB space on each, 6G RAM, dual core processor, and a
> NameNode 500G storage, 6G RAM, dual core processor.
> 
> I am planning to add the individual files initially, and after a while (lets
> say 2 days after insertion) will make a SequenceFile out of each directory
> (I am currently looking into SequenceFile) and delete the previous files of
> that directory from HDFS. That way in future, I can access any file given
> its directory without much effort.
> Now that SequenceFile is in picture, I can make default block size to 64MB
> or even 128MB. For replication, I am just replicating a file at 1 extra
> location (i.e. replication factor = 2, since a replication factor 3 will
> leave me with only 33% of the usable storage). Regarding reading back from
> HDFS, if I can read at ~50MBps (for recent files), that would be enough.
> 
> Let me know if you see any more pitfalls in this setup, or have more
> suggestions. I really appreciate it. Once I test this setup, I will put the
> results back to the list.
> 
> Thanks,
> Amit
> 
> 
> On Mon, Feb 9, 2009 at 12:39 PM, Brian Bockelman wrote:
> 
>> Hey Amit,
>> 
>> Your current thoughts on keeping block size larger and removing the very
>> small files are along the right line.  Why not chose the default size of
>> 64MB or larger?  You don't seem too concerned about the number of replicas.
>> 
>> However, you're still fighting against the tide.  You've got enough files
>> that you'll be pushing against block report and namenode limitations,
>> especially with 20 - 50 million files.  We find that about 500k blocks per
>> node is a good stopping point right now.
>> 
>> You really, really need to figure out how to organize your files in such a
>> way that the average file size is above 64MB.  Is there a "primary key" for
>> each file?  If so, maybe consider HBASE?  If you just are going to be
>> sequentially scanning through all your files, consider archiving them all to
>> a single sequence file.
>> 
>> Your individual data nodes are quite large ... I hope you're not expecting
>> to measure throughput in 10's of Gbps?
>> 
>> It's hard to give advice without knowing more about your application.  I
>> can tell you that you're going to run into a significant wall if you can't
>> figure out a means for making your average file size at least greater than
>> 64MB.
>> 
>> Brian
>> 
>> On Feb 8, 2009, at 10:06 PM, Amit Chandel wrote:
>> 
>> Hi Group,
>>> 
>>> I am planning to use HDFS as a reliable and distributed file system for
>>> batch operations. No plans as of now to run any map reduce job on top of
>>> it,
>>> but in future we will be having map reduce operations on files in HDFS.
>>> 
>>> The current (test) system has 3 machines:
>>> NameNode: dual core CPU, 2GB RAM, 500GB HDD
>>> 2 DataNodes: Both of them with a dual core CPU, 2GB of RAM and 1.5TB of
>>> space with ext3 filesystem.
>>> 
>>> I just need to put and retrieve files from this system. The files which I
>>> will put in HDFS varies from a few Bytes to a around 100MB, with the
>>> average
>>> file-size being 5MB. and the number of files wo

Re: copyFromLocal *

2009-02-09 Thread lohit
Which version of Hadoop are you using?
I think from 0.18 or 0.19, copyFromLocal accepts multiple files as input, but
the destination should be a directory.

Lohit



- Original Message 
From: S D 
To: Hadoop Mailing List 
Sent: Monday, February 9, 2009 3:34:22 PM
Subject: copyFromLocal *

I'm using the Hadoop FS shell to move files into my data store (either HDFS
or S3Native). I'd like to use wildcard with copyFromLocal but this doesn't
seem to work. Is there any way I can get that kind of functionality?

Thanks,
John



Re: Backing up HDFS?

2009-02-09 Thread lohit
We copy over selected files from HDFS to KFS and use an instance of KFS as a
backup file system.
We use distcp to take the backup.
Lohit



- Original Message 
From: Allen Wittenauer 
To: core-user@hadoop.apache.org
Sent: Monday, February 9, 2009 5:22:38 PM
Subject: Re: Backing up HDFS?

On 2/9/09 4:41 PM, "Amandeep Khurana"  wrote:
> Why would you want to have another backup beyond HDFS? HDFS itself
> replicates your data so if the reliability of the system shouldnt be a
> concern (if at all it is)...

I'm reminded of a previous job where a site administrator refused to make
tape backups (despite our continual harassment and pointing out that he was
in violation of the contract) because he said RAID was "good enough".

Then the RAID controller failed. When we couldn't recover data "from the
other mirror" he was fired.  Not sure how they ever recovered, esp.
considering what the data was they lost.  Hopefully they had a paper trail.

To answer Nathan's question:

> On Mon, Feb 9, 2009 at 4:17 PM, Nathan Marz  wrote:
> 
>> How do people back up their data that they keep on HDFS? We have many TB of
>> data which we need to get backed up but are unclear on how to do this
>> efficiently/reliably.

The content of our HDFSes is loaded from elsewhere and is not considered
'the source of authority'.  It is the responsibility of the original sources
to maintain backups and we then follow their policies for data retention.
For user generated content, we provide *limited* (read: quota'ed) NFS space
that is backed up regularly.

Another strategy we take is multiple grids in multiple locations that get
the data loaded simultaneously.

The key here is to prioritize your data.  Impossible to replicate data gets
backed up using whatever means necessary, hard-to-regenerate data, next
priority. Easy to regenerate and ok to nuke data, doesn't get backed up.


Re: Bad connection to FS.

2009-02-04 Thread lohit
As noted by others, the NameNode is not running.
Before formatting anything (which is like deleting your data), try to see why
the NameNode isn't running.
Search for the value of HADOOP_LOG_DIR in ./conf/hadoop-env.sh; if you have not
set it explicitly, it would default to /logs/*namenode*.log under your Hadoop
install directory.
Lohit



- Original Message 
From: Amandeep Khurana 
To: core-user@hadoop.apache.org
Sent: Wednesday, February 4, 2009 5:26:43 PM
Subject: Re: Bad connection to FS.

Here's what I had done..

1. Stop the whole system
2. Delete all the data in the directories where the data and the metadata is
being stored.
3. Format the namenode
4. Start the system

This solved my problem. I'm not sure if this is a good idea for you or
not. I was pretty much installing from scratch, so I didn't mind deleting the
files in those directories.

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Wed, Feb 4, 2009 at 3:49 PM, TCK  wrote:

>
> I believe the debug logs location is still specified in hadoop-env.sh (I
> just read the 0.19.0 doc). I think you have to shut down all nodes first
> (stop-all), then format the namenode, and then restart (start-all) and make
> sure that NameNode comes up too. We are using a very old version, 0.12.3,
> and are upgrading.
> -TCK
>
>
>
> --- On Wed, 2/4/09, Mithila Nagendra  wrote:
> From: Mithila Nagendra 
> Subject: Re: Bad connection to FS.
> To: core-user@hadoop.apache.org, moonwatcher32...@yahoo.com
> Date: Wednesday, February 4, 2009, 6:30 PM
>
> @TCK: Which version of hadoop have you installed?
> @Amandeep: I did tried reformatting the namenode, but it hasn't helped me
> out in anyway.
> Mithila
>
>
> On Wed, Feb 4, 2009 at 4:18 PM, TCK  wrote:
>
>
>
> Mithila, how come there is no NameNode java process listed by your jps
> command? I would check the hadoop namenode logs to see if there was some
> startup problem (the location of those logs would be specified in
> hadoop-env.sh, at least in the version I'm using).
>
>
> -TCK
>
>
>
>
>
>
>
> --- On Wed, 2/4/09, Mithila Nagendra  wrote:
>
> From: Mithila Nagendra 
>
> Subject: Bad connection to FS.
>
> To: "core-user@hadoop.apache.org" , "
> core-user-subscr...@hadoop.apache.org" <
> core-user-subscr...@hadoop.apache.org>
>
>
> Date: Wednesday, February 4, 2009, 6:06 PM
>
>
>
> Hey all
>
>
>
> When I try to copy a folder from the local file system in to the HDFS using
>
> the command hadoop dfs -copyFromLocal, the copy fails and it gives an error
>
> which says "Bad connection to FS". How do I get past this? The
>
> following is
>
> the output at the time of execution:
>
>
>
> had...@renweiyu-desktop:/usr/local/hadoop$ jps
>
> 6873 Jps
>
> 6299 JobTracker
>
> 6029 DataNode
>
> 6430 TaskTracker
>
> 6189 SecondaryNameNode
>
> had...@renweiyu-desktop:/usr/local/hadoop$ ls
>
> bin  docs  lib  README.txt
>
> build.xml  hadoop-0.18.3-ant.jar  libhdfs  src
>
> c++  hadoop-0.18.3-core.jar  librecordio  webapps
>
> CHANGES.txt  hadoop-0.18.3-examples.jar  LICENSE.txt
>
> conf hadoop-0.18.3-test.jar  logs
>
> contrib  hadoop-0.18.3-tools.jar NOTICE.txt
>
> had...@renweiyu-desktop:/usr/local/hadoop$ cd ..
>
> had...@renweiyu-desktop:/usr/local$ ls
>
> bin  etc  games  gutenberg  hadoop  hadoop-0.18.3.tar.gz  hadoop-datastore
>
> include  lib  man  sbin  share  src
>
> had...@renweiyu-desktop:/usr/local$ hadoop/bin/hadoop dfs -copyFromLocal
>
> gutenberg gutenberg
>
> 09/02/04 15:58:21 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 0 time(s).
>
> 09/02/04 15:58:22 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 1 time(s).
>
> 09/02/04 15:58:23 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 2 time(s).
>
> 09/02/04 15:58:24 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 3 time(s).
>
> 09/02/04 15:58:25 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 4 time(s).
>
> 09/02/04 15:58:26 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 5 time(s).
>
> 09/02/04 15:58:27 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 6 time(s).
>
> 09/02/04 15:58:28 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 7 time(s).
>
> 09/02/04 15:58:29 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 8 time(s).
>
> 09/02/04 15:58:30 INFO ipc.Client: Retrying connect to server: localhost/
>
> 127.0.0.1:54310. Already tried 9 time(s).
>
> Bad connection to FS. command aborted.
>
>
>
> The command jps shows that the hadoop system is up and running. So I have
> no idea what's wrong!
>
>
>
> Thanks!
>
> Mithila
>
>
>
>
>
>
>
>
>
>
>
>
>
>



Re: Hadoop-KFS-FileSystem API

2009-02-03 Thread lohit
Hi Wasim,

Here is what you could do.
1. Deploy KFS
2. Build a hadoop-site.xml config file and set fs.default.name and other config 
variables to point to KFS as described by 
(http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/kfs/package-summary.html)
3. If you place this hadoop-site.xml in a directory, say ~/foo/myconf, then you
could use hadoop commands as: ./bin/hadoop --config ~/foo/myconf fs -ls
/foo/bar.txt
4. If you want to use the Hadoop FileSystem API, just put this directory as the
first entry in your classpath, so that a new Configuration object loads this
hadoop-site.xml and your FileSystem API calls talk to KFS (see the sketch below).
5. Alternatively you could also create an object of KosmosFileSystem, which 
extends from FileSystem. Look at org.apache.hadoop.fs.kfs.KosmosFileSystem for 
example.
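
A minimal sketch of step 4, assuming the hadoop-site.xml from the steps above
(with fs.default.name set to a kfs:// URI) and the KFS client jars are on the
classpath; /foo is just a placeholder path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KfsListing {
  public static void main(String[] args) throws Exception {
    // Loads hadoop-default.xml and the hadoop-site.xml that points at KFS,
    // provided that config directory is first on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);   // resolves to the KFS implementation
    for (FileStatus stat : fs.listStatus(new Path("/foo"))) {   // placeholder path
      System.out.println(stat.getPath() + "\t" + stat.getLen());
    }
  }
}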

Lohit



- Original Message 
From: Wasim Bari 
To: core-user@hadoop.apache.org
Sent: Tuesday, February 3, 2009 3:03:51 AM
Subject: Hadoop-KFS-FileSystem API

Hi,
I am looking to use KFS as storage with Hadoop FileSystem API.

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/kfs/package-summary.html

This page describes KFS usage with Hadoop, and the last step says to start the
map/reduce trackers.

Is it necessary to turn them on?

How does storage-only use work with the FileSystem API?

Thanks

Wasim


Re: Setting up version 0.19.0

2009-02-01 Thread lohit
From 0.19, the package structure was changed.
DistributedFileSystem is no longer in org.apache.hadoop.dfs; the package has
been renamed to org.apache.hadoop.hdfs.

Check your sample config file in 0.19 branch conf/hadoop-default.xml, in 
particular for 

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>The FileSystem for hdfs: uris.</description>
</property>


Thanks,
Lohit
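
A quick diagnostic sketch (just a way to look, not a fix by itself): print which
class your configuration actually resolves for hdfs: URIs.

import org.apache.hadoop.conf.Configuration;

public class CheckHdfsImpl {
  public static void main(String[] args) {
    Configuration conf = new Configuration();  // loads hadoop-default.xml + hadoop-site.xml
    // On 0.19 this should print org.apache.hadoop.hdfs.DistributedFileSystem;
    // if it prints the old org.apache.hadoop.dfs.DistributedFileSystem, a stale
    // config file on the classpath is overriding it.
    System.out.println(conf.get("fs.hdfs.impl"));
  }
}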



- Original Message 
From: Amandeep Khurana 
To: core-user@hadoop.apache.org
Sent: Sunday, February 1, 2009 10:19:56 PM
Subject: Setting up version 0.19.0

I was using 0.18.2 till and needed to upgrade to 0.19.0 for the JDBC
support. Strangely, I'm unable to get 0.19.0 to start up using exactly the
same configuration as the earlier version. Here's the error from the
namenode log file:

2009-02-01 22:16:13,775 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.RuntimeException:
java.lang.ClassNotFoundException:
org.apache.hadoop.dfs.DistributedFileSystem
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:720)
at
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1362)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:56)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1379)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
at org.apache.hadoop.fs.Trash.(Trash.java:62)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:166)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:208)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:194)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.dfs.DistributedFileSystem
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:673)
at
org.apache.hadoop.conf.Configuration.getClass(Configuration.java:718)
... 11 more

Does anyone have any idea about this?

Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz



Re: How to add nodes to existing cluster?

2009-01-30 Thread lohit
Just starting the DataNode and TaskTracker on the new node would add it to the cluster.
http://wiki.apache.org/hadoop/FAQ#25

Lohit



- Original Message 
From: Amandeep Khurana 
To: core-user@hadoop.apache.org
Sent: Friday, January 30, 2009 6:55:00 PM
Subject: How to add nodes to existing cluster?

I am trying to add nodes to an existing working cluster. Do I need to bring
the entire cluster down or just shutting down and restarting the namenode
after adding the new machine list to the slaves would work?

Amandeep



Re: stop the running job?

2009-01-12 Thread Lohit
Try
./bin/hadoop job -h


Lohit
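
('hadoop job -h' lists -kill; as far as I know there is no suspend.) A hedged
sketch of doing the same thing programmatically, assuming the job id string
shown by the JobTracker; the id in the comment is made up:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class KillJob {
  public static void main(String[] args) throws Exception {
    // args[0] is the job id shown by "hadoop job -list" or the JobTracker UI,
    // e.g. job_200901121530_0042 (a made-up id).
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(args[0]);
    if (job != null) {
      job.killJob();   // same effect as "bin/hadoop job -kill <jobid>"
    }
  }
}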

On Jan 12, 2009, at 6:10 PM, "Samuel Guo"  wrote:

Hi all,

Is there any method that I can use to stop or suspend a running job in
Hadoop?

Regards,

Samuel



Re: NotReplicatedYetException by 'bin/hadoop dfs' commands

2008-12-30 Thread lohit
It looks like you do not have datanodes running.
Can you check the datanode logs and see if they were started without errors?

Thanks,
Lohit



- Original Message 
From: sagar arlekar 
To: core-user@hadoop.apache.org
Sent: Tuesday, December 30, 2008 1:00:04 PM
Subject: NotReplicatedYetException by 'bin/hadoop dfs' commands

Hello,

I am new to Hadoop. I am using Hadoop 0.17, and I am trying to run it
pseudo-distributed.
I get NotReplicatedYetException while executing 'bin/hadoop dfs' commands.
The following is the partial text of the exception

08/12/31 02:38:26 INFO dfs.DFSClient:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/root/in7 could only be replicated to 0 nodes, instead of 1
at 
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

at org.apache.hadoop.ipc.Client.call(Client.java:557)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)

08/12/31 02:38:26 WARN dfs.DFSClient: NotReplicatedYetException
sleeping /user/root/in7 retries left 4
08/12/31 02:38:27 INFO dfs.DFSClient:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/root/in7 could only be replicated to 0 nodes, instead of 1
at 
org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1145)
at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:300)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:446)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:896)

at org.apache.hadoop.ipc.Client.call(Client.java:557)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2335)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2220)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1700(DFSClient.java:1702)
at 
org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1842)


I googled but did not find a remedy, and the mailing list archive did
not have a search.

Kindly help me fix the exception.


Regards,
Sagar



Re: Copy data between HDFS instances...

2008-12-17 Thread lohit
try
hadoop distcp

more info here
http://hadoop.apache.org/core/docs/current/distcp.html
The documentation is for the current release, but 'hadoop distcp' by itself
should print out a help message.


Thanks,
Lohit

- Original Message 
From: C G 
To: core-user@hadoop.apache.org
Sent: Wednesday, December 17, 2008 7:18:51 PM
Subject: Copy data between HDFS instances...

Hi All:

I am setting up 2 grids, each with its own HDFS.  The grids are unaware of each 
other but exist on the same network.

I'd like to copy data from one HDFS to the other.  Is there a way to do this 
simply, or do I need to cobble together scripts to copy from HDFS on one side 
and pipe to a dfs -cp on the other side?

I tried something like this:

 hadoop dfs -ls hdfs://grid1NameNode:portNo/

from grid2 trying to ls on grid1 but got a "wrong FS" error message.  I also 
tried:

hadoop dfs -ls hdfs://grid1NameNode:portNo/foo

on grid2 where "/foo" exists on grid1 and got 0 files found.

I assume there is some way to do this and I just don't have the right command 
line magic.  This is Hadoop 0.15.0.

Any help appreciated.

Thanks,
C G


Re: dead node

2008-12-03 Thread lohit
I should have said it does *not* stop the whole cluster.


- Original Message 
From: lohit <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, December 3, 2008 3:20:25 PM
Subject: Re: dead node

Hi Nik,

Can you explain the steps you did? Was the NameNode/JobTracker running on the
node where the datanode ran?
In a cluster with more than one node stopping one datanode does stop whole 
cluster.
Thanks,
Lohit



- Original Message 
From: Nikolay Grebnev <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, December 3, 2008 1:41:01 PM
Subject: dead node

Hello

I am testing Hadoop DFS cluster and when I turn off one of my datanode
all the cluster stops answering requests.
Hadoop version 0.19.0  with default configuration.

How can I configure the cluster to prevent it?

Nik


Re: dead node

2008-12-03 Thread lohit
Hi Nik,

Can you explain the steps you did? Was the NameNode/JobTracker running on the
node where the datanode ran?
In a cluster with more than one node stopping one datanode does stop whole 
cluster.
Thanks,
Lohit



- Original Message 
From: Nikolay Grebnev <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, December 3, 2008 1:41:01 PM
Subject: dead node

Hello

I am testing Hadoop DFS cluster and when I turn off one of my datanode
all the cluster stops answering requests.
Hadoop version 0.19.0  with default configuration.

How can I configure the cluster to prevent it?

Nik



Re: Can I ignore some errors in map step?

2008-12-03 Thread lohit
There is a config variable (mapred.max.map.failures.percent) which says what
percentage of map task failures you can tolerate before marking the job as
failed. By default it is set to zero. Set this value to your desired
percentage. E.g., with mapred.max.map.failures.percent = 10 and 100 map tasks,
you can have 10 map tasks fail without failing the job.
Thanks,
Lohit
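
A minimal sketch of setting this from code (the property name is the one above;
the rest of the job setup is omitted):

import org.apache.hadoop.mapred.JobConf;

public class FailureTolerantJob {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // Tolerate up to 10% failed map tasks before the whole job is declared failed.
    job.setInt("mapred.max.map.failures.percent", 10);
    // ... set mapper, reducer, input and output paths as usual, then run the job.
  }
}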



- Original Message 
From: "Zhou, Yunqing" <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, December 3, 2008 5:49:57 AM
Subject: Can I ignore some errors in map step?

I'm running a job on data of size 5TB. But currently it reports a
checksum error on a block in the file. That causes a map task
failure, and then the whole job fails.
But the lack of one 64MB block will barely affect the final result.
So can I ignore some map task failures and continue with the reduce step?

I'm using hadoop-0.18.2 with a replication factor of 1.

Thanks



Re: How to retrieve rack ID of a datanode

2008-11-26 Thread lohit
I take that back. I forgot about the changes in the new version of HDFS.
If you are testing this, take a look at TestReplication.java.
Lohit



- Original Message 
From: Ramya R <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Cc: [EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 11:15:28 PM
Subject: RE: How to retrieve rack ID of a datanode

Hi Lohit,
I have not set the datanode to tell the namenode which rack it belongs to.
Can you please tell me how I do it? Is it using setNetworkLocation()?

My intention is to kill the datanodes in a given rack. So it would be
useful even if I obtain the subnet each datanode belongs to.

Thanks
Ramya

-Original Message-
From: lohit [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 26, 2008 12:26 PM
To: core-user@hadoop.apache.org
Subject: Re: How to retrieve rack ID of a datanode

/default-rack is set when the datanode has not set a rack ID. It is up to the
datanode to tell the namenode which rack it belongs to.
Is your datanode doing that explicitly?
-Lohit



- Original Message 
From: Ramya R <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 25, 2008 10:36:46 PM
Subject: How to retrieve rack ID of a datanode

Hi all,



I want to retrieve the Rack ID of every datanode. How can I do this?

I tried using getNetworkLocation() in
org.apache.hadoop.hdfs.protocol.DatanodeInfo. I am getting /default-rack
as the output for all datanodes.



Any advice?



Thank in advance

Ramya



Re: how can I decommission nodes on-the-fly?

2008-11-26 Thread lohit
As Amareshwari said, you can almost safely stop the TaskTracker process on a
node. Task(s) running on that node would be considered failed and would be
re-executed by the JobTracker on another node. The reason we decommission a
DataNode is to protect against data loss: a DataNode stores HDFS blocks, and by
decommissioning you are asking the NameNode to copy the blocks it has over to
some other datanode.

Thanks,
Lohit



- Original Message 
From: Amareshwari Sriramadasu <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 25, 2008 11:51:21 PM
Subject: Re: how can I decommission nodes on-the-fly?

Jeremy Chow wrote:
> Hi list,
>
>  I added a property dfs.hosts.exclude to my conf/hadoop-site.xml. Then
> refreshed my cluster with command
>  bin/hadoop dfsadmin -refreshNodes
> It showed that it can only shut down the DataNode process but not included
> the TaskTracker process on each slaver specified in the excludes file.
>  
Presently, decommissioning TaskTracker on-the-fly is not available.
> The jobtracker web still show that I hadnot shut down these nodes.
> How can i totally decommission these slaver nodes on-the-fly? Is it can be
> achieved only by operation on the master node?
>
>  
I think one way to shutdown a TaskTracker is to kill it.

Thanks
Amareshwari
> Thanks,
> Jeremy
>
>  


Re: How to retrieve rack ID of a datanode

2008-11-25 Thread lohit
/default-rack is set when the datanode has not set a rack ID. It is up to the
datanode to tell the namenode which rack it belongs to.
Is your datanode doing that explicitly?
-Lohit
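
For what it's worth, a small sketch that prints each datanode with the rack the
namenode has recorded for it, assuming a 0.19-style deployment and that the
default filesystem is HDFS:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class PrintRacks {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Only works when the default filesystem is HDFS.
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    for (DatanodeInfo node : dfs.getDataNodeStats()) {
      // getNetworkLocation() returns "/default-rack" unless rack awareness
      // has been configured on the cluster.
      System.out.println(node.getName() + " -> " + node.getNetworkLocation());
    }
  }
}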



- Original Message 
From: Ramya R <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 25, 2008 10:36:46 PM
Subject: How to retrieve rack ID of a datanode

Hi all,



I want to retrieve the Rack ID of every datanode. How can I do this?

I tried using getNetworkLocation() in
org.apache.hadoop.hdfs.protocol.DatanodeInfo. I am getting /default-rack
as the output for all datanodes.



Any advice?



Thank in advance

Ramya


Re: HDFS directory listing from the Java API?

2008-11-25 Thread lohit
You can see how 'hadoop dfs -ls' is implemented in FsShell::ls(Path src, 
boolean recursive, boolean printHeader) in FsShell.java 

Thanks,
Lohit
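
A minimal sketch of the same idea using the public API, assuming the directory
path is passed as an argument:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Non-recursive listing; this is the core of what FsShell's ls does.
    for (FileStatus stat : fs.listStatus(new Path(args[0]))) {
      System.out.println((stat.isDir() ? "d " : "- ")
          + stat.getLen() + "\t" + stat.getPath());
    }
  }
}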

- Original Message 
From: Shane Butler <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Sent: Tuesday, November 25, 2008 8:04:48 PM
Subject: HDFS directory listing from the Java API?

Hi all,

Can someone pls guide me on how to get a directory listing of files on
HDFS using the java API (0.19.0)?

Regards,
Shane



Re: 64 bit namenode and secondary namenode & 32 bit datanode

2008-11-25 Thread lohit
Well, now that I think about it, image corruption might not happen, since each
checkpoint initiation would have a unique number.

I was just wondering what would happen in this case. Consider this scenario:
Time 1 <-- SN1 asks NN for image and edits to merge
Time 2 <-- SN2 asks NN for image and edits to merge
Time 2 <-- SN2 returns new image
Time 3 <-- SN1 returns new image
I am not sure what happens here, but it's best to test it out before setting up
something like this.

And if you have multiple entries in the NN file, then one SNN checkpoint would
update all NN entries, so a redundant SNN isn't buying you much.

Thanks,
Lohit



- Original Message 
From: Sagar Naik <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 25, 2008 4:32:26 PM
Subject: Re: 64 bit namenode and secondary namenode & 32 bit datanode



lohit wrote:
> I might be wrong, but my assumption is running SN either in 64/32 shouldn't 
> matter. 
> But I am curious how two instances of Secondary namenode is setup, will both 
> of them talk to same NN and running in parallel? 
> what are the advantages here.
>  
I just have multiple entries in the masters file. I am not aware of image
corruption (did not take a look into it). I did it for SNN redundancy.
Please correct me if I am wrong.
Thanks
Sagar
> Wondering if there are chances of image corruption.
>
> Thanks,
> lohit
>
> - Original Message 
> From: Sagar Naik <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Tuesday, November 25, 2008 3:58:53 PM
> Subject: 64 bit namenode and secondary namenode & 32 bit datanode
>
> I am trying to migrate from 32 bit jvm and 64 bit for namenode only.
> *setup*
> NN - 64 bit
> Secondary namenode (instance 1) - 64 bit
> Secondary namenode (instance 2)  - 32 bit
> datanode- 32 bit
>
> From the mailing list I deduced that NN-64 bit and Datanode -32 bit combo 
> works
> But, I am not sure if S-NN-(instance 1--- 64 bit ) and S-NN (instance 2 -- 32 
> bit) will work with this setup.
>
> Also, do shud I be aware of any other issues for migrating over to 64 bit 
> namenode
>
> Thanks in advance for all the suggestions
>
>
> -Sagar
>
>  


Re: 64 bit namenode and secondary namenode & 32 bit datanode

2008-11-25 Thread lohit

I might be wrong, but my assumption is that running the SN as either 64- or
32-bit shouldn't matter.
But I am curious how two instances of the secondary namenode are set up; will
both of them talk to the same NN and run in parallel?
What are the advantages here?
Wondering if there are chances of image corruption.

Thanks,
lohit

- Original Message 
From: Sagar Naik <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 25, 2008 3:58:53 PM
Subject: 64 bit namenode and secondary namenode & 32 bit datanode

I am trying to migrate from a 32-bit JVM to a 64-bit JVM for the namenode only.
*setup*
NN - 64 bit
Secondary namenode (instance 1) - 64 bit
Secondary namenode (instance 2)  - 32 bit
datanode- 32 bit

From the mailing list I deduced that the NN 64-bit and DataNode 32-bit combo works.
But, I am not sure if S-NN-(instance 1--- 64 bit ) and S-NN (instance 2 -- 32 
bit) will work with this setup.

Also, should I be aware of any other issues when migrating over to a 64-bit
namenode?

Thanks in advance for all the suggestions


-Sagar



Re: Getting Reduce Output Bytes

2008-11-25 Thread Lohit
Thanks Sharad and Paco.

Lohit

On Nov 25, 2008, at 5:34 AM, "Paco NATHAN" <[EMAIL PROTECTED]> wrote:

Hi Lohit,

Our team collects those kinds of measurements using this patch:
  https://issues.apache.org/jira/browse/HADOOP-4559

Some example Java code in the comments shows how to access the data,
which is serialized as JSON.  Looks like the "red_hdfs_bytes_written"
value would give you that.

Best,
Paco


On Tue, Nov 25, 2008 at 00:28, lohit <[EMAIL PROTECTED]> wrote:
Hello,

Is there an easy way to get Reduce Output Bytes?

Thanks,
Lohit





Getting Reduce Output Bytes

2008-11-24 Thread lohit
Hello,

Is there an easy way to get Reduce Output Bytes?

Thanks,
Lohit



Re: Performing a Lookup in Multiple MapFiles?

2008-11-18 Thread lohit
Hi Dan,

You could do one of a few things to get around this.
1. In a subsequent step you could merge all your MapFile outputs into one file.
This works if your MapFile output is small.
2. Else, you can use the same partition function which Hadoop used, to find the
partition ID. The partition ID tells you which output file (out of the 150
files) your key is present in.
E.g., if the partition ID is 23, then the output file you would have to look in
would be part-00023 of the generated output.

You can use your own Partitioner class (make sure you use it for your first job
as well as the second) or reuse the one already used by Hadoop.
http://hadoop.apache.org/core/docs/r0.18.2/api/org/apache/hadoop/mapred/Partitioner.html
has details. (See the sketch below.)

I think
http://hadoop.apache.org/core/docs/r0.18.2/api/org/apache/hadoop/examples/SleepJob.html
has a usage example (look for SleepJob.java).

-Lohit
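
A small sketch of option 2, assuming the first job used the default
HashPartitioner with Text keys and 150 reduces (adjust to whatever your job
actually used):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.HashPartitioner;

public class FindPartition {
  public static void main(String[] args) {
    int numReduces = 150;             // the number of reduces used by the first job
    Text key = new Text(args[0]);     // the key you want to look up
    // Must be the same partitioner (and key type) the MapFiles were written
    // with; HashPartitioner is the Hadoop default.
    HashPartitioner<Text, Text> partitioner = new HashPartitioner<Text, Text>();
    partitioner.configure(new JobConf());
    int part = partitioner.getPartition(key, null, numReduces);
    // If the key exists, it lives in this file of the first job's output.
    System.out.println(String.format("look in part-%05d", part));
  }
}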




- Original Message 
From: Dan Benjamin <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 18, 2008 10:53:47 AM
Subject: Performing a Lookup in Multiple MapFiles?


I've got a Hadoop process that creates as its output a MapFile.  Using one
reducer this is very slow (as the map is large), but with 150 (on a cluster
of 80 nodes) it runs quickly.  The problem is that it produces 150 output
files as well.  In a subsequent process I need to perform lookups on this
map - how is it recommended that I do this, given that I may not know the
number of existing MapFiles or their names?  Is there a cleaner solution
than listing the contents of the directory containing all of the MapFiles
and then just querying each in sequence?
-- 
View this message in context: 
http://www.nabble.com/Performing-a-Lookup-in-Multiple-MapFiles--tp20565940p20565940.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Cleaning up files in HDFS?

2008-11-14 Thread lohit
Have you tried fs.trash.interval?

<property>
  <name>fs.trash.interval</name>
  <value>0</value>
  <description>Number of minutes between trash checkpoints.
  If zero, the trash feature is disabled.
  </description>
</property>


more info about trash feature here.
http://hadoop.apache.org/core/docs/current/hdfs_design.html


Thanks,
Lohit
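
For the "older than x days" part, a hedged sketch using file modification times;
note that a programmatic FileSystem.delete() does not go through the trash the
way 'hadoop dfs -rm' does. The directory and age arguments are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DeleteOldFiles {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path(args[0]);             // directory to clean (placeholder)
    long days = Long.parseLong(args[1]);      // age threshold in days
    long cutoff = System.currentTimeMillis() - days * 24L * 60 * 60 * 1000;
    for (FileStatus stat : fs.listStatus(dir)) {
      // getModificationTime() is milliseconds since the epoch.
      if (!stat.isDir() && stat.getModificationTime() < cutoff) {
        fs.delete(stat.getPath(), false);
        System.out.println("deleted " + stat.getPath());
      }
    }
  }
}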

- Original Message 
From: Erik Holstad <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 5:08:03 PM
Subject: Cleaning up files in HDFS?

Hi!
We would like to run a delete script that deletes all files older than
x days that are stored in lib l in HDFS. What is the best way of doing that?

Regards Erik



Re: Recovery of files in hadoop 18

2008-11-14 Thread lohit
Yes, what you did is right. One last check:
in the secondary namenode log you should see the timestamp of the last
checkpoint (or download of edits). Just make sure those are from before you ran
the delete command.
Basically, we are trying to make sure your delete command isn't in the edits.
(Another way would have been to open the edits in a hex editor or similar to
check), but this should work.
Once done, you could start.
Thanks,
Lohit



- Original Message 
From: Sagar Naik <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 1:59:04 PM
Subject: Re: Recovery of files in hadoop 18

I had a secondary namenode running on the namenode machine.
I deleted the dfs.name.dir
then bin/hadoop namenode -importCheckpoint.

and restarted the dfs.

I guess the deletion of name.dir will delete the edit logs.
Can you please tell me that this will not lead to replaying the delete
transactions?

Thanks for help/advice


-Sagar

lohit wrote:
> NameNode would not come out of safe mode as it is still waiting for datanodes 
> to report those blocks which it expects. 
> I should have added, try to get a full output of fsck
> fsck  -openforwrite -files -blocks -location.
> -openforwrite files should tell you what files where open during the 
> checkpoint, you might want to double check that is the case, the files were 
> being writting during that moment. May be by looking at the filename you 
> could tell if that was part of a job which was running.
>
> For any missing block, you might also want to cross verify on the datanode to 
> see if is really missing.
>
> Once you are convinced that those are the only corrupt files which you can 
> live with, start datanodes. 
> Namenode woudl still not come out of safemode as you have missing blocks, 
> leave it for a while, run fsck look around, if everything ok, bring namenode 
> out of safemode.
> I hope you had started this namenode with old image and empty edits. You do 
> not want your latest edits to be replayed, which has your delete transactions.
>
> Thanks,
> Lohit
>
>
>
> - Original Message 
> From: Sagar Naik <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 12:11:46 PM
> Subject: Re: Recovery of files in hadoop 18
>
> Hey Lohit,
>
> Thanks for you help.
> I did as per your suggestion. imported from secondary namenode.
> we have some corrupted files.
>
> But for some reason, the namenode is still in safe_mode. It has been an hour 
> or so.
> The fsck report is :
>
> Total size:6954466496842 B (Total open files size: 543469222 B)
> Total dirs:1159
> Total files:   1354155 (Files currently being written: 7673)
> Total blocks (validated):  1375725 (avg. block size 5055128 B) (Total 
> open file blocks (not validated): 50)
> 
> CORRUPT FILES:1574
> MISSING BLOCKS:   1574
> MISSING SIZE: 1165735334 B
> CORRUPT BLOCKS:   1574
> 
> Minimally replicated blocks:   1374151 (99.88559 %)
> Over-replicated blocks:0 (0.0 %)
> Under-replicated blocks:   26619 (1.9349071 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor:3
> Average block replication: 2.977127
> Corrupt blocks:    1574
> Missing replicas:  26752 (0.65317154 %)
>
>
> Do you think, I should manually override the safemode and delete all the 
> corrupted files and restart
>
> -Sagar
>
>
> lohit wrote:
>  
>> If you have enabled thrash. They should be moved to trash folder before 
>> permanently deleting them, restore them back. (hope you have that set 
>> fs.trash.interval)
>>
>> If not Shut down the cluster.
>> Take backup of you dfs.data.dir (both on namenode and secondary namenode).
>>
>> Secondary namenode should have last updated image, try to start namenode 
>> from that image, dont use the edits from namenode yet. Try do 
>> importCheckpoint explained in here 
>> https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173.
>>  Start only namenode and run fsck -files. it will throw lot of messages 
>> saying you are missing blocks but thats fine since you havent started 
>> datanodes yet. But if it shows your files, that means they havent been 
>> deleted yet. This will give you a view of system of last backup. Start 
>> datanode If its up, try running fsck and check consistency of the sytem. you 
>> would lose all changes that has happened since the last checkpoint. 
>>
>> Hope that helps,
>> Lohit
>>
>>
>>
>> - Original Message 
>> From: Sagar Naik <[EMAIL PROTECTED]>
>> To: core-user@hadoop.apache.org
>> Sent: Friday, November 14, 2008 10:38:45 AM
>> Subject: Recovery of files in hadoop 18
>>
>> Hi,
>> I accidentally deleted the root folder in our hdfs.
>> I have stopped the hdfs
>>
>> Is there any way to recover the files from secondary namenode
>>
>> Pl help
>>
>>
>> -Sagar
>>  
>>


Re: Recovery of files in hadoop 18

2008-11-14 Thread lohit
The NameNode would not come out of safe mode as it is still waiting for the
datanodes to report those blocks which it expects.
I should have added, try to get the full output of fsck:
fsck -openforwrite -files -blocks -locations.
-openforwrite should tell you which files were open during the checkpoint; you
might want to double check that is the case, i.e., that those files were being
written at that moment. Maybe by looking at the filename you could tell whether
it was part of a job which was running.

For any missing block, you might also want to cross-verify on the datanode to
see if it is really missing.

Once you are convinced that those are the only corrupt files, and you can live
with that, start the datanodes.
The namenode would still not come out of safemode as you have missing blocks;
leave it for a while, run fsck, look around, and if everything is ok, bring the
namenode out of safemode manually.
I hope you started this namenode with the old image and empty edits. You do not
want your latest edits to be replayed, since they contain your delete transactions.

Thanks,
Lohit



- Original Message 
From: Sagar Naik <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 12:11:46 PM
Subject: Re: Recovery of files in hadoop 18

Hey Lohit,

Thanks for your help.
I did as per your suggestion and imported from the secondary namenode;
we have some corrupted files.

But for some reason, the namenode is still in safe mode. It has been an hour or
so.
The fsck report is:

Total size:6954466496842 B (Total open files size: 543469222 B)
Total dirs:1159
Total files:   1354155 (Files currently being written: 7673)
Total blocks (validated):  1375725 (avg. block size 5055128 B) (Total open 
file blocks (not validated): 50)

CORRUPT FILES:1574
MISSING BLOCKS:   1574
MISSING SIZE: 1165735334 B
CORRUPT BLOCKS:   1574

Minimally replicated blocks:   1374151 (99.88559 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   26619 (1.9349071 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 2.977127
Corrupt blocks:1574
Missing replicas:  26752 (0.65317154 %)


Do you think I should manually override the safemode, delete all the
corrupted files, and restart?

-Sagar


lohit wrote:
> If you have enabled thrash. They should be moved to trash folder before 
> permanently deleting them, restore them back. (hope you have that set 
> fs.trash.interval)
> 
> If not Shut down the cluster.
> Take backup of you dfs.data.dir (both on namenode and secondary namenode).
> 
> Secondary namenode should have last updated image, try to start namenode from 
> that image, dont use the edits from namenode yet. Try do importCheckpoint 
> explained in here 
> https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173.
>  Start only namenode and run fsck -files. it will throw lot of messages 
> saying you are missing blocks but thats fine since you havent started 
> datanodes yet. But if it shows your files, that means they havent been 
> deleted yet. This will give you a view of system of last backup. Start 
> datanode If its up, try running fsck and check consistency of the sytem. you 
> would lose all changes that has happened since the last checkpoint. 
> 
> Hope that helps,
> Lohit
> 
> 
> 
> - Original Message 
> From: Sagar Naik <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 10:38:45 AM
> Subject: Recovery of files in hadoop 18
> 
> Hi,
> I accidentally deleted the root folder in our hdfs.
> I have stopped the hdfs
> 
> Is there any way to recover the files from secondary namenode
> 
> Pl help
> 
> 
> -Sagar
>  


Re: Recovery of files in hadoop 18

2008-11-14 Thread lohit


If you have trash enabled, deleted files are moved to the trash folder before
being permanently deleted; restore them back from there. (Hope you have
fs.trash.interval set.)

If not, shut down the cluster.
Take a backup of your dfs.data.dir (both on the namenode and secondary namenode).

The secondary namenode should have the last updated image; try to start the
namenode from that image, and don't use the edits from the namenode yet. Try
doing importCheckpoint as explained here:
https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173.
Start only the namenode and run fsck -files. It will throw a lot of messages
saying you are missing blocks, but that's fine since you haven't started the
datanodes yet. But if it shows your files, that means they haven't been deleted
yet.
This will give you a view of the system as of the last backup. Start the
datanodes. If everything is up, try running fsck and check the consistency of
the system. You would lose all changes that have happened since the last
checkpoint.


Hope that helps,
Lohit



- Original Message 
From: Sagar Naik <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 10:38:45 AM
Subject: Recovery of files in hadoop 18

Hi,
I accidentally deleted the root folder in our hdfs.
I have stopped the hdfs

Is there any way to recover the files from secondary namenode

Pl help


-Sagar


Re: NameNode does not come out of Safemode automatically in Hadoop-0.17.2

2008-11-13 Thread lohit
The namenode does not come out of safe mode until it gets confirmation from the 
datanodes about the blocks it has. The namenode has a view of the filesystem and its 
blocks, and it expects those blocks to be reported by the datanodes; until then it 
considers the filesystem not yet ready for use. You can take the namenode out of 
safe mode and run 'hadoop fsck / -files -blocks -locations', which will tell you 
the missing blocks and the locations where it expects them. Check whether those 
nodes are up and running and whether they still have those blocks. 
Thanks,
Lohit
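
For reference, a minimal sketch of the commands involved (0.17-era CLI, run from the
Hadoop install directory; the output path is just an example):

  bin/hadoop dfsadmin -safemode get                              # is the namenode still in safe mode?
  bin/hadoop fsck / -files -blocks -locations > /tmp/fsck.out    # which blocks are missing, and from where?
  bin/hadoop dfsadmin -safemode leave                            # only once you understand what is missing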



- Original Message 
From: Pratyush Banerjee <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, November 13, 2008 8:39:20 PM
Subject: NameNode does not come out of Safemode automatically in Hadoop-0.17.2

Hi All,

We have been using hadoop-0.17.2 for some time now and we just had a case of 
namenode crash due to disk being full.
In order to get the namenode up again with minimal loss of data, we had to 
manually edit the edits file in a Hex editor and restart the namenode.

However, after restarting, the namenode went into safe mode (as expected), but it 
has been hours and it has not yet come out of safe mode.
We can obviously force it to come out, but should it not come out automatically?
Even after 12 hours in safe mode, the ratio of reported blocks is still stuck at 
0.9768.

Running fsck on / in the hdfs does report about some corrupt files.

What is the issue blocking the namenode from coming out of safe mode? If we have to 
do it manually (hadoop dfsadmin -safemode leave), what procedure should we follow 
in the process to ensure data safety?

thanks and regards,

Pratyush



Re: How to exclude machines from a cluster

2008-11-13 Thread lohit
Hi,

You could try to decommission the node. More information is provided here 
http://wiki.apache.org/hadoop/FAQ#17
When you ask the NN to decommission a node, it re-replicates all the blocks on that 
node to other nodes. 

PS: A single replica is always at risk of data loss; try at least two. Compressing 
the files might help with the space constraint.

Lohit
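
A minimal sketch of the decommission steps described above (the exclude-file path is
an example, not a required location). In hadoop-site.xml on the namenode:

  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/conf/exclude-hosts</value>
  </property>

Then add the two unstable hostnames to that file, one per line, and run:

  bin/hadoop dfsadmin -refreshNodes

The namenode re-replicates the blocks held by the excluded nodes; once the web UI
reports them as decommissioned, their DataNode processes can be shut down.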


- Original Message 
From: "Zhou, Yunqing" <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, November 13, 2008 1:06:00 AM
Subject: How to exclude machines from a cluster

Here is a cluster with 13 machines. And due to the lack of storage
space, we set the replication factor to 1.
but recently we found 2 machines in the cluster are not stable. so I'd
like to exclude them from the cluster.
but I can't simply set the replication factor to 1 and remove them due
to the large amount of data.
so is there a way I can force hadoop to move the block stored on them
to other machines?

Thanks.



Re: Can you load one NN data (fsimage , edits..) to another NN?

2008-11-13 Thread lohit
Hi Yossi,

Yes, you can start the NN on a different node using the files from its metadata 
directory (dfs.name.dir; dfs.data.dir belongs to the datanodes).
Set your dfs.name.dir config variable to point to that directory and you should 
be able to start the NN on the other node.
Try searching the previous discussions on this list regarding failover; a lot of 
emails and ideas have been recorded.

Thanks,
Lohit



- Original Message 
From: Yossi Ittach <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, November 13, 2008 8:43:00 AM
Subject: Can you load one NN data (fsimage , edits..) to another NN?

Hey all

If I have 1 NN , can I transport it's data (copy) and start another NN with
it? it will bring me very close to creating a failover and failback
mechanisem.

Thanks!

Vale et me ama
Yossi



Re: Namenode Failure

2008-11-13 Thread lohit
Hi Ankur,

We have had this kind of failure reported by others earlier on this list.
This might help you

http://markmail.org/message/u6l6lwus33oeivcd

Thanks,
Lohit



- Original Message 
From: ANKUR GOEL <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]; core-user@hadoop.apache.org
Sent: Thursday, November 13, 2008 4:34:15 AM
Subject: Namenode Failure

Hi Folks,
We have been running the hadoop-0.17.2 release on a 50-machine cluster, 
and we recently experienced a namenode failure because the disk became full. 
The node is unable to start up now and throws the following exception:

2008-11-13 06:41:18,618 INFO org.apache.hadoop.ipc.Server: Stopping server on 
9000
2008-11-13 06:41:18,619 ERROR org.apache.hadoop.dfs.NameNode: 
java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
   at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
   at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)
   at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
   at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
   at org.apache.hadoop.dfs.FSImage.doUpgrade(FSImage.java:250)
   at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:217)
   at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
   at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
   at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:255)
   at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)

What is the best way to recover from this failure with minimal data loss?
I could not find instructions on the wiki or anywhere else for release 0.17.2 on how 
to do recovery using the files from the secondary namenode.

Any help is greatly appreciated.

Thanks
-Ankur



Re: Caching data selectively on slaves

2008-11-11 Thread lohit
DistributedCache copies the cached data to all nodes. If you know the mapping of 
R* to D*, how about having each reducer read directly from DFS the D it expects? 
The distributed cache only helps if the same data is used by multiple tasks on the 
same node, so that you do not access DFS multiple times. If each 'D' is read by 
exactly one 'R', you are not buying much with DistributedCache. Also keep in mind 
that if the read takes a long time, your reducers might time out for failing to 
report status.

Thanks,
Lohit
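
As a rough sketch of that direct-read approach with the old mapred API (the side-file
layout /data/side/D<partition> and the class name are assumptions for illustration,
not part of the original message):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SideDataReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private FSDataInputStream side;

  public void configure(JobConf conf) {
    try {
      // mapred.task.partition tells this task which reducer (R1, R2, ...) it is,
      // so it can open only the D it is responsible for.
      int partition = conf.getInt("mapred.task.partition", 0);
      FileSystem fs = FileSystem.get(conf);
      side = fs.open(new Path("/data/side/D" + partition));   // hypothetical layout
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // use 'side' here; call reporter.progress() during long reads so the task
    // does not get killed for failing to report status
    reporter.progress();
    while (values.hasNext()) {
      out.collect(key, values.next());
    }
  }

  public void close() throws IOException {
    if (side != null) {
      side.close();
    }
  }
}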



- Original Message 
From: Tarandeep Singh <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, November 11, 2008 10:56:41 AM
Subject: Caching data selectively on slaves

Hi,

Is is possible to cache data selectively on slave machines?

Lets say I have data partitioned as D1, D2... and so on. D1 is required by
Reducer R1, D2 by R2 and so on. I know this before hand because
HashPartitioner.getPartition was used to partition the data.

If I put D1, D2.. in distributed cache, then the data is copied on all
machines. Is is possible to cache data selectively on machines?

Thanks,
Taran



Re: reduce more than one way

2008-11-07 Thread lohit
There is a mapper called IdentityMapper (look at IdentityMapper.java), which 
basically passes its input straight through to the output without doing anything.
You could run your mapper with no reducers and store the intermediate output, 
and then run your two Hadoop programs with the identity mapper and the two different 
reducers.
Thanks,
Lohit
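
A rough sketch of that chaining with the old JobConf API. MyMapper, ReduceA, ReduceB
and the paths are placeholders, and the Text/LongWritable types are assumptions;
SequenceFile formats are used for the intermediate data so the map's key/value types
survive the hand-off between jobs:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class ChainedJobs {
  public static void main(String[] args) throws Exception {
    // Job 1: run the shared map only; with zero reduces the map output goes straight to HDFS
    JobConf mapOnly = new JobConf(ChainedJobs.class);
    mapOnly.setMapperClass(MyMapper.class);            // MyMapper: the shared map (placeholder)
    mapOnly.setNumReduceTasks(0);
    mapOnly.setOutputKeyClass(Text.class);             // adjust to MyMapper's output types
    mapOnly.setOutputValueClass(LongWritable.class);
    mapOnly.setOutputFormat(SequenceFileOutputFormat.class);
    FileInputFormat.setInputPaths(mapOnly, new Path("/input"));
    FileOutputFormat.setOutputPath(mapOnly, new Path("/intermediate"));
    JobClient.runJob(mapOnly);

    // Job 2a: identity map over the intermediate data, first reducer
    JobConf jobA = new JobConf(ChainedJobs.class);
    jobA.setInputFormat(SequenceFileInputFormat.class);
    jobA.setMapperClass(IdentityMapper.class);
    jobA.setReducerClass(ReduceA.class);               // ReduceA: first reduce (placeholder)
    jobA.setOutputKeyClass(Text.class);
    jobA.setOutputValueClass(LongWritable.class);
    FileInputFormat.setInputPaths(jobA, new Path("/intermediate"));
    FileOutputFormat.setOutputPath(jobA, new Path("/outputA"));
    JobClient.runJob(jobA);

    // Job 2b: same intermediate input, second reducer
    JobConf jobB = new JobConf(ChainedJobs.class);
    jobB.setInputFormat(SequenceFileInputFormat.class);
    jobB.setMapperClass(IdentityMapper.class);
    jobB.setReducerClass(ReduceB.class);               // ReduceB: second reduce (placeholder)
    jobB.setOutputKeyClass(Text.class);
    jobB.setOutputValueClass(LongWritable.class);
    FileInputFormat.setInputPaths(jobB, new Path("/intermediate"));
    FileOutputFormat.setOutputPath(jobB, new Path("/outputB"));
    JobClient.runJob(jobB);
  }
}

The map thus runs once, and only the two (cheaper) identity-map-plus-reduce passes
re-read the intermediate data.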



- Original Message 
From: Elia Mazzawi <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, November 7, 2008 12:35:44 PM
Subject: reduce more than one way

Hello,

I'm writing hadoop programs in Java,
I have 2 hadooop map/reduce programs that have the same map, but a different 
reduce methods.

can i run them in a way so that the map only happens once?

maybe store the intermediate result or something?



Re: Urgent: Need -importCheckpoint equivalent for 0.15.3

2008-10-13 Thread lohit
It's good to take a backup of the existing data storage (namenode & secondary 
namenode) first.
Konstantine has explained the steps in this JIRA:
https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173

HTH,
Lohit



- Original Message 
From: Stu Hood <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Monday, October 13, 2008 12:11:29 AM
Subject: Urgent: Need -importCheckpoint equivalent for 0.15.3

We're running Hadoop 0.15.3 (yes, time for an upgrade), and for the first time, 
we need to have our secondary namenode take over for our failed namenode.

There may not be an equivalent command to `namenode -importCheckpoint` in 
Hadoop 0.15.3, but I need to know the proper way to restore from a checkpoint, 
in case our namenode cannot be recovered.

Thanks,

Stu Hood
Architecture Software Developer
Mailtrust, a Division of Rackspace


Re: counter for number of mapper records

2008-09-24 Thread lohit
Yes, take a look at
src/mapred/org/apache/hadoop/mapred/Task_Counter.properties

Those are all the counters available for a task.
-Lohit
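
The built-in per-task counters (e.g. MAP_INPUT_RECORDS, REDUCE_INPUT_RECORDS from
that properties file) are shown per task in the web UI. If you want your own count
as well, a rough sketch with a custom counter (old mapred API; the enum name and the
log line are just for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CountingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, LongWritable, Text> {

  // Custom counters appear per task attempt and aggregated on the jobtracker web UI
  public static enum MyCounters { MAPPER_RECORDS }

  private long records = 0;

  public void map(LongWritable key, Text value,
                  OutputCollector<LongWritable, Text> out, Reporter reporter)
      throws IOException {
    records++;
    reporter.incrCounter(MyCounters.MAPPER_RECORDS, 1);
    out.collect(key, value);                 // real map logic goes here
  }

  public void close() throws IOException {
    // lands in this task's log, viewable per task in the web UI
    System.err.println("records seen by this map task: " + records);
  }
}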



- Original Message 
From: Sandy <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, September 24, 2008 5:09:39 PM
Subject: counter for number of mapper records

If I understand correctly, each mapper is sent a set number of records. Is
there a counter or variable that tells you how many records is sent to a
particular mapper?
Likewise, is there a similar thing for reducers?

Thanks in advance.

-SM



Re: NotYetReplicated exceptions when pushing large files into HDFS

2008-09-23 Thread Lohit
As of now, no. It's fixed at 3 retries. The idea is that if your dfs put fails 
even after three retries, then there is something wrong that needs to be looked at.

Lohit

On Sep 23, 2008, at 4:24 AM, "Ryan LeCompte" <[EMAIL PROTECTED]> wrote:

Thanks. Is there a way to increase the retry amount?

Ryan

On Mon, Sep 22, 2008 at 8:21 PM, lohit <[EMAIL PROTECTED]> wrote:
Yes, these are warnings unless they fail 3 times, in which case your dfs 
-put command would fail with a stack trace.
Thanks,
Lohit



- Original Message 
From: Ryan LeCompte <[EMAIL PROTECTED]>
To: "core-user@hadoop.apache.org" 
Sent: Monday, September 22, 2008 5:18:01 PM
Subject: Re: NotYetReplicated exceptions when pushing large files into HDFS

I've noticed that although I get a few of these exceptions, the file
is ultimately uploaded to the HDFS cluster. Does this mean that my
file ended up getting there in 1 piece? The exceptions are just logged
at the WARN level and indicate retry attempts.

Thanks,
Ryan


On Mon, Sep 22, 2008 at 11:08 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
Hello all,

I'd love to be able to upload into HDFS very large files (e.g., 8 or
10GB), but it seems like my only option is to chop up the file into
smaller pieces. Otherwise, after a while I get NotYetReplication
exceptions while the transfer is in progress. I'm using 0.18.1. Is
there any way I can do this? Perhaps use something else besides
bin/hadoop -put input output?

Thanks,
Ryan






Re: NotYetReplicated exceptions when pushing large files into HDFS

2008-09-22 Thread lohit
Yes, these are warnings unless they fail 3 times, in which case your dfs 
-put command would fail with a stack trace.
Thanks,
Lohit



- Original Message 
From: Ryan LeCompte <[EMAIL PROTECTED]>
To: "core-user@hadoop.apache.org" 
Sent: Monday, September 22, 2008 5:18:01 PM
Subject: Re: NotYetReplicated exceptions when pushing large files into HDFS

I've noticed that although I get a few of these exceptions, the file
is ultimately uploaded to the HDFS cluster. Does this mean that my
file ended up getting there in 1 piece? The exceptions are just logged
at the WARN level and indicate retry attempts.

Thanks,
Ryan


On Mon, Sep 22, 2008 at 11:08 AM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> I'd love to be able to upload into HDFS very large files (e.g., 8 or
> 10GB), but it seems like my only option is to chop up the file into
> smaller pieces. Otherwise, after a while I get NotYetReplication
> exceptions while the transfer is in progress. I'm using 0.18.1. Is
> there any way I can do this? Perhaps use something else besides
> bin/hadoop -put input output?
>
> Thanks,
> Ryan
>



Re: Tips on sorting using Hadoop

2008-09-20 Thread lohit
Since this is sorting, would running map/reduce twice really help? The number of 
output bytes should be the same as the number of input bytes.
To get a total order, you have to make your partition function split the 
key space into ordered, roughly equal ranges across the reducers. 
For an example of how this is done, look at TeraSort: 
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/terasort/TeraSort.java

Thanks,
Lohit
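
A minimal sketch of such a range partitioner (the assumption that keys start with a
lowercase ASCII letter is just for illustration; TeraSort derives its cut points by
sampling the input rather than hard-coding a key layout):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class AlphabetRangePartitioner implements Partitioner<Text, Text> {

  public void configure(JobConf conf) {
    // no configuration needed for this sketch
  }

  // Each reducer receives a contiguous slice of the key space, so sorting happens
  // within each reduce and concatenating part-00000..part-NNNNN gives a total order.
  public int getPartition(Text key, Text value, int numPartitions) {
    String s = key.toString();
    if (s.length() == 0) {
      return 0;
    }
    char c = s.charAt(0);
    if (c < 'a') {
      return 0;
    }
    if (c > 'z') {
      return numPartitions - 1;
    }
    return (c - 'a') * numPartitions / 26;
  }
}

In the job setup you would then call conf.setPartitionerClass(AlphabetRangePartitioner.class)
and use as many reducers as you want output ranges.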



- Original Message 
From: Edward J. Yoon <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Saturday, September 20, 2008 10:53:40 AM
Subject: Re: Tips on sorting using Hadoop

I would recommend that run map/reduce twice.

/Edward

On Sat, Sep 13, 2008 at 5:58 AM, Tenaali Ram <[EMAIL PROTECTED]> wrote:
> Hi,
> I want to sort my records ( consisting of string, int, float) using Hadoop.
>
> One way I have found is to set number of reducers = 1, but this would mean
> all the records go to 1 reducer and it won't be optimized. Can anyone point
> me to some better way to do sorting using Hadoop ?
>
> Thanks,
> Tenaali
>



-- 
Best regards, Edward J. Yoon
[EMAIL PROTECTED]
http://blog.udanax.org



Re: Lots of files in a single hdfs directory?

2008-09-17 Thread Lohit
The last time I tried to load an image with lots of files in the same directory, it 
was roughly ten times slower. This has to do with the data structures involved. My 
numbers were in the millions, though. Try to use a directory structure.

Lohit

On Sep 17, 2008, at 11:57 AM, Nathan Marz <[EMAIL PROTECTED]> wrote:

Hello all,

Is it bad to have a lot of files in a single HDFS directory (aka, on the order 
of hundreds of thousands)? Or should we split our files into a directory 
structure of some sort?

Thanks,
Nathan Marz



Re: scp to namenode faster than dfs put?

2008-09-17 Thread Lohit
Also dfs put copies multiple replicas unlike scp.

Lohit

On Sep 17, 2008, at 6:03 AM, "叶双明" <[EMAIL PROTECTED]> wrote:

Actually, no.
As you said, I understand that "dfs -put" breaks the data into blocks and
then copies them to the datanodes,
but scp does not break the data into blocks; it just copies the data to
the namenode!


2008/9/17, Prasad Pingali <[EMAIL PROTECTED]>:

Hello,
 I observe that scp of data to the namenode is faster than actually
putting
into dfs (all nodes coming from same switch and have same ethernet cards,
homogenous nodes)? I understand that "dfs -put" breaks the data into blocks
and then copies to datanodes, but shouldn't that be atleast as fast as
copying data to namenode from a single machine, if not faster?

thanks and regards,
Prasad Pingali,
IIIT Hyderabad.





-- 
Sorry for my english!!  明
Please help me to correct my english expression and error in syntax



Re: Hadoop Streaming and Multiline Input

2008-09-09 Thread lohit
If your webpage is XML-tagged and you are looking into using streaming, 
this might help: 
http://hadoop.apache.org/core/docs/r0.18.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
-Lohit
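
For reference, a rough streaming invocation using the XML record reader described at
that link (the jar path, tag name, and mapper/reducer scripts are placeholders for
your own setup):

  bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
      -inputreader "StreamXmlRecordReader,begin=<page>,end=</page>" \
      -input /pages -output /pages-out \
      -mapper my_mapper.py -reducer my_reducer.py

Each map input record is then everything between one <page> tag and the matching
</page>, so your script sees a whole document at a time.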



- Original Message 
From: Jim Twensky <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, September 9, 2008 11:23:37 AM
Subject: Re: Hadoop Streaming and Multiline Input

If I understand your question correctly, you need to write your own
FileInputFormat. Please see
http://hadoop.apache.org/core/docs/r0.18.0/api/index.html for details.

Regards,
Tim

On Sat, Sep 6, 2008 at 9:20 PM, Dennis Kubes <[EMAIL PROTECTED]> wrote:

> Is is possible to set a multiline text input in streaming to be used as a
> single record?  For example say I wanted to scan a webpage for a specific
> regex that is multiline, is this possible in streaming?
>
> Dennis
>



Re: can i run multiple datanode in one pc?

2008-09-04 Thread lohit
A good way is to have different 'conf' dirs.
So you would end up with directories conf1 and conf2,
and the datanode startup commands would be:
./bin/hadoop-daemons.sh --config conf1 start datanode
./bin/hadoop-daemons.sh --config conf2 start datanode


make sure you have different hadoop-site.xml in conf1 and conf2 dirs.

-Lohit
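
In conf2 the second datanode mainly needs its own storage directory and listening
ports so the two instances do not collide; a hedged example of the properties that
typically have to differ in conf2/hadoop-site.xml (property names as of the
0.17/0.18 era, values are only examples):

  dfs.data.dir               /data2/dfs/data
  dfs.datanode.address       0.0.0.0:50011
  dfs.datanode.http.address  0.0.0.0:50076

plus any other datanode port properties your version defines (for example a datanode
IPC address, if present).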

- Original Message 
From: 叶双明 <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, September 4, 2008 12:01:48 AM
Subject: Re: can i run multiple datanode in one pc?

Thanks lohit.

I tried to start the datanode with the command: bin/hadoop datanode -conf
conf/hadoop-site.xml, but it doesn't work; the plain command bin/hadoop datanode
does work.

Have I done something wrong?



bin/hadoop datanode -conf conf/hadoop-site.xml


2008/9/4 lohit <[EMAIL PROTECTED]>

> Yes, each datanode should point to different config.
> So, if you have conf/hadoop-site.xml make another conf2/hadoop-site.xml
> with ports for datanode specific stuff and you should be able to start
> multiple datanodes on same node.
> -Lohit
>
>
>
> - Original Message 
> From: 叶双明 <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Wednesday, September 3, 2008 8:19:59 PM
> Subject: can i run multiple datanode in one pc?
>
> I know that normally one datanode runs on one computer.
> I'm wondering, can I run multiple datanodes on one PC?
>
>



Re: Compare data on HDFS side

2008-09-04 Thread Lohit Vijayarenu


One way is to write a small program which does the diff at the block level: open both 
files, read data at the same offsets, and compare. This will tell you the diffs at 
your offset boundaries and is useful for checking whether two files differ. There is 
also an open JIRA for getting the checksum of files, which would make this task trivial.
Lohit
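
A rough sketch of that comparison, done entirely through the HDFS client API so
nothing is copied to the local box (the class name and buffer size are arbitrary):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDiff {
  // Returns true if the two HDFS files have identical contents.
  public static boolean sameContents(FileSystem fs, Path a, Path b) throws IOException {
    if (fs.getFileStatus(a).getLen() != fs.getFileStatus(b).getLen()) {
      return false;                          // different lengths, no need to read
    }
    byte[] bufA = new byte[64 * 1024];
    byte[] bufB = new byte[64 * 1024];
    FSDataInputStream inA = fs.open(a);
    FSDataInputStream inB = fs.open(b);
    try {
      while (true) {
        int nA = inA.read(bufA);             // may return fewer bytes than requested
        if (nA == -1) {
          return inB.read(bufB) == -1;       // both streams must hit EOF together
        }
        int off = 0;
        while (off < nA) {                   // read exactly nA bytes from b to stay aligned
          int nB = inB.read(bufB, off, nA - off);
          if (nB == -1) {
            return false;
          }
          off += nB;
        }
        for (int i = 0; i < nA; i++) {
          if (bufA[i] != bufB[i]) {
            return false;
          }
        }
      }
    } finally {
      inA.close();
      inB.close();
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    System.out.println(sameContents(fs, new Path(args[0]), new Path(args[1])));
  }
}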

On Sep 4, 2008, at 6:51 AM, "Andrey Pankov" <[EMAIL PROTECTED]> wrote:

Hello,

Does anyone know is it possible to compare data on HDFS but avoid
coping data to local box? I mean if I'd like to find difference
between local text files I can use diff command. If files are at HDFS
then I have to get them from HDFS to local box and only then do diff.
Coping files to local fs is a bit annoying and could be problematical
when files are huge, say 2-5 Gb.

Thanks in advance.

-- 
Andrey Pankov



Re: can i run multiple datanode in one pc?

2008-09-03 Thread lohit
Yes, each datanode should point to a different config.
So, if you have conf/hadoop-site.xml, make another conf2/hadoop-site.xml with 
different ports for the datanode-specific settings and you should be able to start 
multiple datanodes on the same node.
-Lohit



- Original Message 
From: 叶双明 <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, September 3, 2008 8:19:59 PM
Subject: can i run multiple datanode in one pc?

I know that normally one datanode runs on one computer.
I'm wondering, can I run multiple datanodes on one PC?



Re: parallel hadoop process reading same input file

2008-08-28 Thread lohit
Hi Deepak,
Can you explain what process and what files they are trying to read? If you are 
talking about map/reduce tasks reading files on DFS, then, yes parallel reads 
are allowed. Multiple writers are not. 
-Lohit



- Original Message 
From: Deepak Diwakar <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, August 28, 2008 6:06:58 AM
Subject: parallel hadoop process reading same input file

Hi,

When I run two Hadoop processes in parallel and both processes have to
read the same file, it fails.
Of course, one solution is to keep a copy of the file in a different location so
that accessing it simultaneously would not cause any problem. But what if we
don't want to do that because it costs extra space?
Please suggest a suitable solution to this.

Thanks & Regards,
Deepak



Re: Load balancing in HDFS

2008-08-27 Thread lohit
If you have a fixed set of nodes in the cluster and load data onto HDFS, it tries 
to automatically balance the distribution across nodes by selecting random 
nodes to store the replicas. For the distribution to be random, the load has to be 
done from a client that is outside the datanodes. If you add new nodes to your 
cluster or would like to rebalance your cluster, you can use the rebalancer utility:
http://hadoop.apache.org/core/docs/current/hdfs_user_guide.html#Rebalancer

-Lohit
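
For reference, the rebalancer described at that link is started and stopped roughly
like this (the threshold, a percentage, is optional):

  bin/start-balancer.sh -threshold 10   # move blocks until node usage is within 10% of the cluster average
  bin/stop-balancer.sh                  # safe to stop at any time; blocks already moved stay where they are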


- Original Message 
From: Mork0075 <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, August 27, 2008 12:54:48 AM
Subject: Load balancing in HDFS

Hello,

I'm planning to use HDFS as a DFS in a web application environment.
There are two requirements: fault tolerance, which is ensured by the
replicas, and load balancing. Is load balancing part of HDFS, and how is
it configured?

Thanks a lot



Re: Un-Blacklist Node

2008-08-14 Thread lohit
One way I could think of is to just restart mapred daemons.
./bin/stop-mapred.sh
./bin/start-mapred.sh


Thanks,
Lohit

- Original Message 
From: Xavier Stevens <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, August 14, 2008 12:18:18 PM
Subject: Un-Blacklist Node

Is there a way to un-blacklist a node without restarting hadoop?

Thanks,

-Xavier

Re: how to config secondary namenode

2008-08-14 Thread lohit
You can use the same config you use for the namenode (bcn151). 
In addition, you might want to change these:
fs.checkpoint.dir
fs.checkpoint.period (default is one hour)
dfs.secondary.http.address (if you do not want the default)

Thanks,
Lohit
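
A hedged example of what those overrides might look like in bcn152's hadoop-site.xml
(paths and period are illustrations, not recommendations):

  <property>
    <name>fs.checkpoint.dir</name>
    <value>/data/dfs/namesecondary</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:50090</value>
  </property>

Then either list bcn152 in conf/masters and use bin/start-dfs.sh, or start it by
hand on bcn152 with bin/hadoop-daemon.sh start secondarynamenode.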



- Original Message 
From: 志远 <[EMAIL PROTECTED]>
To: core-user 
Sent: Thursday, August 14, 2008 3:02:15 AM
Subject: how to config secondary namenode

  
How to config secondary namenode with
another machine
 
Namenode: bcn151
Secondary namenode: bcn152
Datanodes: hdp1 hdp2
 
Thanks!

Re: Dynamically adding datanodes

2008-08-14 Thread lohit
If the config is right, then that is indeed the procedure to add a new datanode. 
Do you see any exceptions logged in your datanode log?
Run it as a daemon so it logs everything into a file under HADOOP_LOG_DIR:
./bin/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode 

Thanks,
Lohit

- Original Message 
From: Kai Mosebach <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, August 14, 2008 1:48:02 AM
Subject: Dynamically adding datanodes

Hi,

how can i add a datanode dynamically to a hadoop cluster without restarting the 
whole cluster?
I was trying to run "hadoop datanode" on the new datanode with the appropriate 
config (pointing to my correct namenode) but it does not show up there.

Is there a way?

Thanks Kai



Re: Random block placement

2008-08-12 Thread lohit
Hi John,

This file should be a good starting point for you. 
src/hdfs/org/apache/hadoop/hdfs/server/namenode/ReplicationtargetChooser.java


There has been discussions about a pluggable block place policy 
https://issues.apache.org/jira/browse/HADOOP-3799

Thanks,
Lohit

- Original Message 
From: John DeTreville <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, August 12, 2008 3:31:49 PM
Subject: Random block placement

My understanding is that HDFS places blocks randomly. As I would expect, then, when I
use "hadoop fsck" to look at block placements for my files, I see that some nodes have
more blocks than the average. I would expect that these hot spots would cause a
performance hit relative to a more even placement of blocks.

I'd like to experiment with non-random block placement to see if this can provide a
performance improvement. Where in the code would I start looking to find the existing
code for random placement?

Cheers,
John



Re: Difference between Hadoop Streaming and "Normal" mode

2008-08-12 Thread lohit
To add to this:
if you use streaming you are operating on text fields. If you have sequence files, 
you have to supply your own input format to convert them and then deal with the 
format in your scripts, while with a Java implementation it is trivial. There is a 
performance hit if you use streaming, but other than that you should be able to do 
most things. A lot of applications use streaming. 


-Lohit

- Original Message 
From: John DeTreville <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, August 12, 2008 3:33:57 PM
Subject: RE: Difference between Hadoop Streaming and "Normal" mode

I think you will find that the Streaming model buys you convenience,
but costs you performance and generality. I'll let others quantify
how much of each.

Cheers,
John

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Gaurav Veda
Sent: Tuesday, August 12, 2008 3:10 PM
To: core-user@hadoop.apache.org
Subject: Difference between Hadoop Streaming and "Normal" mode

Hi All,

This might seem too silly, but I couldn't find a satisfactory answer
to this yet. What are the advantages / disadvantages of using Hadoop
Streaming over the normal mode (wherein you write your own mapper and
reducer in Java)? From what I gather, the real advantage of Hadoop
Streaming is that you can use any executable (in c / perl / python
etc) as a mapper / reducer.
A slight disadvantage is that the default is to read (write) from the
standard input (output) ... though one can specify their own Input and
Output format (and package it with the default hadoop streaming jar
file).

My point is, why should I ever use the normal mode? Streaming seems
just as good. Is there a performance problem or do I have only limited
control over my job if I use the streaming mode or some other issue?

Thanks!
Gaurav
-- 
Share what you know, learn what you don't !



Re: NameNode hardware specs

2008-08-12 Thread lohit
Hi Manish,

>- why 15+ GBs?  Do we allocate all memory to the NameNode? or  
>just allocate some number using -Xmx and leave the rest available so  
>the machine doesnt start swapping?


We allocated memory using -Xmx. NameNode stores the HDFS namespace in memory, 
so, the bigger your namespace, the bigger would be your heap. My guess is that 
if you have more than 15 million files  with 20 million blocks you might need 
such a big system. But again, its best to see how your namenode is performing 
and how much memory it is consuming. 

>  - why RAID5?
> - If running RAID 5, why is this necessary?
Not absolutely necessary. The namenode index, or metadata, is a critical piece of 
data; you cannot afford to lose or corrupt it. That is the reason there is an 
option of specifying multiple directories that hold identical copies in parallel. 
You can configure the directories to be whatever you like: multiple drives, NFS, 
and so on.
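
As a concrete illustration of that option (the paths are examples; one local volume
plus one NFS mount):

  <property>
    <name>dfs.name.dir</name>
    <value>/data1/dfs/name,/mnt/nfs/dfs/name</value>
  </property>

The namenode writes the image and edit log to every directory in the comma-separated
list, so losing any single disk does not lose the metadata.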

>- Configure the name node to store one set of transaction logs on a  
>separate disk from the index.
> why?
This feature is not yet supported, but it would be a good one to have. Right now both 
the transaction logs and the index (I am assuming this means the image) live in the 
same directory and cannot be configured to be placed in separate directories. We 
should correct the wiki.

> - Configure the name node to store another set of transaction logs to  
> a network mounted disk.
>  - why?
As explained above, this is to have multiple copies of your metadata 
(dfs.name.dir in particular)

>- Do not host DataNode, JobTracker or TaskTracker services on the  
>same system.
Typically the DataNode and TaskTracker are run on all nodes, while the JobTracker is 
run on a dedicated node, like the NameNode (and SecondaryNameNode).
Sometimes a TaskTracker might crash and bring down a node, and you do not want 
your JobTracker or NameNode to be on that system.

PS: Could you point to the wiki you are referring to? We might need to make 
some corrections.

Thanks,
Lohit

- Original Message 
From: Manish Shah <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, August 12, 2008 11:24:45 AM
Subject: NameNode hardware specs

Can someone help explain in a little more detail some of the reasons  
for the hardware specs that were recently added to the wiki for the  
NameNode.  I guess i'm interested in learning how others have settled  
on these specs?  Is it by observed behavior, or just recommended by  
other hadoop users?

- Use a good server with lots (15GB+) of RAM.
  - why 15+ GBs?  Do we allocate all memory to the NameNode? or  
just allocate some number using -Xmx and leave the rest available so  
the machine doesnt start swapping?

- Consider using fast RAID5 storage for keeping the index.
  - why RAID5?

- List more than one name node directory in the configuration, so  
that multiple copies of the indices will be stored. As long as the  
directories are on separate disks, a single full disk will not  
corrupt the index.
  - If running RAID 5, why is this necessary?

- Configure the name node to store one set of transaction logs on a  
separate disk from the index.
  - why?

- Configure the name node to store another set of transaction logs to  
a network mounted disk.
  - why?

- Do not host DataNode, JobTracker or TaskTracker services on the  
same system.
  - how much memory would the job tracker need?  Does it use a  
lot of CPU? In general, what are good specs for a job tracker machine  
and can the machine be shared with other services?

Thanks so much for the help.  I think it would be hugely helpful for  
the community to start describing their respective setups for hadoop  
clusters in more detail than just the config for datanodes and  
cluster size.  I think we all want to be confident that we are  
spending money on the right machines to grow our cluster the right way.


Most appreciated,

- Manish
Co-Founder Rapleaf.com

We're looking for a product manager, sys admin, and software  
engineers...$10K referral award


Re: java.io.IOException: Could not get block locations. Aborting...

2008-08-12 Thread lohit
>The "Could not get block locations" exception was gone after a Hadoop
>restart, but further down the road our job failed again. I checked the
>logs for "discarding calls" and found a bunch of them, plus the namenode
>appeared to have a load spike at that time, so it seems it is getting
>overloaded. Do you know how can we prevent this? Currently the namenode
>machine is not running anything but the namenode and the secondary
>namenode, and the cluster only has 16 machines.

Typically the secondary namenode should be running on a different machine. It 
requires the same amount of resources as the namenode.
So, if you have, say, an 8G RAM node and your namenode is taking 2-3G of memory, 
your secondary namenode would also take up about that much.
To cross-check, see whether the load spike on the namenode happened while the 
secondary namenode was checkpointing (by looking at the secondary namenode logs).

Thanks,
Lohit



- Original Message 
From: Piotr Kozikowski <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Monday, August 11, 2008 12:20:05 PM
Subject: Re: java.io.IOException: Could not get block locations. Aborting...

Hi again,

The "Could not get block locations" exception was gone after a Hadoop
restart, but further down the road our job failed again. I checked the
logs for "discarding calls" and found a bunch of them, plus the namenode
appeared to have a load spike at that time, so it seems it is getting
overloaded. Do you know how can we prevent this? Currently the namenode
machine is not running anything but the namenode and the secondary
namenode, and the cluster only has 16 machines.

Thank you

Piotr

On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote:
> It is possible that your namenode is overloaded and is not able to
> respond to RPC requests from clients. Please check the namenode logs
> to see if you see lines of the form "discarding calls...".
> 
> dhrua
> 
> On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
> <[EMAIL PROTECTED]> wrote:
> > I come across the same issue and also with hadoop 0.17.1
> >
> > would be interesting if someone say the cause of the issue.
> >
> > Alex
> >
> > 2008/8/8 Steve Loughran <[EMAIL PROTECTED]>
> >
> >> Piotr Kozikowski wrote:
> >>
> >>> Hi there:
> >>>
> >>> We would like to know what are the most likely causes of this sort of
> >>> error:
> >>>
> >>> Exception closing
> >>> file
> >>> /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_22_0/part-00022
> >>> java.io.IOException: Could not get block locations. Aborting...
> >>>at
> >>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2080)
> >>>at
> >>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1300(DFSClient.java:1702)
> >>>at
> >>> org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1818)
> >>>
> >>> Our map-reduce job does not fail completely but over 50% of the map tasks
> >>> fail with this same error.
> >>> We recently migrated our cluster from 0.16.4 to 0.17.1, previously we
> >>> didn't have this problem using the same input data in a similar map-reduce
> >>> job
> >>>
> >>> Thank you,
> >>>
> >>> Piotr
> >>>
> >>>
> >> When I see this, its because the filesystem isnt completely up: there are
> >> no locations for a specific file, meaning the client isn't getting back the
> >> names of any datanodes holding the data from the name nodes.
> >>
> >> I've got a patch in JIRA that prints out the name of the file in question,
> >> as that could be useful.
> >>
> >
> >
> >
> > --
> > Best Regards
> > Alexander Aristov
> >


Re: Stopping two reducer tasks on two machines from working on the same keys?

2008-08-11 Thread lohit
>redoing each other's work and stomping on each others output files.
I am assuming your tasks (reducers) are generating these files themselves, and that 
they are not the normal job output files (part-*).

It looks like you have speculative execution turned on.
Hadoop launches parallel attempts of map/reduce tasks if it finds that one of them 
is falling behind. Those task attempts are suffixed with an attempt number, as you 
can see with _0 and _1.
If your tasks write to common file names, you hit this problem. 
There are two ways out of this (see the sketch after this list):
1. turn off speculative execution by setting mapred.speculative.execution to 
false
2. if you are generating side files, use the task attempt ID to make each attempt's 
file names unique. 
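
A hedged sketch of both options with the old mapred API (the property names match
the 0.16-era configuration; the /nas/output path is a placeholder):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class SideFileNaming extends MapReduceBase {

  // Option 1, at job submission time: turn speculative execution off for this job.
  public static void disableSpeculation(JobConf conf) {
    conf.setBoolean("mapred.speculative.execution", false);
  }

  private Path outPath;

  // Option 2, inside the task: name side files after the attempt ID so the _0 and _1
  // attempts of the same task can never stomp on each other's files.
  public void configure(JobConf job) {
    String attempt = job.get("mapred.task.id");   // e.g. task_200808062237_0031_r_000000_0
    outPath = new Path("/nas/output/part-" + attempt);
  }
}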

>I've attached the JSP output that indicates this; let me know if you
>need any other details.
No attachment came through.

Thanks,
Lohit



- Original Message 
From: Anthony Urso <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Monday, August 11, 2008 7:03:45 PM
Subject: Stopping two reducer tasks on two machines from working on the same 
keys?

I have a Hadoop 0.16.4 cluster that effectively has no HDFS.  It's
running a job analyzing data stored on a NAS type system mounted on
each tasktracker.

Unfortunately, the reducers task_200808062237_0031_r_00_0 and
task_200808062237_0031_r_00_1 are running simultaneously on the
same keys, redoing each other's work and stomping on each others
output files.

I've attached the JSP output that indicates this; let me know if you
need any other details.

Is this a configuration error, or is it a bug in Hadoop?

Cheers,
Anthony



Re: How to enable compression of blockfiles?

2008-08-08 Thread lohit
I think at present only SequenceFiles can be compressed. 
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html
If you have plain text files, they are stored as-is in blocks. You can store them 
as .gz files; Hadoop recognizes and processes gzipped files, but they are not 
splittable, meaning each map will consume a whole .gz file.
Thanks,
Lohit
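
If converting to SequenceFiles is an option, a minimal sketch of writing one with
block compression (the key/value types and the output path are illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqFileWriterExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/data/docs.seq");                 // example path
    // BLOCK compression compresses batches of records together, which usually
    // recovers the most space for text documents.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK);
    try {
      writer.append(new LongWritable(1), new Text("document body ..."));
    } finally {
      writer.close();
    }
  }
}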



- Original Message 
From: Michael K. Tung <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 1:09:01 PM
Subject: How to enable compression of blockfiles?

Hello, I have a simple question.  How do I configure DFS to store compressed
block files?  I've noticed by looking at the "blk_" files that the text
documents I am storing are uncompressed.  Currently our hadoop deployment is
taking up 10x the diskspace as compared to our system before moving to
hadoop. I've tried modifying the io.seqfile.compress.blocksize option
without success and haven't been able to find anything online regarding
this.  Is there any way to do this or do I need to manually compress my data
before storing to HDFS?

Thanks,

Michael Tung


Re: namenode & jobtracker: joint or separate, which is better?

2008-08-08 Thread lohit
It depends on your machine configuration, how much resource it has, and what you 
can afford to lose in case of failures. 
It would be good to run the NameNode and JobTracker on their own dedicated nodes and 
the datanodes and tasktrackers on the rest of the nodes. We have seen cases where 
tasktrackers take down nodes because of misbehaving programs; in such cases you do 
not want your JobTracker or NameNode to be on those nodes. 
Also, running multiple JVMs might slow down the node and your processes. I would 
recommend you run at least the NameNode on a dedicated node.
Thanks,
Lohit



- Original Message 
From: James Graham (Greywolf) <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 1:29:08 PM
Subject: namenode & jobtracker: joint or separate, which is better?

Which is better, to have the namenode and jobtracker as distinct nodes
or as a single node, and are there pros/cons regarding using either or
both as datanodes?
-- 
James Graham (Greywolf)  |
650.930.1138|925.768.4053  *
[EMAIL PROTECTED]  |
Check out what people are saying about SearchMe! -- click below
http://www.searchme.com/stack/109aa



Re: what is the correct usage of hdfs metrics

2008-08-08 Thread lohit
I have tried connecting to it via jconsole.
Apart from that, I have seen people on this list use Ganglia to collect metrics, 
or just dump them to a file. 
To start off, you could easily use FileContext (dumping metrics to a file). Check 
out the metrics config file (hadoop-metrics.properties) under the conf directory, 
and specify a file name and period to monitor the metrics.
Thanks,
Lohit
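
A hedged example of the FileContext lines in conf/hadoop-metrics.properties (the
file names are placeholders; the period is in seconds):

  dfs.class=org.apache.hadoop.metrics.file.FileContext
  dfs.period=10
  dfs.fileName=/tmp/dfsmetrics.log

  jvm.class=org.apache.hadoop.metrics.file.FileContext
  jvm.period=10
  jvm.fileName=/tmp/jvmmetrics.log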



- Original Message 
From: Ivan Georgiev <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 4:39:36 AM
Subject: what is the correct usage of hdfs metrics

Hi,

I have been unable to find any examples on how to use the MBeans 
provided from HDFS.
Could anyone that has any experience on the topic share some info.
What is the URL to use to connect to the MBeanServer ?
Is it done through rmi, or only through jvm ?

Any help is highly appreciated.

Please cc me as i am not a member of the list.

Regards:
Ivan



Re: DFS. How to read from a specific datanode

2008-08-06 Thread lohit
>I need this because I do not want to trust namenode's ordering. For
>applications where network congestion is rare, we should let the
>client to decide which data node to load from.

If this is the case, then providing a method to re-order the datanode list 
shouldn't be hard. Maybe open a JIRA 
(https://issues.apache.org/jira/secure/CreateIssue!default.jspa) as an improvement 
request and continue the discussion there?

-Lohit


- Original Message 
From: Kevin <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, August 6, 2008 10:37:44 AM
Subject: Re: DFS. How to read from a specific datanode

Thank you for the suggestion. I looked at DFSClient. It appears that
chooseDataNode method decides which data node to connect to. Currently
it chooses the first non-dead data node returned by namenode, which
have sorted the nodes by proximity to the client. However,
chooseDataNode is private, so overriding it seems infeasible. Neither
are the callers of chooseDataNode public or protected.

I need this because I do not want to trust namenode's ordering. For
applications where network congestion is rare, we should let the
client to decide which data node to load from.

-Kevin



On Tue, Aug 5, 2008 at 7:57 PM, lohit <[EMAIL PROTECTED]> wrote:
>  I havent tried it, but see if you can create DFSClient object and use its 
> open() and read() calls to get the job done. Basically you would have to 
> force currentNode to be your node of interest in there.
> Just curious, what is the use case for your request?
>
> Thanks,
> Lohit
>
>
>
> - Original Message 
> From: Kevin <[EMAIL PROTECTED]>
> To: "core-user@hadoop.apache.org" 
> Sent: Tuesday, August 5, 2008 6:59:55 PM
> Subject: DFS. How to read from a specific datanode
>
> Hi,
>
> This is about dfs only, not to consider mapreduce. It may sound like a
> strange need, but sometimes I want to read a block from a specific
> data node which holds a replica. Figuring out which datanodes have the
> block is easy. But is there an easy way to specify which datanode I
> want to load from?
>
> Best,
> -Kevin
>
>



Re: DFS. How to read from a specific datanode

2008-08-05 Thread lohit
I haven't tried it, but see if you can create a DFSClient object and use its 
open() and read() calls to get the job done. Basically you would have to force 
currentNode to be your node of interest in there.
Just curious: what is the use case for your request? 

Thanks,
Lohit



- Original Message 
From: Kevin <[EMAIL PROTECTED]>
To: "core-user@hadoop.apache.org" 
Sent: Tuesday, August 5, 2008 6:59:55 PM
Subject: DFS. How to read from a specific datanode

Hi,

This is about dfs only, not to consider mapreduce. It may sound like a
strange need, but sometimes I want to read a block from a specific
data node which holds a replica. Figuring out which datanodes have the
block is easy. But is there an easy way to specify which datanode I
want to load from?

Best,
-Kevin



Re: EOFException while starting name node

2008-08-04 Thread lohit
We have seen a similar exception reported by others on the list before. What you 
might want to try is to use a hex editor or equivalent to open up 'edits' and 
get rid of the last record. In most such cases the last record is incomplete, 
which is why your namenode is not starting. Once you have updated your edits, start 
the namenode and run 'hadoop fsck /' to see whether you have any corrupt files, and 
fix or get rid of them. 
PS: Take a backup of dfs.name.dir before updating and playing around with it.

Thanks,
Lohit



- Original Message 
From: steph <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Monday, August 4, 2008 8:31:07 AM
Subject: Re: EOFException while starting name node


2008-08-03 21:58:33,108 INFO org.apache.hadoop.ipc.Server: Stopping  
server on 9000
2008-08-03 21:58:33,109 ERROR org.apache.hadoop.dfs.NameNode:  
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:433)
at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:759)
at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:639)
at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java: 
222)
at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:235)
at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:131)
at org.apache.hadoop.dfs.NameNode.(NameNode.java:176)
at org.apache.hadoop.dfs.NameNode.(NameNode.java:162)
at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)


Actually my exception is slightly different from yours. Maybe moving the 
edits file aside
and recreating a new one will work for you.


On Aug 4, 2008, at 2:53 AM, Wanjari, Amol wrote:

> I'm getting the following exceptions while starting the name node -
>
> ERROR dfs.NameNode: java.io.EOFException
>at java.io.DataInputStream.readInt(DataInputStream.java:375)
>at
> org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:87)
>at
> org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:455)
>at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:733)
>at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:620)
>at
> org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
>at
> org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:76)
>at
> org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:221)
>at org.apache.hadoop.dfs.NameNode.init(NameNode.java:130)
>at org.apache.hadoop.dfs.NameNode.(NameNode.java:168)
>at
> org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:795)
>at org.apache.hadoop.dfs.NameNode.main(NameNode.java:804)
>
> Is there a way to recover the name node without losing any data.
>
> Thanks,
> Amol


Re: Multiple master nodes

2008-07-29 Thread lohit
It would be really helpful for many if you could create a twiki of this. Those 
ideas could be used while implementing HA.
Thanks,
Lohit



- Original Message 
From: paul <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, July 29, 2008 11:56:44 AM
Subject: Re: Multiple master nodes

I'm currently running with your option B setup and it seems to be reliable
for me (so far).  I use a combination of drbd and various hearbeat/LinuxHA
scripts that handle the failover process, including a virtual IP for the
namenode.  I haven't had any real-world unexpected failures to deal with,
yet, but all manual testing has had consistent and reliable results.



-paul


On Tue, Jul 29, 2008 at 1:54 PM, Ryan Shih <[EMAIL PROTECTED]> wrote:

> Dear Hadoop Community --
>
> I am wondering if it is already possible or in the plans to add capability
> for multiple master nodes. I'm in a situation where I have a master node
> that may potentially be in a less than ideal execution and networking
> environment. For this reason, it's possible that the master node could die
> at any time. On the other hand, the application must always be available. I
> have accessible to me other machines but I'm still unclear on the best
> method to add reliability.
>
> Here are a few options that I'm exploring:
> a) To create a completely secondary Hadoop cluster that we can flip to when
> we detect that the master node has died. This will double hardware costs,
> so
> if we originally have a 5 node cluster, then we would need to pull 5 more
> machines out of somewhere for this decision. This is not the preferable
> choice.
> b) Just mirror the master node via other always available software, such as
> DRBD for real time synchronization. Upon detection we could swap to the
> alternate node.
> c) Or if Hadoop had some functionality already in place, it would be
> fantastic to be able to take advantage of that. I don't know if anything
> like this is available but I could not find anything as of yet. It seems to
> me, however, that having multiple master nodes would be the direction
> Hadoop
> needs to go if it is to be useful in high availability applications. I was
> told there are some papers on Amazon's Elastic Computing that I'm about to
> look for that follow this approach.
>
> In any case, could someone with experience in solving this type of problem
> share how they approached this issue?
>
> Thanks!
>



Re: How to control the map and reduce step sequentially

2008-07-29 Thread lohit
The wiki and the documentation should help. Otherwise, please open a JIRA asking for better documentation; that will help everyone :)



- Original Message 
From: Daniel Yu <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, July 29, 2008 9:22:00 AM
Subject: Re: How to control the map and reduce step sequentially

I am currently studying abroad, and my graduation project happens to use Hadoop and HBase. Having a Chinese-language community would be a very nice thing;
I hope the related documentation can keep up in a timely manner.
2008/7/29 Xuebing Yan <[EMAIL PROTECTED]>

>
> The Alibaba search technology R&D center is already discussing a Chinese Hadoop community with the Hadoop PMC,
> and a Chinese translation of the Hadoop 0.17 documentation may be released soon.
>
> -闫雪冰
>
> On Tue, 2008-07-29 at 16:25 +0800, rae l wrote:
> > 2008/7/29 晋光峰 <[EMAIL PROTECTED]>:
> > > I got it. Thanks!
> > >
> > > 2008/7/28 Shengkai Zhu <[EMAIL PROTECTED]>
> > >
> > >> The real reduce logic is actually started when all map tasks are
> finished.
> > >>
> > >> Is it still unexpected?
> > >>
> > >> 朱盛凯
> > >>
> > >> Jash Zhu
> > >>
> > >> School of Software, Fudan University
> >
> > Based on my experience using Hadoop and reading the Hadoop code, the reducer does not run before the mappers; sometimes you can observe the mapper starting first, but that has no effect on the results of the program either.
> >
> > BTW:
> > I had no idea so many people in China were studying Hadoop. I also started researching and deploying Hadoop a few months ago for a task at my company. Given that, how about setting up a Chinese Hadoop discussion forum? Or does anyone already know of a Chinese Hadoop forum? According to the PoweredBy page, Koubei.com in China is already using it:
> > http://wiki.apache.org/hadoop/PoweredBy
> >
> > --
> > 程任全
>
>



Re: partitioning the inputs to the mapper

2008-07-27 Thread lohit


>How do I partition the inputs to the mapper, such that a mapper  
>processes an entire file or files? What is happening now is that each  
>mapper receives only portions of a file and I want them to receive an  
>entire file. Is there a way to do that within the scope of the  
>framework?

http://wiki.apache.org/hadoop/FAQ#10

-Lohit



Re: Text search on a PDF file using hadoop

2008-07-23 Thread lohit
Can you provide more information? How are you passing your input; are you 
passing raw PDF files? If so, are you using your own record reader? The default 
record reader won't parse PDF files, and you won't get the text out of them as-is. 
Thanks,
Lohit



- Original Message 
From: GaneshG <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, July 23, 2008 1:51:52 AM
Subject: Text search on a PDF file using hadoop


When I search for text in a PDF file using Hadoop, the results do not come out
properly. I tried to debug my program and could see that the lines read from the PDF
file are not formatted. Please help me resolve this.
-- 
View this message in context: 
http://www.nabble.com/Text-search-on-a-PDF-file-using-hadoop-tp18606475p18606475.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: How to add/remove slave nodes on run time

2008-07-11 Thread lohit
That should also work if you have set HADOOP_CONF_DIR in the environment. The best 
way is to follow the shell script ./bin/start-all.sh, which invokes 
./bin/start-dfs.sh, which starts the datanode like this:
"$bin"/hadoop-daemons.sh --config $HADOOP_CONF_DIR start datanode $dataStartOpt
Yes, you need to start tasktracker as well.
Thanks,
Lohit

- Original Message 
From: Keliang Zhao <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, July 11, 2008 4:31:05 PM
Subject: Re: How to add/remove slave nodes on run time

May I ask what is the right command to start a datanode on a slave?

I used a simple one "bin/hadoop datanode &", but I am not sure.

Also. Should I start the tasktracker manually as well?

-Kevin


On Fri, Jul 11, 2008 at 3:56 PM, lohit <[EMAIL PROTECTED]> wrote:
> To add new datanodes, use the same hadoop version already running on your 
> cluster, the right config and start datanode on any node. The datanode would 
> be configured to talk to the namenode by reading the configs and it would 
> join the cluster. To remove datanode(s) you could decommission the datanode 
> and once decommissioned just kill DataNode process. This is described in 
> there http://wiki.apache.org/hadoop/FAQ#17
>
> Thanks,
> Lohit
>
> - Original Message 
> From: Kevin <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, July 11, 2008 3:43:41 PM
> Subject: How to add/remove slave nodes on run time
>
> Hi,
>
> I searched a bit but could not find the answer. What is the right way
> to add (and remove) new slave nodes on run time? Thank you.
>
> -Kevin
>
>



Re: How to add/remove slave nodes on run time

2008-07-11 Thread lohit
To add new datanodes, use the same hadoop version already running on your 
cluster, the right config and start datanode on any node. The datanode would be 
configured to talk to the namenode by reading the configs and it would join the 
cluster. To remove datanode(s) you can decommission them, and once 
decommissioned just kill the DataNode process. This is described here: 
http://wiki.apache.org/hadoop/FAQ#17

Thanks,
Lohit

- Original Message 
From: Kevin <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, July 11, 2008 3:43:41 PM
Subject: How to add/remove slave nodes on run time

Hi,

I searched a bit but could not find the answer. What is the right way
to add (and remove) new slave nodes on run time? Thank you.

-Kevin



Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread lohit
It's not released yet. There are two options:
1. download the unreleased 0.18 branch from here 
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18
svn co http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 
branch-0.18

2. get NLineInputFormat.java from 
http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18/src/mapred/org/apache/hadoop/mapred/lib/NLineInputFormat.java
copy it into your mapred/lib source directory, rebuild everything, and try it out. I 
assume it should work, but I haven't tried it myself yet.

Thanks,
Lohit

- Original Message 
From: Sandy <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, July 10, 2008 3:45:34 PM
Subject: Re: Is Hadoop Really the right framework for me?

Thank for the responses..

Lohit and Mahadev: this sounds fantastic; however, where may I got hadoop
0.18? I went to http://hadoop.apache.org/core/releases.html

But I did not see a link for Hadoop 0.18. After a brief search on
Google, it did not seem that Hadoop 0.18 has been officially released yet. If
this is indeed the case, when is the release scheduled? In the
meantime, could you please point me in the direction of where to acquire it?
Or is it a better idea for me to wait for the release?

Thank you kindly.

-SM

On Thu, Jul 10, 2008 at 5:18 PM, lohit <[EMAIL PROTECTED]> wrote:

> Hello Sandy,
>
> If you are using Hadoop 0.18, you can use the NLineInputFormat input format to
> get your job done. What it does is give exactly one line to each mapper.
> In your mapper you might have to encode your keys as something like
> word:linenumber.
> So the output from your mapper would be a key/value pair of the form word:linenumber,1.
> The reducer would sum up all word:linenumber keys, and in your reduce function you
> would have to extract the word, the line number, and its count. The delimiter ':'
> should not be part of your word, though.
>
> You might want to take a look at the example usage of NLineInputFormat from
> this test src/test/org/apache/hadoop/mapred/lib/TestLineInputFormat.java
>
> HTH,
> Lohit
> - Original Message 
> From: Sandy <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, July 10, 2008 2:47:21 PM
> Subject: Is Hadoop Really the right framework for me?
>
> Hello,
>
> I have been posting on the forums for a couple of weeks now, and I really
> appreciate all the help that I've been receiving. I am fairly new to Java,
> and even newer to the Hadoop framework. While I am sufficiently impressed
> with the Hadoop, quite a bit of the underlying functionality is masked to
> the user (which, while I understand is the point of a Map Reduce Framework,
> can be a touch frustrating for someone who is still trying to learn their
> way around), and the documentation is sometimes difficult to navigate. I
> have been thusfar unable to sufficiently find an answer to this question on
> my own.
>
> My goal is to implement a fairly simple map reduce algorithm. My question
> is, "Is Hadoop really the right framework to use for this algorithm?"
>
> I have one very large file containing multiple lines of text. I want to
> assign a mapper job to each line. Furthermore, the mapper needs to be able
> to know what line it is processing. If we were thinking about this in terms
> of the Word Count Example, let's say we have a modification where we want
> to
> just see where the words came from, rather than just the count of the
> words.
>
>
> For this example, we have the file:
>
> Hello World
> Hello Hadoop
> Goodbye Hadoop
>
>
> I want to assign a mapper to each line. Each mapper will emit a word and
> its
> corresponding line number. For this example, we would have three mappers,
> (call them m1, m2, and m3). Each mapper will emit the following:
>
> m1 emits:
> <"Hello", 1> <"World", 1>
>
> m2 emits:
> <"Hello", 2> <"Hadoop", 2>
>
> m3 emits:
> <"Goodbye",3> <"Hadoop", 3>
>
>
> My reduce function will count the number of words based on the -instances-
> of line numbers they have, which is necessary, because I wish to use the
> line numbers for another purpose.
>
>
> I have tried Hadoop Pipes, and the Hadoop Python interface. I am now
> looking
> at the Java interface, and am still puzzled how quite to implement this,
> mainly because I don't see how to assign mappers to lines of files, rather
> than to files themselves. From what I can see from the documentation,
> Hadoop
> seems to be more suitable for applications that deal multiple files rather
> than multiple lines. I want it to be able to spawn for any input file, a
> number of mappers corresponding to the number of lines. There can be a cap
> on the number of mappe

Re: Is Hadoop Really the right framework for me?

2008-07-10 Thread lohit
Hello Sandy,

If you are using Hadoop 0.18, you can use the NLineInputFormat input format to get 
your job done. What it does is give exactly one line to each mapper.
In your mapper you might have to encode your keys as something like 
word:linenumber.
So the output from your mapper would be a key/value pair of the form word:linenumber,1. 
The reducer would sum up all word:linenumber keys, and in your reduce function you 
would have to extract the word, the line number, and its count. The delimiter ':' 
should not be part of your word, though. 

You might want to take a look at the example usage of NLineInputFormat from 
this test src/test/org/apache/hadoop/mapred/lib/TestLineInputFormat.java

HTH,
Lohit
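
A hedged sketch of that setup against the 0.18 APIs (class names and paths are
placeholders; note that NLineInputFormat hands each map the line's byte offset as
the key, which is used below as a unique stand-in for the line number):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class LinewiseWordCount {

  // Emits "word:lineId" -> 1 for every word on the single line this mapper receives.
  public static class LineMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    public void map(LongWritable offset, Text line,
                    OutputCollector<Text, LongWritable> out, Reporter reporter)
        throws IOException {
      StringTokenizer tok = new StringTokenizer(line.toString());
      while (tok.hasMoreTokens()) {
        // offset.get() is the line's byte offset; substitute a real line number
        // if your input carries one
        out.collect(new Text(tok.nextToken() + ":" + offset.get()), ONE);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LinewiseWordCount.class);
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 1);   // one line per map task
    conf.setMapperClass(LineMapper.class);
    conf.setReducerClass(LongSumReducer.class);               // sums the 1s per word:lineId
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}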
- Original Message 
From: Sandy <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, July 10, 2008 2:47:21 PM
Subject: Is Hadoop Really the right framework for me?

Hello,

I have been posting on the forums for a couple of weeks now, and I really
appreciate all the help that I've been receiving. I am fairly new to Java,
and even newer to the Hadoop framework. While I am sufficiently impressed
with the Hadoop, quite a bit of the underlying functionality is masked to
the user (which, while I understand is the point of a Map Reduce Framework,
can be a touch frustrating for someone who is still trying to learn their
way around), and the documentation is sometimes difficult to navigate. I
have been thusfar unable to sufficiently find an answer to this question on
my own.

My goal is to implement a fairly simple map reduce algorithm. My question
is, "Is Hadoop really the right framework to use for this algorithm?"

I have one very large file containing multiple lines of text. I want to
assign a mapper job to each line. Furthermore, the mapper needs to be able
to know what line it is processing. If we were thinking about this in terms
of the Word Count Example, let's say we have a modification where we want to
just see where the words came from, rather than just the count of the words.


For this example, we have the file:

Hello World
Hello Hadoop
Goodbye Hadoop


I want to assign a mapper to each line. Each mapper will emit a word and its
corresponding line number. For this example, we would have three mappers,
(call them m1, m2, and m3). Each mapper will emit the following:

m1 emits:
<"Hello", 1> <"World", 1>

m2 emits:
<"Hello", 2> <"Hadoop", 2>

m3 emits:
<"Goodbye",3> <"Hadoop", 3>


My reduce function will count the number of words based on the -instances-
of line numbers they have, which is necessary, because I wish to use the
line numbers for another purpose.


I have tried Hadoop Pipes, and the Hadoop Python interface. I am now looking
at the Java interface, and am still puzzled as to how to implement this,
mainly because I don't see how to assign mappers to lines of files, rather
than to files themselves. From what I can see from the documentation, Hadoop
seems to be more suitable for applications that deal with multiple files rather
than multiple lines. I want it to be able to spawn for any input file, a
number of mappers corresponding to the number of lines. There can be a cap
on the number of mappers spawned (e.g. 128) so that if the number of lines
exceed the number of mappers, then the mappers can concurrently process
lines until all lines are exhausted. I can't see a straightforward way to do
this using the Hadoop framework.

Please keep in mind that I cannot put each line in its own separate file;
the number of lines in my file is sufficiently large that this is really not
a good idea.


Given this information, is Hadoop really the right framework to use? If not,
could you please suggest alternative frameworks? I am currently looking at
Skynet and Erlang, though I am not too familiar with either.

I would appreciate any feedback. Thank you for your time.

Sincerely,

-SM



Re: Cannot decommission on 16.4

2008-07-08 Thread lohit
There are a few things which aren't documented.
- You should have defined the full path of the exclude file in dfs.hosts.exclude before 
starting the namenode. This file must exist; it can be a zero-length file.
- While the system is running, you add the hostname (fully qualified) to this file and 
then invoke hadoop dfsadmin -refreshNodes (see the sketch below).
- You should have enough free datanodes in the cluster so that the blocks from this node 
can be replicated to other nodes. E.g. if your replication factor is 3, it is good to 
have at least 4 datanodes before you decommission any one of them.
- dfs.namenode.decommission.interval defines the interval at which the namenode checks 
whether decommissioning is complete, after which it removes the node from its list and 
takes it out of service. 

PS : IP address instead of hostname in excludes file should also work. 
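
A rough sketch of the pieces involved, assuming the exclude file lives at /path/to/excludes 
(an illustrative path) and that the property sits in the namenode's hadoop-site.xml:

<property>
  <name>dfs.hosts.exclude</name>
  <!-- illustrative path; the file must exist (it may be empty) before the namenode starts -->
  <value>/path/to/excludes</value>
</property>

# add the fully qualified hostname of the node to /path/to/excludes, then:
bin/hadoop dfsadmin -refreshNodes
# watch the namenode web UI (or bin/hadoop dfsadmin -report) until the node finishes decommissioning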

Thanks,
Lohit

- Original Message 
From: Chris Kline <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Tuesday, July 8, 2008 9:38:36 AM
Subject: Cannot decommission on 16.4

I followed the instructions on the wiki and searched JIRA tickets for  
more info, and still cannot decommission a node on 16.4.  I've tried  
different combinations of hostnames and IPs and nothing seems to work.

Has anyone successfully decommissioned a node on 16.4?  If so, was  
there some trick?  I'm using the exclude method.

-Chris


Re: ERROR dfs.NameNode - java.io.EOFException

2008-07-05 Thread lohit
I remember dhruba telling me about this once.
Yes, Take a backup of the whole current directory.
As you have seen, remove the last line from edits and try to start the 
NameNode. 
If it starts, then run fsck to find out which file had the problem. 
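
A rough shell sketch of those steps, assuming the namenode is stopped first and that 
dfs.name.dir points at /path/to/dfs/name (an illustrative path); the backup is the 
important part, since trimming the edits file by hand is risky:

# 1. back up the whole current directory before touching anything
cp -rp /path/to/dfs/name/current /path/to/dfs/name/current.backup
# 2. remove the truncated last record from the edits file (it is a binary file,
#    so use a binary-safe editor rather than plain text tools)
# 3. restart and check which files were affected
bin/start-dfs.sh
bin/hadoop fsck / -files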
Thanks,
Lohit

- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, July 4, 2008 4:46:57 PM
Subject: Re: ERROR dfs.NameNode - java.io.EOFException

Hi,

If it helps with the problem below -- I don't mind losing some data.
For instance, I see my "edits" file has about 74K lines.
Can I just nuke the edits file or remove the last N lines?

I am looking at the edits file with vi and I see the very last line is very 
short - it looks like it was cut off, incomplete, and some of the logs do 
mention running out of disk space (even though the NN machine has some more 
free space).

Could I simply remove this last incomplete line?

Any help would be greatly appreciated.

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, July 4, 2008 2:00:58 AM
> Subject: ERROR dfs.NameNode - java.io.EOFException
> 
> Hi,
> 
> Using Hadoop 0.16.2, I am seeing seeing the following in the NN log:
> 
> 2008-07-03 19:46:26,715 ERROR dfs.NameNode - java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:180)
> at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
> at 
> org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
> at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:433)
> at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:756)
> at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:639)
> at 
> org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
> at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
> at 
> org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
> at org.apache.hadoop.dfs.FSNamesystem.(FSNamesystem.java:235)
> at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:131)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:176)
> at org.apache.hadoop.dfs.NameNode.(NameNode.java:162)
> at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:846)
> at org.apache.hadoop.dfs.NameNode.main(NameNode.java:855)
> 
> The exception doesn't include the name and location of the file whose reading 
> is 
> failing and causing EOFException :(
> But it looks like it's the fsedit log (the "edits" file, I think).
> 
> There is no secondary NN in the cluster.
> 
> Is there any way I can revive this NN?  Any way to "fix" the corrupt "edits" 
> file?
> 
> Thanks,
> Otis


Re: best command line way to check up/down status of HDFS?

2008-06-27 Thread lohit
If NameNode is down, secondary namenode does not serve requests. It is used to 
update the fsimage. 
(http://hadoop.apache.org/core/docs/r0.17.0/hdfs_user_guide.html#Secondary+Namenode)

Thanks,
Lohit

- Original Message 
From: Miles Osborne <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, June 27, 2008 10:30:14 AM
Subject: Re: best command line way to check up/down status of HDFS?

in that case, do:

jps

and look for both the namenode and also the secondary node

Miles

2008/6/27 Meng Mao <[EMAIL PROTECTED]>:

> I was thinking of checking for both independently, and taking a logical OR.
> Would that be sufficient?
>
> I'm trying to avoid file reading if possible. Not that reading through a
> log
> is that intensive,
> but it'd be cleaner if I could poll either Hadoop itself or inspect the
> processes running.
>
> On Fri, Jun 27, 2008 at 1:23 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
>
> > that won't work since the namenode may be down, but the secondary
> namenode
> > may be up instead
> >
> > why not instead just look at the respective logs?
> >
> > Miles
> >
> > 2008/6/27 Meng Mao <[EMAIL PROTECTED]>:
> >
> > > Is running:
> > > ps aux | grep [\\.]NameNode
> > >
> > > and looking for a non empty response a good way to test HDFS up status?
> > >
> > > I'm assuming that if the NameNode process is down, then DFS is
> definitely
> > > down?
> > > Worried that there'd be frequent cases of DFS being messed up but the
> > > process still running just fine.
> > >
> > > On Fri, Jun 27, 2008 at 10:48 AM, Meng Mao <[EMAIL PROTECTED]> wrote:
> > >
> > > > For a Nagios script I'm writing, I'd like a command-line method that
> > > checks
> > > > if HDFS is up and running.
> > > > Is there a better way than to attempt a hadoop dfs command and check
> > the
> > > > error code?
> > > >
> > >
> > >
> > >
> > > --
> > > hustlin, hustlin, everyday I'm hustlin
> > >
> >
> >
> >
> > --
> > The University of Edinburgh is a charitable body, registered in Scotland,
> > with registration number SC005336.
> >
>
>
>
> --
> hustlin, hustlin, everyday I'm hustlin
>



-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.



Re: HDFS blocks

2008-06-27 Thread lohit
>1. Can we have multiple files in DFS use different block sizes ?
No, currently this might not be possible; we have fixed-size blocks.

>2. If we use default block size for these small chunks, is the DFS space
>wasted ?
 
DFS space is not wasted; all the blocks are stored on each datanode's filesystem as is. 
But you would be wasting the NameNode's namespace. The NameNode holds the entire namespace 
in memory, so instead of the single entry for one file with a 128M block, multiple 6M files 
leave you with that many more entries.

> If not then does it mean that a single DFS block can hold data from
>more than one file ?
A DFS block cannot hold data from more than one file. If your file size is, say, 5M, which 
is less than your default block size of, say, 128M, then the block stored in DFS would be 
just 5M.

To overcome this, people usually run a map/reduce job with 1 reducer and an identity 
mapper, which basically merges all the small files into one file (see the sketch below). 
In hadoop 0.18 we have archives, and once HADOOP-1700 is done, one could open the file and 
append to it.
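
A minimal sketch of such a merge job (roughly the 0.18-era org.apache.hadoop.mapred API; 
on earlier releases the input/output path setters live on JobConf itself). The class name, 
job name and paths are illustrative. Note that with TextInputFormat the keys are byte 
offsets, so the merged lines carry an offset prefix unless you swap in a mapper that drops 
the key.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SmallFileMerge {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SmallFileMerge.class);
    conf.setJobName("merge-small-files");
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setNumReduceTasks(1);                    // single reducer => single output file
    conf.setOutputKeyClass(LongWritable.class);   // TextInputFormat key type (byte offset)
    conf.setOutputValueClass(Text.class);
    FileInputFormat.setInputPaths(conf, new Path("/data/small-chunks"));   // illustrative
    FileOutputFormat.setOutputPath(conf, new Path("/data/merged"));        // illustrative
    JobClient.runJob(conf);
  }
}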

Thanks,
Lohit


- Original Message 
From: "Goel, Ankur" <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, June 27, 2008 2:27:57 AM
Subject: HDFS blocks


Hi Folks,
I have a setup wherein I am streaming data into HDFS from a
remote location and creating a new file every X min. The files generated
are very small (512 KB - 6 MB). Since that is the size range, the
streaming code sets the block size to 6 MB, whereas the default that
we have set for the cluster is 128 MB. The idea behind such a thing is
to generate small temporal data chunks from multiple sources and merge
them periodically into a big chunk with our default (128 MB) block size.

The webUI for DFS reports the block size for these files to be 6 MB. My
questions are.
1. Can we have multiple files in DFS use different block sizes ?
2. If we use default block size for these small chunks, is the DFS space
wasted ? 
   If not then does it mean that a single DFS block can hold data from
more than one file ?

Thanks
-Ankur



Re: hadoop file system error

2008-06-26 Thread lohit
Hi Roman,

Which version of hadoop are you running? And do you see any errors/stack trace 
dumps in log files?
Can you check $HADOOP_LOG_DIR/*-datanode-*.log and 
$HADOOP_LOG_DIR/*-namenode-*.log

Can you also make sure you have NameNode and DataNode running. 

Thanks,
Lohit

- Original Message 
From: brainstorm <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, June 26, 2008 8:24:49 AM
Subject: Re: hadoop file system error

I'm having a similar problem but with the hadoop CLI tool (not
programatically), and it's driving me nuts:

[EMAIL PROTECTED]:~/nutch/trunk$ cat urls/urls.txt
http://escert.upc.edu/

[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -ls
Found 0 items
[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -put urls urls

[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -ls
Found 1 items
/user/hadoop/urls        2008-06-26 17:20    rwxr-xr-x    hadoop    supergroup
[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -ls urls
Found 1 items
/user/hadoop/urls/urls.txt    0    2008-06-26 17:20    rw-r--r--    hadoop    supergroup

[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -cat urls/urls.txt
[EMAIL PROTECTED]:~/nutch/trunk$ bin/hadoop dfs -get urls/urls.txt .
[EMAIL PROTECTED]:~/nutch/trunk$ cat urls.txt
[EMAIL PROTECTED]:~/nutch/trunk$

As you see, I put a txt file on HDFS from local, containing a line,
but afterwards, this file is empty... am I missing any "close",
"flush" or "commit" command ?

Thanks in advance,
Roman

On Thu, Jun 19, 2008 at 7:23 PM, Mori Bellamy <[EMAIL PROTECTED]> wrote:
> might it be a synchronization problem? i don't know if hadoops DFS magically
> takes care of that, but if it doesn't then you might have a problem because
> of multiple processes trying to write to the same file?
>
> perhaps as a control experiment you could run your process on some small
> input, making sure that each reduce task outputs to a different filename (i
> just use Math.random()*Integer.MAX_VALUE and cross my fingers).
> On Jun 18, 2008, at 6:01 PM, 晋光峰 wrote:
>
>> i'm sure i close all the files in the reduce step. Any other reasons cause
>> this problem?
>>
>> 2008/6/18 Konstantin Shvachko <[EMAIL PROTECTED]>:
>>
>>> Did you close those files?
>>> If not they may be empty.
>>>
>>>
>>>
>>> ??? wrote:
>>>
>>>> Dears,
>>>>
>>>> I use hadoop-0.16.4 to do some work and found a error which i can't get
>>>> the
>>>> reasons.
>>>>
>>>> The scenario is like this: In the reduce step, instead of using
>>>> OutputCollector to write result, i use FSDataOutputStream to write
>>>> result
>>>> to
>>>> files on HDFS(becouse i want to split the result by some rules). After
>>>> the
>>>> job finished, i found that *some* files(but not all) are empty on HDFS.
>>>> But
>>>> i'm sure in the reduce step the files are not empty since i added some
>>>> logs
>>>> to read the generated file. It seems that some file's contents are lost
>>>> after the reduce step. Is anyone happen to face such errors? or it's a
>>>> hadoop bug?
>>>>
>>>> Please help me to find the reason if you some guys know
>>>>
>>>> Thanks & Regards
>>>> Guangfeng
>>>>
>>>>
>>
>>
>> --
>> Guangfeng Jin
>>
>> Software Engineer
>>
>> iZENEsoft (Shanghai) Co., Ltd
>> Room 601 Marine Tower, No. 1 Pudong Ave.
>> Tel:86-21-68860698
>> Fax:86-21-68860699
>> Mobile: 86-13621906422
>> Company Website:www.izenesoft.com
>
>



Re: Compiling Word Count in C++ : Hadoop Pipes

2008-06-25 Thread lohit
ant -Dcompile.c++=yes compile-c++-examples
I picked it up from build.xml

Thanks,
Lohit

- Original Message 
From: Sandy <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, June 25, 2008 10:44:20 AM
Subject: Compiling Word Count in C++ : Hadoop Pipes

Hi,

I am currently trying to get Hadoop Pipes working. I am following the
instructions at the hadoop wiki, where it provides code for a C++
implementation of Word Count (located here:
http://wiki.apache.org/hadoop/C++WordCount?highlight=%28C%2B%2B%29)

I am having some trouble parsing the instructions. What should the file
containing the new word count program be called? "examples"?

If I were to call the file "example" and type in the following:
$ ant -Dcompile.c++=yes example
Buildfile: build.xml

BUILD FAILED
Target `example' does not exist in this project.

Total time: 0 seconds


If I try and compile with "examples" as stated on the wiki, I get:
$ ant -Dcompile.c++=yes examples
Buildfile: build.xml

clover.setup:

clover.info:
 [echo]
 [echo]  Clover not found. Code coverage reports disabled.
 [echo]

clover:

init:
[touch] Creating /tmp/null810513231
   [delete] Deleting: /tmp/null810513231
 [exec] svn: '.' is not a working copy
 [exec] svn: '.' is not a working copy

record-parser:

compile-rcc-compiler:
[javac] Compiling 29 source files to
/home/sjm/Desktop/hadoop-0.16.4/build/classes

BUILD FAILED
/home/sjm/Desktop/hadoop-0.16.4/build.xml:241: Unable to find a javac
compiler;
com.sun.tools.javac.Main is not on the classpath.
Perhaps JAVA_HOME does not point to the JDK

Total time: 1 second



I am a bit puzzled by this. Originally I got the error that tools.jar was
not found, because it was looking for it under
/usr/java/jre1.6.0_06/lib/tools.jar . There is a tools.jar under
/usr/java/jdk1.6.0_06/lib/tools.jar. If I copy this file over to the jre
folder, that message goes away and its replaced with the above message.

My hadoop-env.sh file looks something like:
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
# export JAVA_HOME=$JAVA_HOME


and my .bash_profile file has this line in it:
JAVA_HOME=/usr/java/jre1.6.0_06; export JAVA_HOME
export PATH


Furthermore, if I go to the command line and type in javac -version, I get:
$ javac -version
javac 1.6.0_06


I also had no problem getting through the hadoop word count map reduce
tutorial in Java. It was able to find my java compiler fine. Could someone
please point me in the right direction? Also, since it is an sh file, should
that export line in hadoop-env.sh really start with a hash sign?

Thank you in advance for your assistance.

-SM



Re: Global Variables via DFS

2008-06-25 Thread lohit
As Steve mentioned, you could open up an HDFS file from within your map/reduce task.
Also, instead of using DistributedFileSystem, you would actually use FileSystem. 
This is what I do.


FileSystem fs = FileSystem.get(new Configuration());  // uses the default filesystem from fs.default.name
FSDataInputStream file = fs.open(new Path("/user/foo/jambajuice"));


Thanks,
Lohit
- Original Message 
From: Steve Loughran <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, June 25, 2008 9:15:55 AM
Subject: Re: Global Variables via DFS

javaxtreme wrote:
> Hello all,
> I am having a bit of a problem with a seemingly simple problem. I would like
> to have some global variable which is a byte array that all of my map tasks
> have access to. The best way that I currently know of to do this is to have
> a file sitting on the DFS and load that into each map task (note: the global
> variable is very small ~20kB). My problem is that I can't seem to load any
> file from the Hadoop DFS into my program via the API. I know that the
> DistributedFileSystem class has to come into play, but for the life of me I
> can't get it to work. 
> 
> I noticed there is an initialize() method within the DistributedFileSystem
> class, and I thought that I would need to call that, however I'm unsure what
> the URI parameter ought to be. I tried "localhost:50070" which stalled the
> system and threw a connectionTimeout error. I went on to just attempt to
> call DistributedFileSystem.open() but again my program failed this time with
> a NullPointerException. I'm assuming that is stemming from he fact that my
> DFS object is not "initialized".
> 
> Does anyone have any information on how exactly one programatically goes
> about loading in a file from the DFS? I would greatly appreciate any help.
> 

If the data changes, this sounds more like the kind of data that a 
distributed hash table or tuple space should be looking after...sharing 
facts between nodes

1. what is the rate of change of the data?
2. what are your requirements for consistency?

If the data is static, then yes, a shared file works.  Here's my code 
fragments to work with one. You grab the URI from the configuration, 
then initialise the DFS with both the URI and the configuration.

 public static DistributedFileSystem createFileSystem(ManagedConfiguration conf)
         throws SmartFrogRuntimeException {
     String filesystemURL = conf.get(HadoopConfiguration.FS_DEFAULT_NAME);
     URI uri = null;
     try {
         uri = new URI(filesystemURL);
     } catch (URISyntaxException e) {
         throw (SmartFrogRuntimeException) SmartFrogRuntimeException
                 .forward(ERROR_INVALID_FILESYSTEM_URI + filesystemURL, e);
     }
     DistributedFileSystem dfs = new DistributedFileSystem();
     try {
         dfs.initialize(uri, conf);
     } catch (IOException e) {
         throw (SmartFrogRuntimeException) SmartFrogRuntimeException
                 .forward(ERROR_FAILED_TO_INITIALISE_FILESYSTEM, e);
     }
     return dfs;
 }

As to what URLs work, try  "localhost:9000"; this works on machines 
where I've brought a DFS up on that port. Use netstat to verify your 
chosen port is live.



Re: Map Task timed out?

2008-06-12 Thread lohit
Check RandomWriter.java
look for reporter.setStatus("wrote record " + itemCount + "..
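
Along those lines, a minimal sketch (old org.apache.hadoop.mapred API) of keeping a 
long-running map task alive by reporting status; the class name and the counting logic 
are illustrative, not taken from RandomWriter itself:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SlowMap extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private long itemCount = 0;
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // ... long-running work that may not emit any output for a while ...
    itemCount++;
    if (itemCount % 1000 == 0) {
      reporter.setStatus("processed record " + itemCount);  // resets the task timeout
      reporter.progress();                                  // also counts as a heartbeat
    }
  }
}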

- Original Message 
From: Edward J. Yoon <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, June 12, 2008 10:35:14 PM
Subject: Re: Map Task timed out?

Thanks for your nice answer. :)
Then, Can I handle the *reporter*? If so, please give me some examples.

Thanks.

On Fri, Jun 13, 2008 at 2:18 PM, lohit <[EMAIL PROTECTED]> wrote:
> Yes, there is a timeout defined by mapred.task.timeout
> default was 600 seconds. And here silent means the task (either map or reduce 
> ) has not reported any status using the reporter you get with map/reduce 
> function
>
> Thanks,
> Lohit
>
> - Original Message 
> From: Edward J. Yoon <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, June 12, 2008 7:12:50 PM
> Subject: Map Task timed out?
>
> Hi,
>
> If map task is unexpectedly "silent" for a long time (e.g. wait for
> other application response), What happen? Is there any limit for
> staying?
>
> Thanks
> --
> Best regards,
> Edward J. Yoon,
> http://blog.udanax.org
>
>



-- 
Best regards,
Edward J. Yoon,
http://blog.udanax.org



Re: Map Task timed out?

2008-06-12 Thread lohit
Yes, there is a timeout defined by mapred.task.timeout; the default is 600 seconds. And 
here "silent" means the task (either map or reduce) has not reported any status using the 
reporter you get with the map/reduce function.
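
For reference, a sketch of how the corresponding property could be overridden in 
hadoop-site.xml; the value is in milliseconds (600000 ms matches the 600-second default 
mentioned above), and the inline comment is a paraphrase rather than the exact 
hadoop-default.xml description:

<property>
  <name>mapred.task.timeout</name>
  <!-- milliseconds; a task that reports no status or progress for this long gets killed -->
  <value>600000</value>
</property>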

Thanks,
Lohit

- Original Message 
From: Edward J. Yoon <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, June 12, 2008 7:12:50 PM
Subject: Map Task timed out?

Hi,

If map task is unexpectedly "silent" for a long time (e.g. wait for
other application response), What happen? Is there any limit for
staying?

Thanks
-- 
Best regards,
Edward J. Yoon,
http://blog.udanax.org



Re: Question about Hadoop

2008-06-12 Thread lohit
Ideally what you would want is your data to be on HDFS and to run your map/reduce jobs on 
that data. The Hadoop framework splits your data and feeds those splits to each map or 
reduce task. One problem with image files is that you will not be able to split them. 
Alternatively, people have done this: they wrap image files within xml and create huge 
files which have multiple image files in them. Hadoop offers something called streaming, 
using which you will be able to split the files at the xml boundary and feed them to your 
map/reduce tasks. Streaming also enables you to use any code like perl/php/c++. 
Check info about streaming here 
http://hadoop.apache.org/core/docs/r0.17.0/streaming.html
And information about parsing XML files in streaming here 
http://hadoop.apache.org/core/docs/r0.17.0/streaming.html#How+do+I+parse+XML+documents+using+streaming%3F
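
A rough sketch of what such a streaming invocation could look like; the streaming jar 
name, the <image> record tags, the mapper script and the HDFS paths are all illustrative:

bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -inputreader "StreamXmlRecordReader,begin=<image>,end=</image>" \
    -input /data/wrapped-images.xml \
    -output /data/image-output \
    -mapper process_image.pl \
    -file process_image.pl \
    -reducer NONE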

Thanks,
Lohit

- Original Message 
From: Chanchal James <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, June 12, 2008 9:42:46 AM
Subject: Question about Hadoop

Hi,

I have a question about Hadoop. I am a beginner and just testing Hadoop. Would like to 
know how a php application would benefit from this, say an application that needs to work 
on a large number of image files. Do I have to store the application in HDFS always, or 
do I just copy it to HDFS when needed, do the processing, and then copy it back to the 
local file system? Is that the case with the data files too? Once I have Hadoop running, 
do I keep all data & application files in HDFS always, and not use local file system 
storage?

Thank you.



Re: confusing about decommission in HDFS

2008-06-04 Thread lohit
The 3 steps you mentioned, were they done while namenode was still running?
I think (I might be wrong as well) that the config is read only once, when the namenode is 
started. So you should have defined the dfs.hosts.exclude file beforehand. 
When you want to refresh, you just update the file already defined in the config and call 
refresh. And this relates to the namenode config. Is it possible to test it by defining it 
in the config, restarting the namenode and then trying to decommission the nodes?
If you are still seeing issues, would it be possible to open a JIRA 
(https://issues.apache.org/jira/secure/CreateIssue!default.jspa) describing 
steps to reproduce it.

PS: Even if it works, feel free to open a JIRA for better documentation. I could not find 
any in the HDFS User Guide. 
Thanks,
Lohit

- Original Message 
From: Xiangna Li <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, June 4, 2008 8:54:07 AM
Subject: confusing about decommission in HDFS

hi,

I try to decommission a node by following these steps:
  (1) write the hostname of decommission node in a file as the exclude file.
  (2) let the exclude file be specified as a configuration
parameter  dfs.hosts.exclude.
  (3) run "bin/hadoop dfsadmin -refreshNodes".

 It is surprising that the node is found both in the "Live Datanodes" list with
"In Service" status, and also in the "Dead Datanodes" list of the dfs namenode web ui.
I copied GB-sized files to HDFS to confirm whether it is in service, and the result is
that its Used size is increasing like the others'. So could we say the decommission
feature doesn't work?

 The stranger thing: I put some nodes in the include file, then added the configuration
and ran "refreshNodes", but these nodes and the excluded node all show up only in the
"Dead Datanodes" list. Is it a bug?


Sincerely for your reply!



Re: setrep

2008-06-03 Thread lohit


>It seems that setrep won't force replication change to the specified number 
>immediately, it changed really slowly. just wondering if this is the expected 
>behavior? what's the rational for this behavior? is there way to speed it up? 

Yes, it won't force replication to be instant. Once you increase the replication factor of 
a file, the namenode adds it to the neededReplication list. The namenode has a thread 
running which periodically scans this list, chooses a set of blocks which are 
under-replicated and requests the datanodes to replicate them. This interval is configured 
using the dfs.replication.interval config variable, which is in seconds.
The neededReplication list also maintains a priority policy, wherein blocks with only one 
copy are replicated first. 

If you do not have a lot of under-replicated blocks, this should happen pretty fast. Are 
you seeing very long delays? 
Thanks,
Lohit



Re: About Metrics update

2008-06-02 Thread lohit
In MetricsIntValue, incrMetric() was being called in pushMetric() instead of setMetric(). 
This used to cause the values to be incremented periodically.
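
For comparison, a sketch of what the corrected method would look like based on that 
description (set the absolute value instead of incrementing); this is paraphrased rather 
than copied from the 0.18 source:

public synchronized void pushMetric(final MetricsRecord mr) {
  if (changed)
    mr.setMetric(name, value);   // publish the absolute gauge value instead of incrementing it
  changed = false;
}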
Thanks,
Lohit

- Original Message 
From: Ion Badita <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Saturday, May 31, 2008 4:11:00 AM
Subject: Re: About Metrics update

Hi,

I'm using version 0.17.
Do you know how is fixed?

Thanks
Ion


lohit wrote:
> Hi Ion,
>
> Which version of Hadoop are you using? The problem you reported about 
> safeModeTime and fsImageLoadTime keep growing was fixed in 0.18 (or trunk)
>
> Thanks,
> Lohit
>
> - Original Message 
> From: Ion Badita <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, May 30, 2008 8:10:52 AM
> Subject: Re: About Metrics update
>
> Hi,
>
> I found (because of the Metrics behavior reported in the previous 
> e-mail) some errors in the metrics reported by the NameNodeMetrics:
> safeModeTime and fsImageLoadTime keep growing (they should be the same 
> over time). The mentioned metrics use MetricsIntValue for the values, on 
> MetricsIntValue .pushMetric() if "changed" field is marked true then the 
> value is "published" in MetricsRecod else the method does nothing.
>
> public synchronized void pushMetric(final MetricsRecord mr) {
> if (changed)
>   mr.incrMetric(name, value);
> changed = false;
>   }
>
> The problem is in AbstractMetricsContext.update() method, because the 
> metricUpdates are not cleared after been merged in the "record's 
> internal data".
>
> Ion
>
>
>
> Ion Badita wrote:
>  
>> Hi,
>>
>> A looked over the class 
>> org.apache.hadoop.metrics.spi.AbstractMetricsContext and i have a 
>> question:
>> why in the update(MetricsRecordImpl record) metricUpdates Map is not 
>> cleared after the updates are merged in metricMap. Because of this on 
>> every update() "old" increments are merged in metricMap. Is this the 
>> right behavior?
>> If i want to increment only one metric in the record using current 
>> implementation is not possible without modifying other metrics that 
>> are incremented rare.
>>
>>
>> Thanks
>> Ion
>>


Re: About Metrics update

2008-05-30 Thread lohit
Hi Ion,

Which version of Hadoop are you using? The problem you reported about 
safeModeTime and fsImageLoadTime keep growing was fixed in 0.18 (or trunk)

Thanks,
Lohit

- Original Message 
From: Ion Badita <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, May 30, 2008 8:10:52 AM
Subject: Re: About Metrics update

Hi,

I found (because of the Metrics behavior reported in the previous 
e-mail) some errors in the metrics reported by the NameNodeMetrics:
safeModeTime and fsImageLoadTime keep growing (they should be the same 
over time). The mentioned metrics use MetricsIntValue for the values; in 
MetricsIntValue.pushMetric(), if the "changed" field is marked true then the value is 
"published" to the MetricsRecord, else the method does nothing.

public synchronized void pushMetric(final MetricsRecord mr) {
  if (changed)
    mr.incrMetric(name, value);
  changed = false;
}

The problem is in the AbstractMetricsContext.update() method, because the metricUpdates 
are not cleared after being merged into the record's internal data.

Ion



Ion Badita wrote:
> Hi,
>
> A looked over the class 
> org.apache.hadoop.metrics.spi.AbstractMetricsContext and i have a 
> question:
> why in the update(MetricsRecordImpl record) metricUpdates Map is not 
> cleared after the updates are merged in metricMap. Because of this on 
> every update() "old" increments are merged in metricMap. Is this the 
> right behavior?
> If i want to increment only one metric in the record using current 
> implementation is not possible without modifying other metrics that 
> are incremented rare.
>
>
> Thanks
> Ion


Re: Hadoop fsck displays open files as corrupt.

2008-05-22 Thread lohit
Hi Martin,

What you are seeing is expected behavior in earlier versions of Hadoop; hadoop-0.18 has a 
fix for this.
What happens here is that when a file is opened but not yet closed, the namenode does not 
yet know the locations/block sizes from all the datanodes. When a block is successfully 
written by a datanode, the datanode reports this to the namenode. It is at this point that 
the namenode has knowledge of the block, its location and its size. For open files, this 
information is still being updated. 

hadoop 0.18 filters out open files and provides an option (-openforwrite) to 
select such files. 
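
For example, on 0.18 or later the check could be run with that option added to the usual 
fsck flags:

bin/hadoop fsck / -files -blocks -locations -openforwrite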

Thanks,
Lohit

- Original Message 
From: Martin Schaaf <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, May 22, 2008 4:14:20 PM
Subject: Hadoop fsck displays open files as corrupt.

Hi,

we wrote a program that uses a Writer to append keys and values to a file.
If we do an fsck during this writing, the opened files are reported as corrupt and the 
file size is zero until they are closed. On the other hand, if we copy a file from the 
local fs to the hadoop fs, the size constantly increases and the files aren't displayed 
as corrupt. So my question is: is this the expected behaviour? What is the difference 
between these two operations?

Thanks in advance for your help
martin



Re: Making the case for Hadoop

2008-05-16 Thread lohit
You could also find some info about companies/projects using Hadoop on the PoweredBy page:
http://wiki.apache.org/hadoop/PoweredBy

Thanks,
Lohit

- Original Message 
From: Ted Dunning <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, May 16, 2008 10:02:25 AM
Subject: Re: Making the case for Hadoop


Nothing is the best case!


On 5/16/08 7:00 AM, "Edward Capriolo" <[EMAIL PROTECTED]> wrote:

> So hadoop is a fact. My advice for convincing IT executives. Ask them
> to present their alternative. (usually its nothing)


Re: When is HDFS really "corrupt"...(and can I upgrade a corrupt FS?)

2008-05-15 Thread lohit
Yes, that file is a temp file used by one of your reducers. It is a file which was opened 
but never closed, hence the namenode does not know the location information for the last 
block of such files. In hadoop-0.18 we have an option to filter out files which are open 
and not count them as making the filesystem CORRUPT. 
Thanks,
Lohit

- Original Message 
From: C G <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, May 15, 2008 12:51:55 PM
Subject: Re: When is HDFS really "corrupt"...(and can I upgrade a corrupt FS?)

I hadn't considered looking for the word MISSING...thanks for the heads-up.  I 
did a search and found the following:
  
  /output/ae/_task_200803191317_9183_r_08_1/part-8 0, 390547 block(s):  
MISSING 1 blocks of total size 0 B
0. -7099420740240431420 len=0 MISSING!

  That's the only one found.  Is it safe/sufficient to simply delete this file? 
 
  
  There were MR jobs active when the master failed...it wasn't a clean shutdown by any 
means. I surmise this file is a remnant from an active job.
  
  Thanks,
  C G 
  
Lohit <[EMAIL PROTECTED]> wrote:
  Filesystem is considered corrupt if there are any missing blocks. do you see 
MISSING in your output? and also we see missing blocks for files not closed 
yet. When u stopped MR cluster where there any jobs running? 




On May 15, 2008, at 12:15 PM, C G 
wrote:

Earlier this week I wrote about a master node crash and our efforts to recover 
from the crash. We recovered from the crash and all systems are normal. 
However, I have a concern about what fsck is reporting and what it really means 
for a filesystem to be marked "corrupt."

With the mapred engine shut down, I ran fsck / -files -blocks -locations to 
inspect the file system. The output looks clean with the exception of this at 
the end of the output:

Status: CORRUPT
Total size: 5113667836544 B
Total blocks: 1070996 (avg. block size 4774684 B)
Total dirs: 50012
Total files: 1027089
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Target replication factor: 3
Real replication factor: 3.0

The filesystem under path '/' is CORRUPT

In reviewing the fsck output, there are no obvious errors being reported. I see 
tons of output like this:

/foo/bar/part-5 3387058, 6308 block(s): OK
0. 4958936159429948772 len=3387058 repl=3 [10.2.14.5:50010, 10.2.14.20:50010, 
10.2.14.8:50010]

and the only status ever reported is "OK."

So this begs the question about what causes HDFS to declare the FS is "corrupt" 
and how do I clear this up?

The second question, assuming that I can't make the "corrupt" state go away, 
concerns running an upgrade. If every file in HDFS reports "OK" but the FS 
reports "corrupt", is it safe to undertake an upgrade from 0.15.x to 0.16.4 ?

Thanks for any help
C G


Re: When is HDFS really "corrupt"...(and can I upgrade a corrupt FS?)

2008-05-15 Thread Lohit
The filesystem is considered corrupt if there are any missing blocks. Do you see MISSING 
in your output? Also, we see missing blocks for files not closed yet. When you stopped the 
MR cluster, were there any jobs running? 




On May 15, 2008, at 12:15 PM, C G <[EMAIL PROTECTED]> wrote:

Earlier this week I wrote about a master node crash and our efforts to recover 
from the crash.  We recovered from the crash and all systems are normal.  
However, I have a concern about what fsck is reporting and what it really means 
for a filesystem to be marked "corrupt."

 With the mapred engine shut down, I ran fsck / -files -blocks -locations to 
inspect the file system.  The output looks clean with the exception of this at 
the end of the output:

  Status: CORRUPT
Total size:                5113667836544 B
Total blocks:              1070996 (avg. block size 4774684 B)
Total dirs:                50012
Total files:               1027089
Over-replicated blocks:    0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Target replication factor: 3
Real replication factor:   3.0

The filesystem under path '/' is CORRUPT

 In reviewing the fsck output, there are no obvious errors being reported.  I 
see tons of output like this:

 /foo/bar/part-5 3387058, 6308 block(s):  OK
0. 4958936159429948772 len=3387058 repl=3 [10.2.14.5:50010, 10.2.14.20:50010, 
10.2.14.8:50010]

 and the only status ever reported is "OK."

 So this begs the question about what causes HDFS to declare the FS is 
"corrupt" and how do I clear this up?

 The second question, assuming that I can't make the "corrupt" state go away, 
concerns running an upgrade.  If every file in HDFS reports "OK" but the FS 
reports "corrupt", is it safe to undertake an upgrade from 0.15.x to 0.16.4 ?

 Thanks for any help
 C G





Re: Trouble hooking up my app to HDFS

2008-05-14 Thread lohit
You could do this:
open up bin/hadoop (it's a shell script). The last line is the one which executes the 
corresponding hadoop class; instead of exec, make it echo so you can see everything that 
ends up on the classpath. Make sure your generated classpath matches it. And I hope the 
conf dir (/Users/bryanduxbury/hadoop-0.16.3/conf) is the same one you are using for your 
hadoop installation. 

Thanks,
lohit

- Original Message 
From: Bryan Duxbury <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Wednesday, May 14, 2008 7:30:26 PM
Subject: Re: Trouble hooking up my app to HDFS

Nobody has any ideas about this?

-Bryan

On May 13, 2008, at 11:27 AM, Bryan Duxbury wrote:

> I'm trying to create a java application that writes to HDFS. I have  
> it set up such that hadoop-0.16.3 is on my machine, and the env  
> variables HADOOP_HOME and HADOOP_CONF_DIR point to the correct  
> respective directories. My app lives elsewhere, but generates it's  
> classpath by looking in those environment variables. Here's what my  
> generated classpath looks like:
>
> /Users/bryanduxbury/hadoop-0.16.3/conf:/Users/bryanduxbury/ 
> hadoop-0.16.3/hadoop-0.16.3-core.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/hadoop-0.16.3-test.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/commons-cli-2.0-SNAPSHOT.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/commons-codec-1.3.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/commons-httpclient-3.0.1.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/commons-logging-1.0.4.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/commons-logging-api-1.0.4.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/jets3t-0.5.0.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/jetty-5.1.4.jar:/Users/bryanduxbury/hadoop-0.16.3/ 
> lib/jetty-ext/commons-el.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
> jetty-ext/jasper-compiler.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
> jetty-ext/jasper-runtime.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
> jetty-ext/jsp-api.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/ 
> junit-3.8.1.jar:/Users/bryanduxbury/hadoop-0.16.3/lib/kfs-0.1.jar:/ 
> Users/bryanduxbury/hadoop-0.16.3/lib/log4j-1.2.13.jar:/Users/ 
> bryanduxbury/hadoop-0.16.3/lib/servlet-api.jar:/Users/bryanduxbury/ 
> hadoop-0.16.3/lib/xmlenc-0.52.jar:/Users/bryanduxbury/projects/ 
> hdfs_collector/lib/jtestr-0.2.jar:/Users/bryanduxbury/projects/ 
> hdfs_collector/lib/jvyaml.jar:/Users/bryanduxbury/projects/ 
> hdfs_collector/lib/libthrift.jar:/Users/bryanduxbury/projects/ 
> hdfs_collector/build/hdfs_collector.jar
>
> The problem I have is that when I go to get a FileSystem object for  
> my file:/// files (for testing locally), I'm getting errors like this:
>
>[jtestr] java.io.IOException: No FileSystem for scheme: file
>[jtestr]   org/apache/hadoop/fs/FileSystem.java:1179:in  
> `createFileSystem'
>[jtestr]   org/apache/hadoop/fs/FileSystem.java:55:in `access 
> $300'
>[jtestr]   org/apache/hadoop/fs/FileSystem.java:1193:in `get'
>[jtestr]   org/apache/hadoop/fs/FileSystem.java:150:in `get'
>[jtestr]   org/apache/hadoop/fs/FileSystem.java:124:in  
> `getNamed'
>[jtestr]   org/apache/hadoop/fs/FileSystem.java:96:in `get'
>
> I saw an archived message that suggested this was a problem with  
> the classpath, but there was no resolution that I saw. Does anyone  
> have any ideas why this might be occurring?
>
> Thanks,
>
> Bryan


Re: HDFS corrupt...how to proceed?

2008-05-12 Thread lohit
I would suggest you run fsck with all options:
hadoop fsck / -files -blocks -locations 
This will give you details of which blocks are missing and which files they belong to. 
The fsck output depends on the current state of the namenode and its knowledge about the 
blocks. The fact that the two outputs differ suggests that the namenode state has been 
updated, meaning blocks which were missing earlier might be reported now. Check with the 
full options and see which blocks from which files are missing. 
Thanks,
Lohit

- Original Message 
From: C G <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Sunday, May 11, 2008 9:55:40 PM
Subject: Re: HDFS corrupt...how to proceed?

The system hosting the namenode experienced an OS panic and shut down, we 
subsequently rebooted it.  Currently we don't believe there is/was a bad disk 
or other hardware problem.
  
  Something interesting: I've run fsck twice; the first time it gave the result I posted. 
The second time it still declared the FS to be corrupt, but said:
  [many rows of periods deleted]
  ..Status: CORRUPT
Total size:                4900076384766 B
Total blocks:              994492 (avg. block size 4927215 B)
Total dirs:                47404
Total files:               952310
Over-replicated blocks:    0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Target replication factor: 3
Real replication factor:   3.0
  
The filesystem under path '/' is CORRUPT

  So it seems like it's fixing some problems on its own?
  
  Thanks,
  C G
  
Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
  Did one datanode fail or did the namenode fail? By "fail" do you mean
that the system was rebooted or was there a bad disk that caused the
problem?

thanks,
dhruba

On Sun, May 11, 2008 at 7:23 PM, C G 
wrote:
> Hi All:
>
> We had a primary node failure over the weekend. When we brought the node back 
> up and I ran Hadoop fsck, I see the file system is corrupt. I'm unsure how 
> best to proceed. Any advice is greatly appreciated. If I've missed a Wiki 
> page or documentation somewhere please feel free to tell me to RTFM and let 
> me know where to look.
>
> Specific question: how to clear under and over replicated files? Is the 
> correct procedure to copy the file locally, delete from HDFS, and then copy 
> back to HDFS?
>
> The fsck output is long, but the final summary is:
>
> Total size: 4899680097382 B
> Total blocks: 994252 (avg. block size 4928006 B)
> Total dirs: 47404
> Total files: 952070
> 
> CORRUPT FILES: 2
> MISSING BLOCKS: 24
> MISSING SIZE: 1501009630 B
> 
> Over-replicated blocks: 1 (1.0057812E-4 %)
> Under-replicated blocks: 14958 (1.5044476 %)
> Target replication factor: 3
> Real replication factor: 2.9849212
>
> The filesystem under path '/' is CORRUPT
>
>
>
> -
> Be a better friend, newshound, and know-it-all with Yahoo! Mobile. Try it now.


  
-
Be a better friend, newshound, and know-it-all with Yahoo! Mobile.  Try it now.


Re: Corrupt HDFS and salvaging data

2008-05-09 Thread lohit
Hi Otis,

Thanks for the reports. It looks like you have a lot of blocks with a replication factor 
of 1, and when the node which had these blocks was stopped, the namenode started reporting 
those blocks as missing, as it could not find any other replica. 
Here is what I did:

Find all blocks with replication factor 1
> grep repl=1 ../tmp.1/fsck-old.txt  | awk '{print $2}'  | sort > repl_1

Find all blocks reported MISSING
> grep MISSIN fsck-1newDN.txt  | grep blk_ | awk '{print $2}' | sort > 
> missing_new

diff to see 
>diff repl_1 missing_new | grep ">"

As you can see, all the missing blocks had a replication factor of 1. The report does not 
show locations; you could check the locations and make sure all of them were from the same 
datanode.
So, this should confirm why, even after you added a new datanode, the cluster is not 
healthy. If these files had a replication factor of at least 2, the under-replicated 
blocks would have been re-replicated onto the new datanode.

You can set the replication factor of a file using the 'hadoop dfs -setrep' command.
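For instance (the paths are illustrative; -R applies the change recursively to a directory):

bin/hadoop dfs -setrep 3 /user/foo/important-file
bin/hadoop dfs -setrep -R 3 /user/foo/some-directory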

Thanks,
Lohit

- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, May 9, 2008 7:16:42 AM
Subject: Re: Corrupt HDFS and salvaging data

Hi,

Here are 2 "bin/hadoop fsck / -files -blocks locations" reports:

1) For the old HDFS cluster, reportedly HEALTHY, but with this inconsistency:

http://www.krumpir.com/fsck-old.txt.zip   ( < 1MB)

Total blocks:  32264 (avg. block size 11591245 B)
Minimally replicated blocks:   32264 (100.0 %) <== looks GOOD, matches 
"Total blocks"
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3   <== should 
have 3 copies of each block
Average block replication: 2.418051<== ???  shouldn't 
this be 3??
Missing replicas:  0 (0.0 %)<==
if the above is 2.41... how can I have 0 missing replicas?

2) For the cluster with 1 old DN replaced with 1 new DN:

http://www.krumpir.com/fsck-1newDN.txt.zip ( < 800KB)

Minimally replicated blocks:   29917 (92.72564 %)
Over-replicated blocks:0 (0.0 %)
Under-replicated blocks:   17124 (53.074635 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor:3
Average block replication: 1.8145611
Missing replicas:  17124 (29.249296 %)



Any help would be appreciated.

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: lohit <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, May 9, 2008 2:47:39 AM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> When you say all daemons, do you mean the entire cluster, including the 
> namenode?
> >According to your explanation, this means that after I removed 1 DN I 
> >started 
> missing about 30% of the blocks, right?
> No, You would only miss the replica. If all of your blocks have replication 
> factor of 3, then you would miss only one replica which was on this DN.
> 
> It would be good to see full report
> could you run hadoop fsck / -files -blocks -location?
> 
> That would give you much more detailed information. 
> 
> 
> - Original Message 
> From: Otis Gospodnetic 
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 10:54:53 PM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Lohit,
> 
> 
> I run fsck after I replaced 1 DN (with data on it) with 1 blank DN and 
> started 
> all daemons.
> I see the fsck report does include this:
> Missing replicas:  17025 (29.727087 %)
> 
> According to your explanation, this means that after I removed 1 DN I started 
> missing about 30% of the blocks, right?
> Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I 
> removed?  But how could that be when I have replication factor of 3?
> 
> If I run bin/hadoop balancer with my old DN back in the cluster (and new DN 
> removed), I do get the happy "The cluster is balanced" response.  So wouldn't 
> that mean that everything is peachy and that if my replication factor is 3 
> then 
> when I remove 1 DN, I should have only some portion of blocks 
> under-replicated, 
> but not *completely* missing from HDFS?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> - Original Message 
> > From: lohit 
> > To: core-user@hadoop.apache.org
> > Sent: Friday, May 9, 2008 1:33:56 AM
> > Subject: Re: Corrupt HDFS and salvaging data
> > 
> > Hi Otis,
> > 
> > Namenode has location information a

Re: Corrupt HDFS and salvaging data

2008-05-08 Thread lohit
When you say all daemons, do you mean the entire cluster, including the 
namenode?
>According to your explanation, this means that after I removed 1 DN I started 
>missing about 30% of the blocks, right?
No, you would only miss the replica. If all of your blocks have a replication factor of 3, 
then you would miss only the one replica which was on this DN.

It would be good to see the full report; could you run hadoop fsck / -files -blocks -locations?

That would give you much more detailed information. 


- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, May 8, 2008 10:54:53 PM
Subject: Re: Corrupt HDFS and salvaging data

Lohit,


I run fsck after I replaced 1 DN (with data on it) with 1 blank DN and started 
all daemons.
I see the fsck report does include this:
Missing replicas:  17025 (29.727087 %)

According to your explanation, this means that after I removed 1 DN I started 
missing about 30% of the blocks, right?
Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I 
removed?  But how could that be when I have replication factor of 3?

If I run bin/hadoop balancer with my old DN back in the cluster (and new DN 
removed), I do get the happy "The cluster is balanced" response.  So wouldn't 
that mean that everything is peachy and that if my replication factor is 3 then 
when I remove 1 DN, I should have only some portion of blocks under-replicated, 
but not *completely* missing from HDFS?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: lohit <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, May 9, 2008 1:33:56 AM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Hi Otis,
> 
> Namenode has location information about all replicas of a block. When you run 
> fsck, namenode checks for those replicas. If all replicas are missing, then 
> fsck 
> reports the block as missing. Otherwise they are added to under replicated 
> blocks. If you specify -move or -delete option along with fsck, files with 
> such 
> missing blocks are moved to /lost+found or deleted depending on the option. 
> At what point did you run the fsck command, was it after the datanodes were 
> stopped? When you run namenode -format it would delete directories specified 
> in 
> dfs.name.dir. If directory exists it would ask for confirmation. 
> 
> Thanks,
> Lohit
> 
> - Original Message 
> From: Otis Gospodnetic 
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 9:00:34 PM
> Subject: Re: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> Update:
> It seems fsck reports HDFS is corrupt when a significant-enough number of 
> block 
> replicas is missing (or something like that).
> fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I 
> restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS 
> and started reporting *healthy* HDFS.
> 
> 
> I'll follow-up with re-balancing question in a separate email.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> - Original Message 
> > From: Otis Gospodnetic 
> > To: core-user@hadoop.apache.org
> > Sent: Thursday, May 8, 2008 11:35:01 PM
> > Subject: Corrupt HDFS and salvaging data
> > 
> > Hi,
> > 
> > I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm 
> > trying 
> > not to lose the precious data in it.  I accidentally run bin/hadoop 
> > namenode 
> > -format on a *new DN* that I just added to the cluster.  Is it possible for 
> that 
> > to corrupt HDFS?  I also had to explicitly kill DN daemons before that, 
> because 
> > bin/stop-all.sh didn't stop them for some reason (it always did so before).
> > 
> > Is there any way to salvage the data?  I have a 4-node cluster with 
> replication 
> > factor of 3, though fsck reports lots of under-replicated blocks:
> > 
> >   
> >   CORRUPT FILES:3355
> >   MISSING BLOCKS:   3462
> >   MISSING SIZE: 17708821225 B
> >   
> > Minimally replicated blocks:   28802 (89.269775 %)
> > Over-replicated blocks:0 (0.0 %)
> > Under-replicated blocks:   17025 (52.76779 %)
> > Mis-replicated blocks: 0 (0.0 %)
> > Default replication factor:3
> > Average block replication: 1.7750744
> > Missing replicas:  17025 (29.727087 %)
> > Number of data-nodes:  4
> > Number of racks:   1
> > 
> > 
> > The filesystem under path '/' is CORRUPT
> > 
> > 
> > What can one do at this point to save the data?  If I run bin/hadoop fsck 
> -move 
> > or -delete will I lose some of the data?  Or will I simply end up with 
> > fewer 
> > block replicas and will thus have to force re-balancing in order to get 
> > back 
> to 
> > a "safe" number of replicas?
> > 
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: Corrupt HDFS and salvaging data

2008-05-08 Thread lohit
Hi Otis,

The namenode has location information about all replicas of a block. When you run fsck, 
the namenode checks for those replicas. If all replicas are missing, then fsck reports the 
block as missing; otherwise the block is added to the under-replicated blocks. If you 
specify the -move or -delete option along with fsck, files with such missing blocks are 
moved to /lost+found or deleted, depending on the option. 
At what point did you run the fsck command? Was it after the datanodes were stopped? When 
you run namenode -format, it deletes the directories specified in dfs.name.dir. If the 
directory exists, it asks for confirmation. 

Thanks,
Lohit

- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, May 8, 2008 9:00:34 PM
Subject: Re: Corrupt HDFS and salvaging data

Hi,

Update:
It seems fsck reports HDFS is corrupt when a significant-enough number of block 
replicas is missing (or something like that).
fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN.  After I 
restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS 
and started reporting *healthy* HDFS.


I'll follow-up with re-balancing question in a separate email.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
> From: Otis Gospodnetic <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 11:35:01 PM
> Subject: Corrupt HDFS and salvaging data
> 
> Hi,
> 
> I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying 
> not to lose the precious data in it.  I accidentally run bin/hadoop namenode 
> -format on a *new DN* that I just added to the cluster.  Is it possible for 
> that 
> to corrupt HDFS?  I also had to explicitly kill DN daemons before that, 
> because 
> bin/stop-all.sh didn't stop them for some reason (it always did so before).
> 
> Is there any way to salvage the data?  I have a 4-node cluster with 
> replication 
> factor of 3, though fsck reports lots of under-replicated blocks:
> 
>   
>   CORRUPT FILES:3355
>   MISSING BLOCKS:   3462
>   MISSING SIZE: 17708821225 B
>   
> Minimally replicated blocks:   28802 (89.269775 %)
> Over-replicated blocks:0 (0.0 %)
> Under-replicated blocks:   17025 (52.76779 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor:3
> Average block replication: 1.7750744
> Missing replicas:  17025 (29.727087 %)
> Number of data-nodes:  4
> Number of racks:   1
> 
> 
> The filesystem under path '/' is CORRUPT
> 
> 
> What can one do at this point to save the data?  If I run bin/hadoop fsck 
> -move 
> or -delete will I lose some of the data?  Or will I simply end up with fewer 
> block replicas and will thus have to force re-balancing in order to get back 
> to 
> a "safe" number of replicas?
> 
> Thanks,
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


Re: hadoop and deprecation

2008-04-24 Thread lohit
If a method is deprecated in version 0.14, it could be removed in version 0.15 at the 
earliest. It might be removed anytime starting with 0.15.

- Original Message 
From: Karl Wettin <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Thursday, April 24, 2008 4:07:48 AM
Subject: hadoop and deprecation

When are deprecated methods removed from the API? At every new minor release?


 karl





Re: Run DfsShell command after your job is complete?

2008-04-21 Thread lohit
Yes, FsShell.java implements most of the shell commands. You could also use the 
FileSystem API:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/fs/FileSystem.html
There is a simple example at http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample
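
As a rough sketch of both routes (the paths and class name are placeholders, and this 
assumes the classic JobClient-era Java API): once the job has returned, you can copy 
the output with the FileSystem API, or drive FsShell in-process through ToolRunner:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FsShell;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.util.ToolRunner;

    public class PostJobCopy {
      // Call this after JobClient.runJob(jobConf) has returned successfully.
      public static void copyOutput(Configuration conf) throws Exception {
        FileSystem fs = FileSystem.get(conf);

        // Route 1: copy the job output from HDFS to local disk with the FileSystem API.
        fs.copyToLocalFile(new Path("/user/me/job-output"),   // HDFS path (placeholder)
                           new Path("/tmp/job-output"));      // local path (placeholder)

        // Route 2: run the equivalent shell command in-process via FsShell.
        int rc = ToolRunner.run(new FsShell(conf),
            new String[] { "-get", "/user/me/job-output", "/tmp/job-output" });
        if (rc != 0) {
          throw new RuntimeException("FsShell -get failed with exit code " + rc);
        }
      }
    }

Either route avoids shelling out to bin/hadoop from inside the job driver.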

Thanks,
Lohit

- Original Message 
From: Kayla Jay <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Monday, April 21, 2008 7:40:52 PM
Subject: Run DfsShell command after your job is complete?

Hello -

Is there any way to run a DfsShell command after your job is complete within 
that same job run/main class?

I.e., after the maps and reduces are done, I want to move the output directly out of 
HDFS into the local file system so I can load the data into a database.

Can you run a DfsShell command within your job class, or within the Java class that 
runs the job?

thanks


   




Re: datanode files list

2008-04-21 Thread lohit
> > Right now I'm calling the NameNode in order to get the list of all the files
> > in the cluster. For each file I check if it is a local file (one of the
> > locations is the host of the node), if it is I read it.

Instead of having every datanode fetch the list of locations, you could call 
FileSystem.getFileBlockLocations() from a single client and ask the datanode(s) 
returned by the API to read the local file; you would end up doing what the JobTracker 
does. What you seem to want is the reverse mapping: given a block, find the filename. 
Only the namenode holds that mapping, and I doubt there is an API for it. 
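
Here is a rough sketch of that single-client approach; the directory and hostname come 
from the command line and are placeholders, and it again assumes a FileSystem API that 
exposes getFileBlockLocations. It prints every file under the given directory that has 
at least one block replica on the given host:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FilesOnHost {
      public static void main(String[] args) throws Exception {
        String host = args[0];                                  // the datanode's hostname
        Path dir = new Path(args.length > 1 ? args[1] : "/");   // directory to scan
        FileSystem fs = FileSystem.get(new Configuration());
        for (FileStatus stat : fs.listStatus(dir)) {
          if (stat.isDir()) continue;   // not recursing here; add recursion if you need it
          boolean onHost = false;
          for (BlockLocation b : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
            for (String h : b.getHosts()) {
              if (h.equals(host)) onHost = true;
            }
          }
          if (onHost) System.out.println(stat.getPath());
        }
      }
    }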
Thanks,
Lohit

- Original Message 
From: Shimi K <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Monday, April 21, 2008 10:49:17 AM
Subject: Re: datanode files list

Do you remember the "Caching frequently map input files" thread?
http://mail-archives.apache.org/mod_mbox/hadoop-core-user/200802.mbox/[EMAIL PROTECTED]


On Mon, Apr 21, 2008 at 8:31 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
> This is kind of odd that you are doing this.  It really sounds like a
> replication of what hadoop is doing.
>
> Why not just run a map process and have hadoop figure out which blocks are
> where?
>
> Can you say more about *why* you are doing this, not just what you are
> trying to do?
>
> On 4/21/08 10:28 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:
>
> > I am using Hadoop HDFS as a distributed file system. On each DFS node I
> have
> > another process which needs to read the local HDFS files.
> > Right now I'm calling the NameNode in order to get the list of all the
> files
> > in the cluster. For each file I check if it is a local file (one of the
> > locations is the host of the node), if it is I read it.
> > Disadvantages:
> > * This solution works only if the entire file is not split.
> > * It involves the NameNode.
> > * Each node needs to iterate on all the files in the cluster.
> >
> > There must be a better way to do it. The perfect way will be to call the
> > DataNode and to get a list of the local files and their blocks.
> >
> > On Mon, Apr 21, 2008 at 7:18 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >
> >>
> >> Datanodes don't necessarily contain complete files.  It is possible to
> >> enumerate all files and to find out which datanodes host different
> blocks
> >> from these files.
> >>
> >> What did you need to do?
> >>
> >>
> >> On 4/21/08 2:11 AM, "Shimi K" <[EMAIL PROTECTED]> wrote:
> >>
> >>> Is there a way to get the list of files on each datanode?
> >>> I need to be able to get all the names of the files on a specific
> >> datanode?
> >>> is there a way to do it?
> >>
> >>
>
>




