RE: Reader/Writer problem in HDFS

2011-07-28 Thread Laxman
One approach is to write the file with a .tmp extension and, once the write
is complete, rename it back to the original file name. The reader also has to
filter out .tmp files.

This ensures the reader will not pick up partial files.

We have a similar scenario where the above approach resolved the issue.
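
A rough sketch of that pattern against the FileSystem API (class, method, and
variable names here are illustrative, not taken from the original code):

import java.io.IOException;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class TmpRenameWriter {

    // Writer side: create <file>.tmp, write and close it, then rename to the
    // final name so readers only ever see fully written files.
    public static void writeThenRename(FileSystem fs, Path finalPath, String content)
            throws IOException {
        Path tmpPath = finalPath.suffix(".tmp");
        OutputStream out = null;
        try {
            out = fs.create(tmpPath);
            IOUtils.write(content, out);
        } finally {
            IOUtils.closeQuietly(out);
        }
        if (!fs.rename(tmpPath, finalPath)) {
            throw new IOException("Rename failed: " + tmpPath + " -> " + finalPath);
        }
    }

    // Reader side: skip in-progress .tmp files when listing the output folder.
    public static final PathFilter SKIP_TMP = new PathFilter() {
        public boolean accept(Path p) {
            return !p.getName().endsWith(".tmp");
        }
    };
}

The reader thread can then list the folder with
fileSystem.listStatus(outputDir, TmpRenameWriter.SKIP_TMP), so half-written
files never show up in its listing. Since the rename only happens after the
stream is closed, the checksum problem mentioned below goes away as well.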

-Original Message-
From: Meghana [mailto:meghana.mara...@germinait.com] 
Sent: Thursday, July 28, 2011 1:38 PM
To: common-user; hdfs-u...@hadoop.apache.org
Subject: Reader/Writer problem in HDFS

Hi,

We have a job where the map tasks are given the path to an output folder.
Each map task writes a single file to that folder. There is no reduce phase.
There is another thread which constantly looks for new files in the output
folder. When it finds one, it persists the contents to an index and deletes the file.

We use this code in the map task:
// oStream must be declared outside the try block so it is in scope in
// finally (commons-io closeQuietly handles a null stream).
OutputStream oStream = null;
try {
    oStream = fileSystem.create(path);
    IOUtils.write(xyz, oStream);
} finally {
    IOUtils.closeQuietly(oStream);
}

The problem: sometimes the reader thread sees and tries to read a file which
is not yet fully written to HDFS (or the checksum is not written yet, etc.),
and throws an error. Is it possible to write an HDFS file in such a way that
it won't be visible until it is fully written?

We use Hadoop 0.20.203.

Thanks,

Meghana



RE: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread Laxman
Start the namenode [set fs.default.name to hdfs://192.168.1.101:9000] and
check the netstat report [netstat -nlp] to see which IP and port it is bound
to. Ideally, 9000 should be bound to 192.168.1.101. If yes, configure the same
IP in the slaves as well. Otherwise, we may need to revisit your configs once
more.

To use the hostname, you should have a hostname-to-IP mapping in the /etc/hosts
file on the master as well as the slaves.
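
For example, using the IP and hostname from your earlier mails (adjust to your
setup):

In core-site.xml on the master (and the same value on every slave):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.101:9000</value>
  </property>

After starting the namenode, confirm it is listening on the LAN IP and not
only on 127.0.0.1:

  netstat -nlp | grep 9000

To use hostnames instead, add the mapping to /etc/hosts on every node:

  192.168.1.101   hadoop-cluster-1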

-Original Message-
From: Doan Ninh [mailto:uitnetw...@gmail.com] 
Sent: Thursday, July 28, 2011 6:45 PM
To: common-user@hadoop.apache.org
Subject: Re: Error in 9000 and 9001 port in hadoop-0.20.2

I changed fs.default.name to hdfs://192.168.1.101:9000, but I get the same
error as before.
I need help.

On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal nitin.khandel...@germinait.com wrote:

 Please change your fs.default.name to hdfs://192.168.1.101:9000
 Thanks,
 Nitin

 On 28 July 2011 17:46, Doan Ninh uitnetw...@gmail.com wrote:

  The first time, I used hadoop-cluster-1 for 192.168.1.101.
  That is the hostname of the master node.
  But the same error occurs.
  How can I fix it?
 
   On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak phatak@gmail.com wrote:
 
    I had issues using IP addresses in the XML files. You can try to use
    hostnames in place of IP addresses.
  
    On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh uitnetw...@gmail.com wrote:
  
     Hi,

     I run Hadoop on 4 Ubuntu 11.04 machines in VirtualBox.
     On the master node (192.168.1.101), I configured fs.default.name =
     hdfs://127.0.0.1:9000. Then I configured everything on the 3 other nodes.
     When I start the cluster by running $HADOOP_HOME/bin/start-all.sh on the
     master node, everything is OK, but the slaves can't connect to the master
     on ports 9000 and 9001.
     I manually telnet to 192.168.1.101 on 9000 and 9001, and the result is
     connection refused.
     Then, on the master node, I telnet to localhost / 127.0.0.1:9000 and the
     result is connected.
     But on the master node, telnet to 192.168.1.101:9000 gives Connection
     Refused.

     Can somebody help me?
  
 



 --


 Nitin Khandelwal