Re: Performance / cluster scaling question
FYI, just ran a 50-node cluster using one of the new kernels for Fedora with all nodes forced onto the same 'availability zone', and there were no timeouts or failed writes.

On Mar 27, 2008, at 4:16 PM, Chris K Wensel wrote:

If it's any consolation, I'm seeing similar behaviors on 0.16.0 when running on EC2 when I push the number of nodes in the cluster past 40.

On Mar 24, 2008, at 6:31 AM, André Martin wrote:

Thanks for the clarification, dhruba :-) Anyway, what can cause those other exceptions, such as "Could not get block locations" and "DataXceiver: java.io.EOFException"? Can anyone give me a little more insight into those exceptions? And does anyone have a similar workload (frequent writes and deletions of small files), and what could cause the performance degradation (see first post)? I think HDFS should be able to handle two million and more files/blocks... Also, I observed that some of my datanodes do not "heartbeat" to the namenode for several seconds (up to 400 :-() from time to time - when I check those specific datanodes and do a "top", I see the "du" command running, which seems to have gotten stuck. Thanks and Happy Easter :-) Cu on the 'net, Bye - bye, André

dhruba Borthakur wrote:

The namenode lazily instructs a Datanode to delete blocks. As a response to every heartbeat from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat thread in the Datanode deletes the block files synchronously before it can send the next heartbeat. That's the reason a small number (like 100) was chosen. If you have 8 datanodes, your system will probably delete about 800 blocks every 3 seconds.
Thanks, dhruba

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without having any load), the block number and "DFS Used" space seem to go down... My question is: is the hardware simply too weak/slow to send the block deletion requests to the datanodes in a timely manner, or do those "crappy" HDDs simply cause the delay? I noticed that it can take up to 40 minutes to delete ~400,000 files at once manually using "rm -r"... Actually, my main concern is why the performance, i.e. the throughput, goes down - any ideas?

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
Re: Reduce Hangs
On Fri, Mar 28, 2008 at 12:31 AM, 朱盛凯 <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I met this problem in my cluster before, so I think I can share some of my
> experience with you, though it may not apply in your case.
>
> The job in my cluster always hung at 16% of reduce. It occurred because the
> reduce task could not fetch the map output from other nodes.
>
> In my case, two factors could cause this failure of communication between
> two task trackers.
>
> One is that the firewall blocks the trackers from communicating. I solved this
> by disabling the firewall.
> The other factor is that trackers refer to other nodes by host name only,
> not by IP address. I solved this by editing the file /etc/hosts
> with mappings from hostname to IP address for all nodes in the cluster.
>
> I hope my experience will be helpful for you.

I met this problem for the same reason too. Try adding the host names of all your nodes to all your /etc/hosts files.

> On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> > I have a small Hadoop cluster, one master and three slaves.
> > When I try the example wordcount on one of our log files (size ~350 MB),
> > map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a
> > very long time it finishes.
> > I am seeing this error:
> > Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> > In the log I am seeing this:
> > INFO org.apache.hadoop.mapred.TaskTracker:
> > task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at 0.02 MB/s)
> >
> > Do you know what might be the problem?
> > Thanks,
> > Senthil

--
[EMAIL PROTECTED]
Institute of Computing Technology, Chinese Academy of Sciences, Beijing.
Re: Problems with 0.16.1
never mind. i found archive.apache.org.

On Mar 27, 2008, at 6:00 PM, Chris K Wensel wrote:

> is this still true? if so, can we restore 0.16.0 at http://www.apache.org/dist/hadoop/core/ ?
> just realized it was missing as I am rebuilding my ec2 ami's.

On Mar 17, 2008, at 8:43 PM, Owen O'Malley wrote:

> We believe that there has been a regression in release 0.16.1 with respect to the reliability of HDFS. In particular, tasks get stuck talking to the data nodes when they are under load. It seems to be HADOOP-3033. We are currently testing it further. In the meantime, I would suggest holding off on 0.16.1. -- Owen

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
Re: Problems with 0.16.1
is this still true? if so, can we restore 0.16.0 at http://www.apache.org/dist/hadoop/core/ ? just realized it was missing as I am rebuilding my ec2 ami's.

On Mar 17, 2008, at 8:43 PM, Owen O'Malley wrote:

> We believe that there has been a regression in release 0.16.1 with respect to the reliability of HDFS. In particular, tasks get stuck talking to the data nodes when they are under load. It seems to be HADOOP-3033. We are currently testing it further. In the meantime, I would suggest holding off on 0.16.1. -- Owen

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
Re: Performance / cluster scaling question
If it's any consolation, I'm seeing similar behaviors on 0.16.0 when running on EC2 when I push the number of nodes in the cluster past 40.

On Mar 24, 2008, at 6:31 AM, André Martin wrote:

Thanks for the clarification, dhruba :-) Anyway, what can cause those other exceptions, such as "Could not get block locations" and "DataXceiver: java.io.EOFException"? Can anyone give me a little more insight into those exceptions? And does anyone have a similar workload (frequent writes and deletions of small files), and what could cause the performance degradation (see first post)? I think HDFS should be able to handle two million and more files/blocks... Also, I observed that some of my datanodes do not "heartbeat" to the namenode for several seconds (up to 400 :-() from time to time - when I check those specific datanodes and do a "top", I see the "du" command running, which seems to have gotten stuck. Thanks and Happy Easter :-) Cu on the 'net, Bye - bye, André

dhruba Borthakur wrote:

The namenode lazily instructs a Datanode to delete blocks. As a response to every heartbeat from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the heartbeat periodicity is 3 seconds. The heartbeat thread in the Datanode deletes the block files synchronously before it can send the next heartbeat. That's the reason a small number (like 100) was chosen. If you have 8 datanodes, your system will probably delete about 800 blocks every 3 seconds.

Thanks, dhruba

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without having any load), the block number and "DFS Used" space seem to go down...
My question is: is the hardware simply too weak/slow to send the block deletion requests to the datanodes in a timely manner, or do those "crappy" HDDs simply cause the delay? I noticed that it can take up to 40 minutes to delete ~400,000 files at once manually using "rm -r"... Actually, my main concern is why the performance, i.e. the throughput, goes down - any ideas?

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
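dhruba's numbers above lend themselves to a quick back-of-the-envelope calculation. The sketch below is plain Python, not Hadoop code; the constants are taken straight from his description (at most 100 block deletions per datanode per heartbeat, one heartbeat every 3 seconds), and it estimates how long draining a large block backlog would take in the best case:

```python
# Back-of-the-envelope deletion throughput for 0.16-era HDFS defaults.
BLOCKS_PER_HEARTBEAT = 100   # max deletions the namenode hands out per heartbeat
HEARTBEAT_SECONDS = 3        # typical heartbeat period

def deletion_rate(datanodes):
    """Upper bound on cluster-wide block deletions per second."""
    return datanodes * BLOCKS_PER_HEARTBEAT / HEARTBEAT_SECONDS

def seconds_to_delete(blocks, datanodes):
    """Best-case wall-clock time to lazily delete `blocks` blocks."""
    return blocks / deletion_rate(datanodes)

if __name__ == "__main__":
    # 8 datanodes: about 800 blocks every 3 seconds, as stated in the thread.
    print(deletion_rate(8) * HEARTBEAT_SECONDS)    # 800.0
    # Clearing a backlog of 2 million blocks on 8 datanodes takes on the
    # order of hours, which matches "DFS Used" draining slowly.
    print(seconds_to_delete(2_000_000, 8) / 3600)  # ~2.08 hours
```

This suggests the slow drain is mostly the throttling policy, not (only) slow disks.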
RE: nfs mount hadoop-site?
Shouldn't be any issues with this. It's similar to the setup we have right now. -Xavier -Original Message- From: Colin Freas Sent: Thursday, March 27, 2008 12:05 PM To: Hadoop Subject: nfs mount hadoop-site? are there any issues with having the hadoop-site.xml in .../conf placed on an nfs mounted dir that all my nodes have access to? -colin
Re: Using HDFS as native storage
We looked seriously at HDFS and MogileFS, and considered (and instantly rejected) a *bunch* of others. HDFS was eliminated based on the number of files, lack of HA, and lack of reference implementations serving large-scale web sites directly from it. Mogile had HA (using crude tools), reference implementations, and could obviously be scaled to pretty large levels. It also had an implementation that was simple enough for our guys to fix the defects. At that point, the time window for further evaluation closed and we had to make a decision.

On 3/27/08 11:30 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> might be off-topic but how would you compare GlusterFS to HDFS and
> MogileFS for such an application? Did you look at that at all and
> decide against it?
>
> Ted Dunning wrote:
>> We evaluated several options for just this problem and eventually settled on
>> MogileFS. That said, Mogile needed several weeks of work to get it ready
>> for prime time. It will work pretty well for modest-sized collections, but
>> for our stuff (many hundreds of millions of files, approaching a PB of
>> storage), it just wasn't ready. The fixes had to do with sharding the name
>> database across many mySQL instances and improving the handling of storage
>> system up-state.
>>
>> On 3/27/08 2:13 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> we're looking for options for creating a scalable storage solution based
>>> on commodity hardware for media files (space-wise dominated by video files
>>> of a few hundred MB, but also storing up to a few million smaller files
>>> such as thumbnails). The system will start with a few TB and should be
>>> able to scale to about a PB.
>>>
>>> Is anyone using HDFS as native storage for critical files, or is it just
>>> common to use HDFS for large amounts of temporary, more or less
>>> non-critical data? What would be the trade-offs to decide whether to use
>>> HDFS or something like GlusterFS? Note that we're currently not planning
>>> on using MapReduce.
>>>
>>> Thanks in advance,
>>>
>>> Robert
[Map/Reduce][HDFS]
Hello, I'm working on a large amount of logs, and I've noticed that the distribution of the data over the network (./hadoop dfs -put input input) takes a lot of time. Let's say that my data is already distributed among the nodes; is there any way to tell Hadoop to use the already existing distribution? Thanks -- Jean-Pierre <[EMAIL PROTECTED]>
nfs mount hadoop-site?
are there any issues with having the hadoop-site.xml in .../conf placed on an nfs mounted dir that all my nodes have access to? -colin
Re: Do multiple small files share one block?
Robert Krüger wrote: this seems like an FAQ but I didn't explicitly see it in the docs: Is the minimum size a file occupies on HDFS controlled by the block size, i.e. would using the default block size of 64 MB result in consumption of 64 MB if I stored a file of 1 byte?

No. The last block in a file is only as long as it needs to be.

Doug
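Doug's answer can be illustrated with a little arithmetic. This is a plain-Python sketch, not HDFS code; it assumes the usual default replication factor of 3 and ignores the namenode metadata cost that each block object still incurs:

```python
# HDFS blocks are not pre-allocated: a file's last (or only) block
# consumes only as many raw bytes as the file actually contains.
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB default block size

def num_blocks(file_size):
    """Number of block objects a file of `file_size` bytes needs."""
    return max(1, math.ceil(file_size / BLOCK_SIZE))

def disk_usage(file_size, replication=3):
    """Raw datanode bytes consumed; the last block is truncated, not padded."""
    return file_size * replication

if __name__ == "__main__":
    print(num_blocks(1))                  # 1 block object, but...
    print(disk_usage(1))                  # ...only 3 bytes on disk, not 3 x 64 MB
    print(num_blocks(200 * 1024 * 1024))  # a 200 MB file spans 4 blocks
```

The real cost of many tiny files is therefore namenode memory (one block object each), not wasted datanode disk.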
Do multiple small files share one block?
Hi, this seems like an FAQ but I didn't explicitly see it in the docs: Is the minimum size a file occupies on HDFS controlled by the block size, i.e. would using the default block size of 64 MB result in consumption of 64 MB if I stored a file of 1 byte? I would assume yes, based on the fact that HDFS was primarily developed for the distribution of large files, and sentences like "Internally, a file is split into one or more blocks and these blocks are stored in a set of Datanodes" seem to imply it, but I can't find a definite answer. If so, and the default block size is 64 MB, it feels like abusing HDFS to set the block size to 10k just because a few hundred thousand files of ours are only a few K in size, but I should probably run some benchmarks. Thanks in advance, Robert
Re: Using HDFS as native storage
might be off-topic but how would you compare GlusterFS to HDFS and MogileFS for such an application? Did you look at that at all and decide against it?

Ted Dunning wrote:

> We evaluated several options for just this problem and eventually settled on MogileFS. That said, Mogile needed several weeks of work to get it ready for prime time. It will work pretty well for modest-sized collections, but for our stuff (many hundreds of millions of files, approaching a PB of storage), it just wasn't ready. The fixes had to do with sharding the name database across many mySQL instances and improving the handling of storage system up-state.

On 3/27/08 2:13 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> we're looking for options for creating a scalable storage solution based on commodity hardware for media files (space-wise dominated by video files of a few hundred MB, but also storing up to a few million smaller files such as thumbnails). The system will start with a few TB and should be able to scale to about a PB.
>
> Is anyone using HDFS as native storage for critical files, or is it just common to use HDFS for large amounts of temporary, more or less non-critical data? What would be the trade-offs to decide whether to use HDFS or something like GlusterFS? Note that we're currently not planning on using MapReduce.
>
> Thanks in advance,
>
> Robert

--
(-) Robert Krüger
(-) SIGNAL 7 Gesellschaft für Informationstechnologie mbH
(-) Landwehrstraße 4 - 64293 Darmstadt
(-) Tel: +49 (0) 6151 969 96 11, Fax: +49 (0) 6151 969 96 29
(-) [EMAIL PROTECTED], www.signal7.de
(-) Amtsgericht Darmstadt, HRB 6833
(-) Geschäftsführer: Robert Krüger, Frank Peters, Jochen Strunk
Re: Using HDFS as native storage
We evaluated several options for just this problem and eventually settled on MogileFS. That said, Mogile needed several weeks of work to get it ready for prime time. It will work pretty well for modest-sized collections, but for our stuff (many hundreds of millions of files, approaching a PB of storage), it just wasn't ready. The fixes had to do with sharding the name database across many mySQL instances and improving the handling of storage system up-state.

On 3/27/08 2:13 AM, "Robert Krüger" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> we're looking for options for creating a scalable storage solution based
> on commodity hardware for media files (space-wise dominated by video files
> of a few hundred MB, but also storing up to a few million smaller files
> such as thumbnails). The system will start with a few TB and should be
> able to scale to about a PB.
>
> Is anyone using HDFS as native storage for critical files, or is it just
> common to use HDFS for large amounts of temporary, more or less
> non-critical data? What would be the trade-offs to decide whether to use
> HDFS or something like GlusterFS? Note that we're currently not planning
> on using MapReduce.
>
> Thanks in advance,
>
> Robert
Re: Append data in hdfs_write
Yes. The present work-arounds for this are pretty complicated.

Option 1) You can write small files relatively frequently, and every time you write some number of them, you can concatenate them and delete them. These concatenations can receive the same treatment. If managed carefully in conjunction with a safe status update mechanism like ZooKeeper, you can have a pretty robust system that reflects new data with fairly low latency (on the order of seconds behind).

Option 2) You can accumulate data in a non-HDFS location until it is big enough to push to HDFS. This can be done in conjunction with option 1. The danger is that you run the risk of losing data if the accumulator fails before burping data to HDFS. This is very commonly used for log files that are consolidated at the hourly level and transferred to HDFS.

On 3/27/08 12:02 AM, "Raghavendra K" <[EMAIL PROTECTED]> wrote:

> Hi,
> Thanks for the reply.
> Does this mean that once I close a file, I can open it only for reading?
> And if I reopen the same file to write any data, then the old data will be
> lost, and again it's as good as a new file being created with the same name?
>
> On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <[EMAIL PROTECTED]>
> wrote:
>
>> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
>> for more details.
>>
>> Thanks,
>> dhruba
>>
>> -----Original Message-----
>> From: Raghavendra K [mailto:[EMAIL PROTECTED]]
>> Sent: Wednesday, March 26, 2008 11:29 PM
>> To: core-user@hadoop.apache.org
>> Subject: Append data in hdfs_write
>>
>> Hi,
>> I am using hdfsWrite to write data to a file.
>> Whenever I close the file and reopen it for writing, it starts writing
>> from position 0 (rewriting the old data).
>> Is there any way to append data to a file using hdfsWrite?
>> I cannot use hdfsTell because it works only when the file is opened in
>> RDONLY mode, and I don't know the number of bytes written to the file
>> previously.
>> Please throw some light on it.
>>
>> --
>> Regards,
>> Raghavendra K
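Option 1 above can be sketched roughly as follows. This is plain Python against the local filesystem as a stand-in; a real version would go through the HDFS client API, the ZooKeeper coordination Ted mentions is omitted, and the length-prefix framing format here is made up purely for illustration:

```python
# Roll many small files into one container file, then delete the originals.
# Records are framed as: name length, payload length, name, payload.
import os
import struct

def concatenate(small_paths, container_path):
    """Append each small file into `container_path` with simple framing
    so the individual records can be recovered later."""
    with open(container_path, "ab") as out:
        for path in small_paths:
            with open(path, "rb") as f:
                data = f.read()
            name = path.encode("utf-8")
            out.write(struct.pack(">II", len(name), len(data)))
            out.write(name)
            out.write(data)
    # Delete the originals only after the container write succeeded.
    for path in small_paths:
        os.remove(path)

def read_records(container_path):
    """Yield (name, payload) pairs back out of the container file."""
    with open(container_path, "rb") as f:
        while header := f.read(8):
            name_len, data_len = struct.unpack(">II", header)
            name = f.read(name_len).decode("utf-8")
            yield name, f.read(data_len)
```

Containers produced this way can themselves be concatenated in a second pass, exactly as the message describes.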
Re: Using HDFS as native storage
I can't offer any insights into other clustering FS solutions, but I think it's a very safe bet to say that Google relies entirely on GFS for their long-term storage. Granted, they almost certainly make offline backups of business-critical data, but I would assume that everything related to GMail, Google Code, Picasa, Google Docs, etc. is stored in, and only in, one or more massive GFS clusters. Take, for instance, their pride in the fact that they claim to have lost only one 64MB block in the history of their modern infrastructure (that is, since 2004). Look at it another way: how would you back up petabytes of data? When you've got multiple data centers consisting of thousands of nodes, and every data block is replicated on at least three machines, what's the point of backups? Again, I'm no expert; I'm just basing this on everything I've read and watched about Google. Hopefully others will have more enlightened comments. n
Re: Reduce Hangs
Hi, I met this problem in my cluster before, so I think I can share some of my experience with you, though it may not apply in your case.

The job in my cluster always hung at 16% of reduce. It occurred because the reduce task could not fetch the map output from other nodes.

In my case, two factors could cause this failure of communication between two task trackers.

One is that the firewall blocks the trackers from communicating. I solved this by disabling the firewall. The other factor is that trackers refer to other nodes by host name only, not by IP address. I solved this by editing the file /etc/hosts with mappings from hostname to IP address for all nodes in the cluster.

I hope my experience will be helpful for you.

On 3/27/08, Natarajan, Senthil <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I have a small Hadoop cluster, one master and three slaves.
> When I try the example wordcount on one of our log files (size ~350 MB),
> map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a
> very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker:
> task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at 0.02 MB/s)
>
> Do you know what might be the problem?
> Thanks,
> Senthil
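For what it's worth, the /etc/hosts fix described above might look something like this on every node (the hostnames and addresses here are made up for illustration; use your cluster's real ones):

```
# /etc/hosts on every node: map each tracker's hostname to its real IP,
# so trackers that advertise themselves by hostname stay reachable.
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
192.168.1.13   slave3
```

The same file should be identical on all nodes, and the names must match what each tasktracker reports as its hostname.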
Re: Reduce Hangs
On Thu, 27 Mar 2008, Natarajan, Senthil wrote:

> Hi,
> I have a small Hadoop cluster, one master and three slaves.
> When I try the example wordcount on one of our log files (size ~350 MB),
> map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a
> very long time it finishes.
> I am seeing this error:
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out

This error occurs when the reducer fails to fetch map-task output from 5 unique map tasks. Before considering an attempt as failed, the reducer tries to fetch the map output 7 times in 5 minutes (default config). In case of job failure, check the following:
1. Is this problem common to all the reducers?
2. Are the map tasks the same across all the reducers for which the failure is reported?
3. Is there at least one map task whose output is successfully fetched?
If the job becomes successful, then there might be some problem with the reducer.

Amar

> In the log I am seeing this:
> INFO org.apache.hadoop.mapred.TaskTracker: task_200803261535_0001_r_00_0
> 0.1834% reduce > copy (11 of 20 at 0.02 MB/s)
>
> Do you know what might be the problem?
> Thanks,
> Senthil
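Amar's bail-out rule can be modeled in a few lines. This is a sketch of the described behavior, not Hadoop's actual shuffle code; the constants 7 and 5 are taken from his explanation:

```python
# Each map output is retried up to 7 times; the reduce attempt gives up
# once fetches from 5 *unique* map tasks have been exhausted, producing
# "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out".
MAX_FETCH_RETRIES = 7
MAX_FAILED_UNIQUE_FETCHES = 5

def reduce_bails_out(failed_attempts_per_map):
    """`failed_attempts_per_map` maps a map-task id to how many of its
    fetch attempts failed; returns True if the reducer would bail out."""
    exhausted = [m for m, fails in failed_attempts_per_map.items()
                 if fails >= MAX_FETCH_RETRIES]
    return len(exhausted) >= MAX_FAILED_UNIQUE_FETCHES

if __name__ == "__main__":
    # 4 completely unreachable maps is survivable; a 5th triggers the error.
    print(reduce_bails_out({m: 7 for m in range(4)}))  # False
    print(reduce_bails_out({m: 7 for m in range(5)}))  # True
```

The practical takeaway: one flaky node can fail many unique fetches at once, so connectivity problems (firewalls, hostname resolution) trip this threshold quickly.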
Reduce Hangs
Hi, I have a small Hadoop cluster, one master and three slaves. When I try the example wordcount on one of our log files (size ~350 MB), map runs fine but reduce always hangs (sometimes around 19%, 60%, ...); after a very long time it finishes. I am seeing this error: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. In the log I am seeing this: INFO org.apache.hadoop.mapred.TaskTracker: task_200803261535_0001_r_00_0 0.1834% reduce > copy (11 of 20 at 0.02 MB/s) Do you know what might be the problem? Thanks, Senthil
Re: complete documentation of hadoop-site.xml
This is a good place to start: http://wiki.apache.org/hadoop/GettingStartedWithHadoop

Check the articles which describe how to set up a single-node and a two-node cluster: http://wiki.apache.org/hadoop/HadoopArticles
I followed them and it worked! ;) The two-node cluster installation also scales to more nodes - just add the hostnames to the "slaves" file. But first you should make it run on a single host (SSH-related setup).

A full description of the configuration properties is provided with the Hadoop installation: $HADOOP_HOME/conf/hadoop-default.xml
But you shouldn't look at this file until you've made it work with 1 and 2 nodes.

On 27/03/2008, John Menzer <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> where can I find a complete list of all possible configuration properties
> (e.g. in file hadoop-site.xml)?
> I am experiencing lots of bind errors in my log files when trying to run
> start-dfs.sh, start-mapred.sh, start-all.sh!
>
> That's why I think I have to change some port settings. But I don't know
> which bind error refers to which port setting, and I don't even know which
> settings are relevant at all.
>
> Why can't I find complete documentation or at least a list of these
> properties???
>
> Thank you for any help!
>
> John
>
> --
> View this message in context:
> http://www.nabble.com/complete-documentation-of-hadoop-site.xml-tp16323391p16323391.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
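As a small illustration, overrides in hadoop-site.xml use the same property format as hadoop-default.xml. Something like the following sets the two port-bearing addresses most relevant to bind errors (the host and port values are only examples; check the property list in your own hadoop-default.xml):

```xml
<!-- hadoop-site.xml: values here override hadoop-default.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>master:9000</value>   <!-- namenode address; example value -->
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>   <!-- jobtracker address; example value -->
  </property>
</configuration>
```

A "bind error" on startup usually means one of these addresses points at a port that is already in use, or at a hostname the daemon's machine does not own.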
How to handle bind errors?
I am experiencing bind errors in my log files when trying to start up the Hadoop cluster. However, the log files do not really show me which address/port pair is responsible for the error. I tried some property settings in hadoop-site.xml, but I don't actually know which setting is responsible for which daemon. I am using Hadoop 0.16.1. Is there any systematic way of eliminating port errors? -- View this message in context: http://www.nabble.com/How-to-handle-bind-errors--tp16323398p16323398.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
complete documentation of hadoop-site.xml
Hello, where can I find a complete list of all possible configuration properties (e.g. in file hadoop-site.xml)? I am experiencing lots of bind errors in my log files when trying to run start-dfs.sh, start-mapred.sh, start-all.sh! That's why I think I have to change some port settings. But I don't know which bind error refers to which port setting, and I don't even know which settings are relevant at all. Why can't I find complete documentation or at least a list of these properties??? Thank you for any help! John -- View this message in context: http://www.nabble.com/complete-documentation-of-hadoop-site.xml-tp16323391p16323391.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Using HDFS as native storage
Hi, we're looking for options for creating a scalable storage solution based on commodity hardware for media files (space-wise dominated by video files of a few hundred MB, but also storing up to a few million smaller files such as thumbnails). The system will start with a few TB and should be able to scale to about a PB. Is anyone using HDFS as native storage for critical files, or is it just common to use HDFS for large amounts of temporary, more or less non-critical data? What would be the trade-offs to decide whether to use HDFS or something like GlusterFS? Note that we're currently not planning on using MapReduce. Thanks in advance, Robert
Re: Append data in hdfs_write
Hi, thanks for the reply. Does this mean that once I close a file, I can open it only for reading? And if I reopen the same file to write any data, then the old data will be lost, and again it's as good as a new file being created with the same name?

On Thu, Mar 27, 2008 at 12:23 PM, dhruba Borthakur <[EMAIL PROTECTED]> wrote:

> HDFS files, once closed, cannot be reopened for writing. See HADOOP-1700
> for more details.
>
> Thanks,
> dhruba
>
> -----Original Message-----
> From: Raghavendra K [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, March 26, 2008 11:29 PM
> To: core-user@hadoop.apache.org
> Subject: Append data in hdfs_write
>
> Hi,
> I am using hdfsWrite to write data to a file.
> Whenever I close the file and reopen it for writing, it starts writing
> from position 0 (rewriting the old data).
> Is there any way to append data to a file using hdfsWrite?
> I cannot use hdfsTell because it works only when the file is opened in
> RDONLY mode, and I don't know the number of bytes written to the file
> previously.
> Please throw some light on it.
>
> --
> Regards,
> Raghavendra K

--
Regards,
Raghavendra K