Hi all,
Why isn't the dfs.safemode.threshold.pct 1 by default?
When dfs.replication.min=1 and dfs.safemode.threshold.pct=0.999,
there is a chance that the NameNode leaves safe mode with incomplete data
in its file system. Am I right? Is that acceptable? Or is the default assuming that
the replication factor would be greater than 1?
After looking at the HBase RegionServer and its functionality, I began
wondering whether there is a more general use case for memory caching of
HDFS blocks/files. In many use cases people want to store data on
Hadoop indefinitely; however, the last day's, last week's, or last month's data
is probably the most frequently accessed.
Yes, it is mostly geared towards replication greater than 1. One of the
reasons for waiting for this threshold is to avoid HDFS starting unnecessary
re-replication of blocks at startup, when some of the datanodes are slower
to come up.
When the replication is 1, you don't have that issue.
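For reference, these are the two properties being discussed. A minimal hdfs-site.xml sketch with the values from this thread (illustrative only, not a recommendation):

<!-- hdfs-site.xml (illustrative values from this thread) -->
<property>
  <name>dfs.replication.min</name>
  <value>1</value>
</property>
<property>
  <name>dfs.safemode.threshold.pct</name>
  <!-- fraction of blocks that must meet dfs.replication.min
       before the NameNode leaves safe mode automatically -->
  <value>0.999</value>
</property>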
'lo all,
We're starting a Boston Hadoop Meetup (finally ;-) -- first meeting
will be on Wednesday, October 28th, 7 pm, at the HubSpot offices:
http://www.meetup.com/bostonhadoop/
(HubSpot is at 1 Broadway, Cambridge on the fifth floor. There Will
Be Food.)
I'm stealing the organizing
hi all,
What would you consider the state of the art for WebDAV integration with
HDFS? I'm having trouble discerning the functionality that aligns with
each patch on HDFS-225 (https://issues.apache.org/jira/browse/HDFS-225)
. I've read that some of the patches do not support write operations. Not sure if
Hey-
I'm trying to update a custom RecordReader written for 0.18.3 and was wondering
if either
A) Anyone has any example code for extending RecordReader in 0.20.1 (in the
mapreduce package, not the mapred interface)?
or
B) Anyone can give me tips on how to write getCurrentKey() and getCurrentValue()?
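A minimal sketch of a RecordReader for the new (org.apache.hadoop.mapreduce) API, assuming 0.20.1: it wraps the built-in LineRecordReader, nextKeyValue() advances the record, and getCurrentKey()/getCurrentValue() just hand back what was read. The class name and the upper-casing transformation are made up for illustration.

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

public class UpperCaseLineRecordReader extends RecordReader<LongWritable, Text> {

  private final LineRecordReader lineReader = new LineRecordReader();
  private final Text currentValue = new Text();

  @Override
  public void initialize(InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    lineReader.initialize(split, context);
  }

  @Override
  public boolean nextKeyValue() throws IOException, InterruptedException {
    // Advance the underlying reader; do any per-record work here.
    if (!lineReader.nextKeyValue()) {
      return false;
    }
    currentValue.set(lineReader.getCurrentValue().toString().toUpperCase());
    return true;
  }

  @Override
  public LongWritable getCurrentKey() throws IOException, InterruptedException {
    // The key is the byte offset of the line, as reported by LineRecordReader.
    return lineReader.getCurrentKey();
  }

  @Override
  public Text getCurrentValue() throws IOException, InterruptedException {
    return currentValue;
  }

  @Override
  public float getProgress() throws IOException, InterruptedException {
    return lineReader.getProgress();
  }

  @Override
  public void close() throws IOException {
    lineReader.close();
  }
}

The pattern generalizes: keep the current key and value in fields, update them in nextKeyValue(), and return them unchanged from the two getters.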
Edward,
Interesting concept. I imagine that building a CachedInputFormat on top of
something like memcached would be the most straightforward
implementation. You could store 64 MB chunks in memcached and try to retrieve
them from there, falling back to the filesystem on failure. One obvious
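A rough sketch of that cache-then-fallback read path, assuming the spymemcached client; the class name, key scheme, and expiry are made up for illustration. Note that memcached's default item size limit is 1 MB, so 64 MB chunks would need a server started with a larger item size (the -I option in recent memcached versions) or a smaller chunk size.

import java.io.IOException;
import java.net.InetSocketAddress;

import net.spy.memcached.MemcachedClient;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CachedBlockReader {
  private static final int CHUNK_SIZE = 64 * 1024 * 1024; // 64 MB chunks

  private final MemcachedClient cache;
  private final FileSystem fs;

  public CachedBlockReader(Configuration conf) throws IOException {
    this.cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
    this.fs = FileSystem.get(conf);
  }

  /** Returns one chunk of the file, preferring the cache over HDFS. */
  public byte[] readChunk(Path file, long chunkIndex) throws IOException {
    String key = file.toString() + "#" + chunkIndex;
    byte[] data = (byte[]) cache.get(key);            // cache hit?
    if (data != null) {
      return data;
    }
    // Cache miss: read the chunk from HDFS and repopulate the cache.
    long offset = chunkIndex * (long) CHUNK_SIZE;
    long fileLen = fs.getFileStatus(file).getLen();
    if (offset >= fileLen) {
      throw new IOException("Chunk index past end of file: " + key);
    }
    byte[] buf = new byte[(int) Math.min(CHUNK_SIZE, fileLen - offset)];
    FSDataInputStream in = fs.open(file);
    try {
      in.readFully(offset, buf);
    } finally {
      in.close();
    }
    cache.set(key, 3600, buf);                        // expire after an hour
    return buf;
  }
}

A real CachedInputFormat would wrap something like this inside its RecordReader, so map tasks read through the cache transparently.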
Map tasks are generated based on InputSplits. An InputSplit is a logical
description of the input that a single map task should process. The list of InputSplit
objects is created on the client by the InputFormat.
org.apache.hadoop.mapreduce.InputSplit has an abstract method:
/**
 * Get the list of nodes by name where the data for the split would be local.
 */
public abstract String[] getLocations() throws IOException, InterruptedException;
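To see where those splits and their preferred nodes come from on the client side, here is a small sketch (class name and input path argument are placeholders) that asks a TextInputFormat for its splits and prints each split's hosts:

import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class SplitLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // The client asks the InputFormat for the splits; the scheduler later
    // uses getLocations() to place each map task near its data.
    TextInputFormat inputFormat = new TextInputFormat();
    List<InputSplit> splits = inputFormat.getSplits(job);
    for (InputSplit split : splits) {
      System.out.println(split + " -> hosts: "
          + Arrays.toString(split.getLocations()));
    }
  }
}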
Thank you, Raghu.
Then, when the percentage is below 0.999, how can you tell
whether some datanodes are just slower than others or some of the data blocks are
lost?
I think a percentage of 1 should have a special meaning, like
guaranteeing the integrity of the data in HDFS.
If it's below 1, then the integrity is not guaranteed.
I am wondering how to read a whole block of data in a map task. I have a file with a
single number on every line, and I wish to calculate some statistics.
Once the file is divided into blocks and sent to different nodes by Hadoop,
is it possible to read a chunk of the data in each map function? Right now
each map() call only sees a single line.
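One common way to get per-chunk behaviour without changing the InputFormat is to keep running totals in the Mapper and emit them once from cleanup(), which runs after the whole split has been fed through map(). A minimal sketch assuming the new (0.20) mapreduce API; the class name and output keys are made up for illustration:

import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ChunkStatsMapper
    extends Mapper<LongWritable, Text, Text, DoubleWritable> {

  private long count = 0;
  private double sum = 0.0;
  private double sumOfSquares = 0.0;

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Accumulate per-split statistics instead of emitting one record per line.
    double x = Double.parseDouble(value.toString().trim());
    count++;
    sum += x;
    sumOfSquares += x * x;
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    // One partial result per map task; a single reducer can combine them
    // into the overall mean and variance.
    context.write(new Text("count"), new DoubleWritable(count));
    context.write(new Text("sum"), new DoubleWritable(sum));
    context.write(new Text("sumsq"), new DoubleWritable(sumOfSquares));
  }
}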
I am not sure what the real concern is... You can set it to 1.0 (or even 1.1
:)) if you prefer. Many admins do.
Raghu.
On Tue, Oct 6, 2009 at 5:20 PM, Manhee Jo j...@nttdocomo.com wrote:
Thank you, Raghu.
Then, when the percentage is below 0.999, how can you tell
if some datanodes are just
Now it's clear. Thank you, Raghu.
But if you set it to 1.1, the safemode is permanent :).
Thanks,
Manhee
- Original Message -
From: Raghu Angadi rang...@apache.org
To: common-user@hadoop.apache.org
Sent: Wednesday, October 07, 2009 10:03 AM
Subject: Re: A question on