[jira] Commented: (HDFS-1339) NameNodeMetrics should use MetricsTimeVaryingLong

2010-08-11 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897349#action_12897349
 ] 

Hong Tang commented on HDFS-1339:
-

If you only use it to calculate rates of such operations (such as ops/s or 
ops/hour), it probably does not matter whether you use long or int.
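
For context, a minimal sketch (not from any patch; names are made up) of why a 
per-interval rate computed from a wrapping int counter stays correct across a 
single overflow, while the absolute total does not:

{noformat}
// Sketch only: rate from two successive samples of a (possibly wrapping) int counter.
public class RateSketch {
  static long opsPerSecond(int prev, int curr, long intervalMillis) {
    long delta = (curr - prev) & 0xFFFFFFFFL; // unsigned delta; correct across one wrap
    return delta * 1000L / intervalMillis;
  }

  public static void main(String[] args) {
    int prev = Integer.MAX_VALUE - 10;
    int curr = prev + 100;                               // wraps past Integer.MAX_VALUE
    System.out.println(opsPerSecond(prev, curr, 10000)); // prints 10 (ops/s), not garbage
  }
}
{noformat}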

> NameNodeMetrics should use MetricsTimeVaryingLong 
> --
>
> Key: HDFS-1339
> URL: https://issues.apache.org/jira/browse/HDFS-1339
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Reporter: Scott Chen
>Assignee: Scott Chen
>Priority: Minor
> Attachments: HDFS-1339.txt
>
>
> NameNodeMetrics uses MetricsTimeVaryingInt. We see that FileInfoOps and 
> GetBlockLocations overflow in our cluster.
> Using MetricsTimeVaryingLong will easily solve this problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1338) Improve TestDFSIO

2010-08-10 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897066#action_12897066
 ] 

Hong Tang commented on HDFS-1338:
-

More thoughts:
- To correct myself: we should launch enough writers that the number of 
concurrent IO operations exceeds the number of physical disks. I'd suggest a 
ratio between 1.5 and 2. For instance, with 4 disks per node and rep-degree = 3, 
we should launch two DFS writers per node.
- The same principle should apply to the reader side: we should have enough map 
slots that the number of readers is 1.5x to 2x the number of physical drives. 
Again, with 4 disks per node, we may need 6 map slots. (And each map should read 
one block of data instead of a whole file.) A back-of-the-envelope sizing sketch 
follows below.
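
A minimal sketch of the sizing rules of thumb above (illustrative only, not part 
of TestDFSIO):

{noformat}
// Sketch only: sizing writers and reader slots per node from the rules of thumb above.
public class DfsioSizingSketch {
  // Writers per node so that writers * replication ~= ratio * disks.
  static int writersPerNode(int disksPerNode, int replication, double ratio) {
    return (int) Math.ceil(ratio * disksPerNode / replication);
  }

  // Map slots per node so that concurrent readers ~= ratio * disks (one block per map).
  static int readerSlotsPerNode(int disksPerNode, double ratio) {
    return (int) Math.ceil(ratio * disksPerNode);
  }

  public static void main(String[] args) {
    System.out.println(writersPerNode(4, 3, 1.5));  // 2, as in the example above
    System.out.println(readerSlotsPerNode(4, 1.5)); // 6
  }
}
{noformat}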

> Improve TestDFSIO
> -
>
> Key: HDFS-1338
> URL: https://issues.apache.org/jira/browse/HDFS-1338
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>
> Currently the read test in TestDFSIO benchmark just opens a large side file 
> and measures the read performance. The MR scheduler has no opportunity to do 
> *any* optimization for the TestDFSIO MR application. The side-effect of this 
> is that it is *very* hard to do any meaningful analysis of the results of the 
> benchmark i.e. to check if node-local or rack-local or off-switch read 
> performance improved/degraded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1338) Improve TestDFSIO

2010-08-10 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896997#action_12896997
 ] 

Hong Tang commented on HDFS-1338:
-

I think the goal of TestDFSIO is to benchmark peak HDFS throughput under a 
typical MR usage pattern. This means:
- Files should be replicated.
- Files should be spread across nodes relatively evenly. (Run one map per node 
on the cluster, and write out data evenly.)
- Locality information should be exposed to the MR framework correctly. (Just 
use FileInputFormat instead of writing a side file.)
- The dataset should not fit in the OS buffer cache. (Configure the benchmark so 
that the total amount of data > total RAM.)
- Throughput should be aggregated as a time series, and we should ignore the 
ramp-up and cool-down phases of the execution. (The output of each map should be 
a time series of counters of bytes read so far; the reporting may calculate the 
max and average over the mid-1/3 of the time series, as sketched below.)
- We should minimize the variance from MR scheduling. (Run one wave of maps, and 
increase the block size so that each map runs for at least 20 to 30 seconds.)
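
A minimal sketch of the mid-1/3 aggregation mentioned above, assuming each 
sample is a pair of (elapsed milliseconds, bytes read so far); illustrative 
only, not part of TestDFSIO:

{noformat}
// Sketch only: average throughput over the middle third of a run, ignoring the
// ramp-up and cool-down phases. Each sample is {elapsedMillis, bytesReadSoFar}.
import java.util.List;

public class MidThirdThroughputSketch {
  static double midThirdMBps(List<long[]> samples) {
    int n = samples.size();
    long[] start = samples.get(n / 3);       // beginning of the middle third
    long[] end = samples.get(2 * n / 3);     // end of the middle third
    double seconds = (end[0] - start[0]) / 1000.0;
    double megabytes = (end[1] - start[1]) / (1024.0 * 1024.0);
    return megabytes / seconds;
  }
}
{noformat}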

> Improve TestDFSIO
> -
>
> Key: HDFS-1338
> URL: https://issues.apache.org/jira/browse/HDFS-1338
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Arun C Murthy
>
> Currently the read test in TestDFSIO benchmark just opens a large side file 
> and measures the read performance. The MR scheduler has no opportunity to do 
> *any* optimization for the TestDFSIO MR application. The side-effect of this 
> is that it is *very* hard to do any meaningful analysis of the results of the 
> benchmark i.e. to check if node-local or rack-local or off-switch read 
> performance improved/degraded.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1128) Allow TestDFSIO to process multiple files for each map

2010-05-05 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated HDFS-1128:


Attachment: HDFS-1128-yhadoop-20.1xx-2.patch

New patch that addresses Hairong's comments.

> Allow TestDFSIO to process multiple files for each map
> --
>
> Key: HDFS-1128
> URL: https://issues.apache.org/jira/browse/HDFS-1128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hong Tang
> Attachments: HDFS-1128-yhadoop-20.1xx-2.patch, 
> HDFS-1128-yhadoop-20.1xx.patch
>
>
> DFSIO only processes one file in each map; it would be nice to enhance it to 
> process multiple files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1128) Allow TestDFSIO to process multiple files for each map

2010-05-04 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated HDFS-1128:


Attachment: HDFS-1128-yhadoop-20.1xx.patch

Patch for the Yahoo Hadoop branch, not to be committed.

> Allow TestDFSIO to process multiple files for each map
> --
>
> Key: HDFS-1128
> URL: https://issues.apache.org/jira/browse/HDFS-1128
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Hong Tang
> Attachments: HDFS-1128-yhadoop-20.1xx.patch
>
>
> DFSIO only processes one file in each map; it would be nice to enhance it to 
> process multiple files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1128) Allow TestDFSIO to process multiple files for each map

2010-05-04 Thread Hong Tang (JIRA)
Allow TestDFSIO to process multiple files for each map
--

 Key: HDFS-1128
 URL: https://issues.apache.org/jira/browse/HDFS-1128
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hong Tang


DFSIO only processes one file in each map; it would be nice to enhance it to 
process multiple files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1061) Memory footprint optimization for INodeFile object.

2010-04-15 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated HDFS-1061:


Status: Patch Available  (was: Open)

Retry Hudson per Eli's suggestion.

> Memory footprint optimization for INodeFile object. 
> 
>
> Key: HDFS-1061
> URL: https://issues.apache.org/jira/browse/HDFS-1061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: HDFS-1061.patch
>
>
> I am proposing a footprint optimization to merge blockReplication and 
> preferredBlockSize fields into one 'long header' field in INodeFile class. 
> This saves 8 bytes per INodeFile object on a 64 bit JVM. This memory 
> optimization is transparent and changes are very minimal.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HDFS-1061) Memory footprint optimization for INodeFile object.

2010-04-15 Thread Hong Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hong Tang updated HDFS-1061:


Status: Open  (was: Patch Available)

> Memory footprint optimization for INodeFile object. 
> 
>
> Key: HDFS-1061
> URL: https://issues.apache.org/jira/browse/HDFS-1061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Bharath Mundlapudi
>Assignee: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.22.0
>
> Attachments: HDFS-1061.patch
>
>
> I am proposing a footprint optimization to merge blockReplication and 
> preferredBlockSize fields into one 'long header' field in INodeFile class. 
> This saves 8 bytes per INodeFile object on a 64 bit JVM. This memory 
> optimization is transparent and changes are very minimal.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1061) Memory footprint optimization for INodeFile object.

2010-03-22 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848474#action_12848474
 ] 

Hong Tang commented on HDFS-1061:
-

@bharath, what is the average size of an INodeFile object currently?
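
For reference, the proposal quoted below merges blockReplication and 
preferredBlockSize into a single long header. A minimal sketch of one possible 
packing (an assumed layout, not necessarily what the attached patch does):

{noformat}
// Sketch only: packing a 16-bit replication factor and a 48-bit preferred block
// size into one long. The actual field layout in the patch may differ.
public class InodeHeaderSketch {
  private static final int REPLICATION_BITS = 16;
  private static final long BLOCK_SIZE_MASK = (1L << (64 - REPLICATION_BITS)) - 1;

  static long pack(short replication, long preferredBlockSize) {
    return ((long) replication << (64 - REPLICATION_BITS))
        | (preferredBlockSize & BLOCK_SIZE_MASK);
  }

  static short getReplication(long header) {
    return (short) (header >>> (64 - REPLICATION_BITS));
  }

  static long getPreferredBlockSize(long header) {
    return header & BLOCK_SIZE_MASK;
  }

  public static void main(String[] args) {
    long header = pack((short) 3, 128L * 1024 * 1024);
    System.out.println(getReplication(header));        // 3
    System.out.println(getPreferredBlockSize(header)); // 134217728
  }
}
{noformat}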

> Memory footprint optimization for INodeFile object. 
> 
>
> Key: HDFS-1061
> URL: https://issues.apache.org/jira/browse/HDFS-1061
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.22.0
>Reporter: Bharath Mundlapudi
>Priority: Minor
> Fix For: 0.22.0
>
>
> I am proposing a footprint optimization to merge blockReplication and 
> preferredBlockSize fields into one 'long header' field in INodeFile class. 
> This saves 8 bytes per INodeFile object on a 64 bit JVM. This memory 
> optimization is transparent and changes are very minimal.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-985) HDFS should issue multiple RPCs for listing a large directory

2010-02-18 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835337#action_12835337
 ] 

Hong Tang commented on HDFS-985:


+1. In general, we should bound the work (and thus the waiting on the client 
side) of every RPC call.
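
For illustration, a sketch of the client-side loop such a bounded listing 
implies. The getListing(path, startAfter) RPC, the Page type, and the flag names 
are assumptions for the sketch, not the actual HDFS API:

{noformat}
// Sketch only: paging through a large directory with a bounded-work RPC,
// resuming from the last returned name. All names here are illustrative.
import java.util.ArrayList;
import java.util.List;

public class PagedListingSketch {
  interface Page { List<String> names(); boolean hasMore(); }
  interface NameNodeRpc { Page getListing(String path, String startAfter); }

  static List<String> listAll(NameNodeRpc rpc, String path) {
    List<String> all = new ArrayList<>();
    String startAfter = "";                          // empty = start from the beginning
    while (true) {
      Page page = rpc.getListing(path, startAfter);  // each RPC does bounded work on the NN
      all.addAll(page.names());
      if (!page.hasMore() || page.names().isEmpty()) {
        return all;
      }
      startAfter = all.get(all.size() - 1);          // last item is the resume cursor
    }
  }
}
{noformat}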

> HDFS should issue multiple RPCs for listing a large directory
> -
>
> Key: HDFS-985
> URL: https://issues.apache.org/jira/browse/HDFS-985
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.22.0
>
>
> Currently HDFS issues one RPC from the client to the NameNode for listing a 
> directory. However, some directories are so large that they contain thousands 
> or millions of items. Listing such large directories in one RPC has a few 
> shortcomings:
> 1. The list operation holds the global fsnamesystem lock for a long time thus 
> blocking other requests. If a large number (like thousands) of such list 
> requests hit NameNode in a short period of time, NameNode will be 
> significantly slowed down. Users end up noticing longer response time or lost 
> connections to NameNode.
> 2. The response message is uncontrollably big. We observed a response as big 
> as 50M bytes when listing a directory of 300 thousand items. Even with the 
> optimization introduced at HDFS-946 that may be able to cut the response by 
> 20-50%, the response size will still be on the order of 10 megabytes.
> I propose to implement a directory listing using multiple RPCs. Here is the 
> plan:
> 1. Each getListing RPC has an upper limit on the number of items returned.  
> This limit could be configurable, but I am thinking to set it to be a fixed 
> number like 500.
> 2. Each RPC additionally specifies a start position for this listing request. 
> I am thinking to use the last item of the previous listing RPC as an 
> indicator. Since NameNode stores all items in a directory as a sorted array, 
> NameNode uses the last item to locate the start item of this listing even if 
> the last item is deleted in between these two consecutive calls. This has the 
> advantage of avoiding duplicate entries on the client side.
> 3. The return value additionally specifies whether the whole directory has been 
> fully listed. If the client sees a false flag, it will continue to issue another 
> RPC.
> This proposal will change the semantics of large directory listing in the sense 
> that listing is no longer an atomic operation if a directory's content is 
> changing while the listing operation is in progress.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-779) Automatic move to safe-mode when cluster size drops

2009-11-19 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780404#action_12780404
 ] 

Hong Tang commented on HDFS-779:


+1 on the direction. In my previous job, we also had a similar feature in our 
internal distributed storage system.

N% would depend on a number of factors (a sketch of the threshold check follows 
below):
- How well is the cluster maintained? A well-maintained cluster should set N% 
low; 10% sounds right to me.
- What is the expected capacity utilization of the cluster? For a cluster that 
is expected to run at 90% full, we have to declare an emergency when 10% of the 
capacity is lost.
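
For concreteness, a minimal sketch of the kind of threshold check the N% option 
implies (all names are made up):

{noformat}
// Sketch only: enter safe mode if live datanodes dropped more than N% below a baseline.
public class ClusterShrinkSketch {
  static boolean shouldEnterSafeMode(int baselineLiveNodes, int currentLiveNodes,
                                     double thresholdPercent) {
    double dropPercent =
        100.0 * (baselineLiveNodes - currentLiveNodes) / baselineLiveNodes;
    return dropPercent > thresholdPercent;
  }

  public static void main(String[] args) {
    System.out.println(shouldEnterSafeMode(1000, 880, 10.0)); // true: a 12% drop
  }
}
{noformat}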

> Automatic move to safe-mode when cluster size drops
> ---
>
> Key: HDFS-779
> URL: https://issues.apache.org/jira/browse/HDFS-779
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Reporter: Owen O'Malley
>
> As part of looking at using Kerberos, we want to avoid the case where both 
> the primary (and optional secondary) KDC go offline causing a replication 
> storm as the DataNodes' service tickets time out and they lose the ability to 
> connect to the NameNode. However, this is a specific case of a more general 
> problem of losing too many nodes too quickly. I think we should have an 
> option to go into safe mode if the cluster size goes down more than N% in 
> terms of DataNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-778) DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames.

2009-11-19 Thread Hong Tang (JIRA)
DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
ips as hostnames.
---

 Key: HDFS-778
 URL: https://issues.apache.org/jira/browse/HDFS-778
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Hong Tang


DistributedFileSystem.getFileBlockLocations() may occasionally return numeric 
IPs as hostnames. This seems to be a breach of the 
FileSystem.getFileBlockLocations() contract:
{noformat}
  /**
   * Return an array containing hostnames, offset and size of 
   * portions of the given file.  For a nonexistent 
   * file or regions, null will be returned.
   *
   * This call is most helpful with DFS, where it returns 
   * hostnames of machines that contain the given file.
   *
   * The FileSystem will simply return an elt containing 'localhost'.
   */
  public BlockLocation[] getFileBlockLocations(FileStatus file, 
  long start, long len) throws IOException
{noformat}

One (maybe minor) consequence of this issue: when a job includes such numeric 
IPs in its splits' locations, the JobTracker is not able to schedule the job's 
map tasks local to the file blocks.

We should either fix the implementation or change the contract. In the latter 
case, the JobTracker needs to be fixed to maintain both the hostnames and IPs of 
the TaskTrackers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-746) Documenting HDFS metrics

2009-10-30 Thread Hong Tang (JIRA)
Documenting HDFS metrics


 Key: HDFS-746
 URL: https://issues.apache.org/jira/browse/HDFS-746
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hong Tang


As part of HADOOP-6350, we should document the metrics for NameNode and 
DataNode as part of their interfaces.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-738) Improve the disk utilization of HDFS

2009-10-27 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770593#action_12770593
 ] 

Hong Tang commented on HDFS-738:


I have done some empirical observation: on Linux, "iostat -dkx 10" provides two 
useful metrics, %util and avgqu-sz. %util is a pretty good indicator of disk 
utilization (though it sometimes shoots over 100%); a high %util combined with a 
large avgqu-sz (10s to 100s) means the disk is overloaded.

> Improve the disk utilization of HDFS
> 
>
> Key: HDFS-738
> URL: https://issues.apache.org/jira/browse/HDFS-738
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Zheng Shao
>
> HDFS data node currently assigns writers to disks randomly. This is good if 
> there are a large number of readers/writers on a single data node, but might 
> create a lot of contention if there are only 4 readers/writers on a 4-disk 
> node.
> A better way is to introduce a base class DiskHandler, for registering all 
> disk operations (read/write), as well as getting the best disk for writing 
> new blocks. A good strategy of the DiskHandler would be to distribute the 
> load of the writes to the disks with more free space as well as less recent 
> activity. There can be many strategies.
> This could help improve the HDFS multi-threaded write throughput a lot - we 
> are seeing <25MB/s/disk on a 4-disk/node 4-node cluster (replication is 
> already considered) given 8 concurrent writers (24 writers considering 
> replication). I believe we can improve that to 2x.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-738) Improve the disk utilization of HDFS

2009-10-27 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770408#action_12770408
 ] 

Hong Tang commented on HDFS-738:


+1 on the direction.

I have been brewing the idea that we should have a shared IO load monitor that 
publishes the load (using util% or queue size) through shared memory, allowing 
task processes to use the same info to decide which disk to write to.
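
A minimal sketch of the consumer side of such a monitor: picking the 
least-loaded volume from externally published utilization numbers. The names are 
illustrative, not the DataNode's actual volume-choosing code:

{noformat}
// Sketch only: choose the disk with the lowest published utilization (e.g. util%
// sampled from iostat and shared by a load monitor).
public class LeastLoadedDiskSketch {
  static int chooseDisk(double[] utilPercentPerDisk) {
    int best = 0;
    for (int i = 1; i < utilPercentPerDisk.length; i++) {
      if (utilPercentPerDisk[i] < utilPercentPerDisk[best]) {
        best = i;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    System.out.println(chooseDisk(new double[] {95.0, 40.0, 72.5, 88.0})); // 1
  }
}
{noformat}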

> Improve the disk utilization of HDFS
> 
>
> Key: HDFS-738
> URL: https://issues.apache.org/jira/browse/HDFS-738
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Reporter: Zheng Shao
>
> HDFS data node currently assigns writers to disks randomly. This is good if 
> there are a large number of readers/writers on a single data node, but might 
> create a lot of contention if there are only 4 readers/writers on a 4-disk 
> node.
> A better way is to introduce a base class DiskHandler, for registering all 
> disk operations (read/write), as well as getting the best disk for writing 
> new blocks. A good strategy of the DiskHandler would be to distribute the 
> load of the writes to the disks with more free space as well as less recent 
> activity. There can be many strategies.
> This could help improve the HDFS multi-threaded write throughput a lot - we 
> are seeing <25MB/s/disk on a 4-disk/node 4-node cluster (replication is 
> already considered) given 8 concurrent writers (24 writers considering 
> replication). I believe we can improve that to 2x.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-677) Rename failure due to quota results in deletion of src directory

2009-10-09 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763879#action_12763879
 ] 

Hong Tang commented on HDFS-677:


bq. I do not anticipated any failure after that. But in case some exceptions 
happen, then the src is restored back in a finally block.

Are you asserting that restoring src is guaranteed to succeed (or else to die 
miserably, e.g. OOM, and the whole NN crashes)?

A typical way of doing this (sketched below) is:
- Lock down both the src and dest nodes.
- Create src.shadow and dest.shadow, and set up their state as src and dest 
would look after the rename.
- Support a "guaranteed-to-succeed" "swap" operation on the node, and "swap" 
src.shadow with src and dest.shadow with dest. This step is guaranteed to 
succeed.
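
A minimal sketch of the prepare-then-swap pattern above, using an 
AtomicReference as a stand-in for the guaranteed-to-succeed swap (illustrative 
only, not NameNode code):

{noformat}
// Sketch only: do all failure-prone work on a shadow copy, then publish it with a
// single reference swap that cannot fail halfway.
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class PrepareThenSwapSketch<S> {
  private final AtomicReference<S> state;

  public PrepareThenSwapSketch(S initial) {
    this.state = new AtomicReference<>(initial);
  }

  public void apply(UnaryOperator<S> prepare) {
    S current = state.get();
    S shadow = prepare.apply(current); // may throw; the current state is untouched
    state.set(shadow);                 // the "swap": a single atomic publish
  }

  public S current() {
    return state.get();
  }
}
{noformat}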


> Rename failure due to quota results in deletion of src directory
> 
>
> Key: HDFS-677
> URL: https://issues.apache.org/jira/browse/HDFS-677
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Blocker
> Fix For: 0.20.2, 0.21.0, 0.22.0
>
> Attachments: hdfs-677.8.patch
>
>
> Renaming a src that has exceeded its quota to a dst without sufficient quota 
> fails. During this failure, src is deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-677) Rename failure due to quota results in deletion of src directory

2009-10-08 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763741#action_12763741
 ] 

Hong Tang commented on HDFS-677:


bq. 2. During rename after the src is removed, any failure to complete rename 
is handled by adding src back without checking for quota.
What kind of failures might we encounter during the rename? How can we 
guarantee that there are no failures when adding src back?

In general, you are trying to make the move operation atomic (either it 
completes successfully, or it has no effect). This typically means that you want 
to prepare as much of the change off to the side as possible (without changing 
the state), and then effect the state change in one atomic operation.

> Rename failure due to quota results in deletion of src directory
> 
>
> Key: HDFS-677
> URL: https://issues.apache.org/jira/browse/HDFS-677
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>Priority: Blocker
> Fix For: 0.20.2, 0.21.0, 0.22.0
>
> Attachments: hdfs-677.8.patch
>
>
> Renaming a src that has exceeded its quota to a dst without sufficient quota 
> fails. During this failure, src is deleted.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-326) Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.

2009-09-15 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755591#action_12755591
 ] 

Hong Tang commented on HDFS-326:


bq.2. I'm worried about the problem of what happens if you try to stop a 
service while it is starting up. Some of the services do take a while to start, 
and you should have the right to interrupt something like a JobTracker and tell 
it to go away.

Without looking at the patch, I am a bit confused by "... do take a while to 
start": do you mean the service will take a while in STARTING or in 
INITIALIZING? I think in general a component should do the minimum to set itself 
up to a consistent state in the STARTING stage and defer lengthy initialization 
to the INITIALIZING stage. We would then allow STARTING to run through 
uninterrupted but support aborting the initialization (see the sketch below).
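
To illustrate the split I have in mind, a minimal sketch (the state names and 
methods are assumptions for this sketch, not the interface proposed in the 
patch):

{noformat}
// Sketch only: cheap, uninterruptible setup in STARTING; lengthy, abortable work
// in INITIALIZING.
import java.util.concurrent.atomic.AtomicBoolean;

public abstract class LifecycleSketch {
  public enum State { CREATED, STARTING, INITIALIZING, LIVE, STOPPED }

  private volatile State state = State.CREATED;
  private final AtomicBoolean aborted = new AtomicBoolean(false);

  public final void start() {
    state = State.STARTING;
    setUpMinimalState();                 // fast; runs to completion uninterrupted
    state = State.INITIALIZING;
    initialize(aborted);                 // long-running; polls `aborted` and bails out
    state = aborted.get() ? State.STOPPED : State.LIVE;
  }

  public final void stop() {
    aborted.set(true);                   // aborts a pending or ongoing initialization
  }

  public final State getState() {
    return state;
  }

  protected abstract void setUpMinimalState();
  protected abstract void initialize(AtomicBoolean aborted);
}
{noformat}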

> Add a lifecycle interface for Hadoop components: namenodes, job clients, etc.
> -
>
> Key: HDFS-326
> URL: https://issues.apache.org/jira/browse/HDFS-326
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Steve Loughran
>Assignee: Steve Loughran
> Attachments: AbstractHadoopComponent.java, HADOOP-3628-18.patch, 
> HADOOP-3628-19.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-3628.patch, hadoop-3628.patch, hadoop-3628.patch, 
> hadoop-lifecycle-tomw.sxw, hadoop-lifecycle.pdf, hadoop-lifecycle.pdf, 
> hadoop-lifecycle.sxw
>
>
> I'd like to propose we have a standard interface for hadoop components, the 
> things that get started or stopped when you bring up a namenode. currently, 
> some of these classes have a stop() or shutdown() method, with no standard 
> name/interface, but no way of seeing if they are live, checking their health, 
> or shutting them down reliably. Indeed, there is a tendency for the spawned 
> threads to not want to die; to require the entire process to be killed to 
> stop the workers. 
> Having a standard interface would make it easier for 
>  * management tools to manage the different things
>  * monitoring the state of things
>  * subclassing
> The latter is interesting as right now TaskTracker and JobTracker start up 
> threads in their constructor; that's very dangerous as subclasses may have 
> their methods called before they are fully initialised. Adding this interface 
> would be the right time to clean up the startup process so that subclassing 
> is less risky.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-599) Improve Namenode robustness by prioritizing datanode heartbeats over client requests

2009-09-09 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753240#action_12753240
 ] 

Hong Tang commented on HDFS-599:


Using different ports for QoS also has its limitations. The request dispatching 
logic must be baked into the client side, and inside the server the multiple 
threads or thread pools that handle different classes are oblivious to each 
other and may not enforce fine-grained resource control. This OSDI paper (which 
I co-authored), http://www.usenix.org/events/osdi02/tech/shen.html, describes a 
complicated QoS policy whose solution is based on queues.

Switches may be useful for network QoS (e.g. preventing DoS attacks and 
prioritizing traffic). But for application-level QoS (where the system is 
constrained not by the amount of network traffic but by the work that needs to 
be performed in the server), I'd think a queue-based solution is better.
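
To make the queue-based alternative concrete, a minimal sketch of a single queue 
that serves datanode calls ahead of client calls (illustrative names, not the 
NameNode's actual RPC server; a production version would also need to guard 
against starving client calls):

{noformat}
// Sketch only: one priority queue shared by all handler threads; datanode RPCs
// (heartbeats, block reports) drain before client RPCs.
import java.util.concurrent.PriorityBlockingQueue;

public class PriorityRpcQueueSketch {
  enum RpcClass { DATANODE, CLIENT }      // declaration order defines priority

  static final class Call implements Comparable<Call> {
    final RpcClass rpcClass;
    final Runnable work;

    Call(RpcClass rpcClass, Runnable work) {
      this.rpcClass = rpcClass;
      this.work = work;
    }

    @Override
    public int compareTo(Call other) {    // DATANODE sorts ahead of CLIENT
      return rpcClass.compareTo(other.rpcClass);
    }
  }

  private final PriorityBlockingQueue<Call> queue = new PriorityBlockingQueue<>();

  void submit(Call call) {
    queue.put(call);
  }

  void handlerLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      queue.take().work.run();            // highest-priority pending call first
    }
  }
}
{noformat}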

> Improve Namenode robustness by prioritizing datanode heartbeats over client 
> requests
> 
>
> Key: HDFS-599
> URL: https://issues.apache.org/jira/browse/HDFS-599
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The namenode processes RPC requests from clients that are reading/writing to 
> files as well as heartbeats/block reports from datanodes.
> Sometimes, because of various reasons (Java GC runs, inconsistent performance 
> of the NFS filer that stores HDFS transaction logs, etc.), the namenode 
> encounters transient slowness. For example, if the device that stores the 
> HDFS transaction logs becomes sluggish, the Namenode's ability to process 
> RPCs slows down to a certain extent. During this time, the RPCs from clients 
> as well as the RPCs from datanodes suffer in similar fashion. If the 
> underlying problem becomes worse, the NN's ability to process a heartbeat 
> from a DN is severely impacted, thus causing the NN to declare that the DN is 
> dead. Then the NN starts replicating blocks that used to reside on the 
> now-declared-dead datanode. This adds extra load to the NN. Then the 
> now-declared-dead datanode finally re-establishes contact with the NN, and sends a 
> block report. The block report processing on the NN is another heavyweight 
> activity, thus causing more load to the already overloaded namenode. 
> My proposal is that the NN should try its best to continue processing RPCs 
> from datanodes and give lesser priority to serving client requests. The 
> Datanode RPCs are integral to the consistency and performance of the Hadoop 
> file system, and it is better to protect it at all costs. This will ensure 
> that NN  recovers from the hiccup much faster than what it does now.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-07-25 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735343#action_12735343
 ] 

Hong Tang commented on HDFS-503:


Yes, there is a link to the pdf version of the paper: 
http://www.cs.utk.edu/~plank/plank/papers/FAST-2009.pdf.

> Implement erasure coding as a layer on HDFS
> ---
>
> Key: HDFS-503
> URL: https://issues.apache.org/jira/browse/HDFS-503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
> file system can be reduced. Keeping three copies of the same data is very 
> costly, especially when the size of storage is huge. One idea is to reduce 
> the replication factor and do erasure coding of a set of blocks so that the 
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available, see 
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
> described DiskReduce 
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base 
> HDFS, but is a layer on top of HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-503) Implement erasure coding as a layer on HDFS

2009-07-24 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735011#action_12735011
 ] 

Hong Tang commented on HDFS-503:


As a reference, FAST 09 has a paper that benchmarks the performance of various 
open source erasure coding implementations: 
http://www.cs.utk.edu/~plank/plank/papers/FAST-2009.html.

> Implement erasure coding as a layer on HDFS
> ---
>
> Key: HDFS-503
> URL: https://issues.apache.org/jira/browse/HDFS-503
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> The goal of this JIRA is to discuss how the cost of raw storage for a HDFS 
> file system can be reduced. Keeping three copies of the same data is very 
> costly, especially when the size of storage is huge. One idea is to reduce 
> the replication factor and do erasure coding of a set of blocks so that the 
> overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available, see 
> http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has 
> described DiskReduce 
> https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My opinion is to discuss implementation strategies that are not part of base 
> HDFS, but is a layer on top of HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-15 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731700#action_12731700
 ] 

Hong Tang commented on HDFS-487:


UUIDs are not a requirement based on the use cases everybody presented here, 
but they would probably make the system a bit more future-proof. The few cases I 
can think of (not necessarily independent) are (1) cross co-lo synchronization; 
(2) federated HDFS systems; (3) separation of block management.

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing  applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use fileid instead of filename.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-15 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731611#action_12731611
 ] 

Hong Tang commented on HDFS-487:


The Wikipedia page describes 5 ways of doing it.

> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: fileid1.txt
>
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing  applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use fileid instead of filename.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-487) HDFS should expose a fileid to uniquely identify a file

2009-07-13 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730447#action_12730447
 ] 

Hong Tang commented on HDFS-487:


How about using 
[UUIDs|http://en.wikipedia.org/wiki/Universally_Unique_Identifier]? They have 
two advantages (a small sketch follows below):
- They can be calculated independently by any process.
- They are guaranteed to be unique globally, even across two different HDFS 
instances, which would make it easier if we want to build a federated file 
system.
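
A minimal sketch of generating such an id with the JDK's java.util.UUID (usage 
illustration only):

{noformat}
// Sketch only: a 128-bit random (version 4) UUID needs no coordination between
// NameNodes or processes, yet is unique for all practical purposes.
import java.util.UUID;

public class FileIdSketch {
  public static void main(String[] args) {
    UUID fileId = UUID.randomUUID();
    System.out.println(fileId);                 // printable form
    long hi = fileId.getMostSignificantBits();  // storable as two longs
    long lo = fileId.getLeastSignificantBits();
    System.out.println(hi + " " + lo);
  }
}
{noformat}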


> HDFS should expose a fileid to uniquely identify a file
> ---
>
> Key: HDFS-487
> URL: https://issues.apache.org/jira/browse/HDFS-487
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
>
> HDFS should expose an id that uniquely identifies a file. This helps in 
> developing  applications that work correctly even when files are moved from 
> one directory to another. A typical use-case is to make the Pluggable Block 
> Placement Policy (HDFS-385) use fileid instead of filename.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728987#action_12728987
 ] 

Hong Tang commented on HDFS-385:


@sanjay, my understanding is that in the scenario you described, policy 2 has 
to be compatible with policy 1. This could mean two things: (1) more general 
constraints are added to policy 2, and policy 2 does not violate any constraints 
of policy 1, e.g. "I want a copy of this file close to an existing file (while 
obeying the general constraints)"; or (2) individual files or directories are 
specialized via user hints, such as "I want this file replicated twice, but it 
need not be on two different racks."

The paper I cited earlier offers an interesting way of expressing per-file or 
per-directory hints, but we may consider supporting them through something 
similar to the Mac's extended attributes.

Flexible, Wide-Area Storage for Distributed Systems with WheelFS
Jeremy Stribling, MIT CSAIL; Yair Sovran, New York University; Irene Zhang and 
Xavid Pretzer, MIT CSAIL; Jinyang Li, New York University; M. Frans Kaashoek 
and Robert Morris, MIT CSAIL 
http://www.usenix.org/events/nsdi09/tech/full_papers/stribling/stribling.pdf



> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt, 
> BlockPlacementPluggable5.txt
>
>
> The current HDFS code typically places one replica on local rack, the second 
> replica on remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked in the NameNode's code. It would be nice 
> to make the block placement algorithm a pluggable interface. This will allow 
> experimentation of different placement algorithms based on workloads, 
> availability guarantees and failure models.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-08 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728613#action_12728613
 ] 

Hong Tang commented on HDFS-385:


@Dhruba, Fine with me (obviously you have more context than I do to justify 
this).

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on local rack, the second 
> replica on remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked in the NameNode's code. It would be nice 
> to make the block placement algorithm a pluggable interface. This will allow 
> experimentation of different placement algorithms based on workloads, 
> availability guarantees and failure models.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-385) Design a pluggable interface to place replicas of blocks in HDFS

2009-07-06 Thread Hong Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727855#action_12727855
 ] 

Hong Tang commented on HDFS-385:


Some minor nits:
- In class BlockPlacementPolicy, the javadoc for chooseTarget mentions an unused 
parameter excludedNodes.
- There are two abstract versions of chooseTarget(). Is it possible to provide a 
default implementation of chooseTarget(FSInodeName srcInode, int numOfReplicas, 
DatanodeDescriptor writer, List<DatanodeDescriptor> chosenNodes, long blocksize) 
on top of chooseTarget(String srcPath, int numOfReplicas, DatanodeDescriptor 
writer, List<DatanodeDescriptor> chosenNodes, long blocksize)? (A delegation 
sketch follows below.)
- There is an asymmetry in chooseTarget, which takes a List of 
DatanodeDescriptor but returns an array of DatanodeDescriptor. Why not return a 
List too?
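
A minimal sketch of the kind of default delegation the second nit asks about, 
with simplified stand-in signatures (not the real BlockPlacementPolicy class):

{noformat}
// Sketch only: implement the inode-based overload on top of the path-based one,
// so subclasses only have to provide the latter. Types are simplified stand-ins.
import java.util.List;

public abstract class PlacementSketch {
  interface FSInodeName { String getFullPathName(); }
  interface DatanodeDescriptor {}

  // The overload subclasses must implement.
  abstract DatanodeDescriptor[] chooseTarget(String srcPath, int numOfReplicas,
      DatanodeDescriptor writer, List<DatanodeDescriptor> chosenNodes, long blocksize);

  // Default for the inode-based overload, delegating to the path-based one.
  DatanodeDescriptor[] chooseTarget(FSInodeName srcInode, int numOfReplicas,
      DatanodeDescriptor writer, List<DatanodeDescriptor> chosenNodes, long blocksize) {
    return chooseTarget(srcInode.getFullPathName(), numOfReplicas, writer,
        chosenNodes, blocksize);
  }
}
{noformat}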

> Design a pluggable interface to place replicas of blocks in HDFS
> 
>
> Key: HDFS-385
> URL: https://issues.apache.org/jira/browse/HDFS-385
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Fix For: 0.21.0
>
> Attachments: BlockPlacementPluggable.txt, 
> BlockPlacementPluggable2.txt, BlockPlacementPluggable3.txt, 
> BlockPlacementPluggable4.txt, BlockPlacementPluggable4.txt
>
>
> The current HDFS code typically places one replica on local rack, the second 
> replica on remote random rack and the third replica on a random node of that 
> remote rack. This algorithm is baked in the NameNode's code. It would be nice 
> to make the block placement algorithm a pluggable interface. This will allow 
> experimentation of different placement algorithms based on workloads, 
> availability guarantees and failure models.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.