[jira] Commented: (HDFS-867) Add a PowerTopology class to aid replica placement and enhance availability of blocks

2010-01-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796665#action_12796665
 ] 

Steve Loughran commented on HDFS-867:
-

How do you plan to use this? In data placement? And balancing?

 Add a PowerTopology class to aid replica placement and enhance availability 
 of blocks 
 --

 Key: HDFS-867
 URL: https://issues.apache.org/jira/browse/HDFS-867
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Jeff Hammerbacher
Priority: Minor

 Power outages are a common reason for a DataNode to become unavailable. 
 A data structure representing the power topology of your data center could be 
 used to implement a power-aware replica placement policy.
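 A purely illustrative sketch (no such class exists in the issue yet; the names and 
 structure below are assumptions) of a minimal power-topology mapping, in the spirit 
 of NetworkTopology's rack strings:
{noformat}
// Illustrative sketch only: map each datanode to the power unit (e.g. PDU or
// circuit) feeding it, so a placement policy can avoid putting every replica
// of a block behind a single power unit.
import java.util.HashMap;
import java.util.Map;

public class PowerTopology {
  private final Map<String, String> nodeToPowerGroup = new HashMap<String, String>();

  public void add(String datanode, String powerGroup) {
    nodeToPowerGroup.put(datanode, powerGroup);
  }

  /** True if both datanodes sit behind the same power unit and would fail together. */
  public boolean sharePower(String nodeA, String nodeB) {
    String a = nodeToPowerGroup.get(nodeA);
    return a != null && a.equals(nodeToPowerGroup.get(nodeB));
  }
}
{noformat}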

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-853) The HDFS webUI should show a metric that summarizes whether the cluster is balanced regarding disk space usage

2010-01-05 Thread Andrew Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796772#action_12796772
 ] 

Andrew Ryan commented on HDFS-853:
--

We're currently graphing both the mean across datanodes and the standard 
deviation of datanodes from that mean, using a script that parses the output of 
'dfsadmin -report'. Our DFS cluster nodes all have the same amount of disk space, 
so you'd expect the mean of the individual datanodes to be the same as % DFS full, 
but it's not quite the same. We haven't yet looked into why this is so.

To directly answer Konstantin's question, the one line we're using is standard 
deviation.

 The HDFS webUI should show a metric that summarizes whether the cluster is 
 balanced regarding disk space usage
 --

 Key: HDFS-853
 URL: https://issues.apache.org/jira/browse/HDFS-853
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur

 It is desirable to know how much the datanodes vary from one another in terms 
 of space utilization to get a sense of how well an HDFS cluster is balanced.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-717) Proposal for exceptions thrown by FileContext and Abstract File System

2010-01-05 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796790#action_12796790
 ] 

Suresh Srinivas commented on HDFS-717:
--

The following are the layers between the application and the Service 
Implementation (such as NameNode).
Application <=> Client library <=> RPC client <=> Network <=> RPC server <=> Service Impl

Key goals:
# InterruptedExceptions in the client library should not be ignored. This will 
help in clean application shutdown. InterruptedException on the server side 
should not be ignored; see below.
# Applications must be able to differentiate RPC-layer exceptions from the 
exceptions in the Service Impl. Applications can choose to retry a request 
based on the different categories of exceptions received.
# Exceptions declared in the API should be propagated end to end over RPC from 
the server to the application. All undeclared exceptions from the Service Impl 
including InterruptedException should be handled by the RPC layer.
# Changes needed in applications to move to FileContext from FileSystem should 
be minimal.

Proposal: 
Exceptions will be organized as shown below; a sketch of how these classes might 
be declared follows the list.
# IOException
#* exceptions as declared in the RPC API - note that the detailed method exceptions 
will be declared even though they are subclasses of IOException
#* RPCException - exceptions in the RPC layer
#** RPCClientException - exception encountered in the RPC client
#** RPCServerException - exception encountered in the RPC server
#** UnexpectedServerException - unexpected exception from the Service Impl to 
RPC handlers.
# RuntimeException
#* HadoopIllegalArgumentException - subclass of IllegalArgumentException; 
indicates an illegal or inappropriate argument. 
#* HadoopInterruptedException - subclass of RuntimeException thrown on 
encountering InterruptedException.
#* UnsupportedOperationException - thrown to indicate the requested operation 
is not supported.
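
A rough sketch of how the classes above might be declared (their placement in the 
hierarchy follows the proposal; the constructors and comments are illustrative 
assumptions, with each class in its own source file):
{noformat}
// Sketch only, not a patch. Imports such as java.io.IOException omitted.
public class RPCException extends IOException {                  // any rpc-layer failure
  public RPCException(String msg) { super(msg); }
}
public class RPCClientException extends RPCException {           // failure in the RPC client
  public RPCClientException(String msg) { super(msg); }
}
public class RPCServerException extends RPCException {           // failure in the RPC server
  public RPCServerException(String msg) { super(msg); }
}
public class UnexpectedServerException extends RPCException {    // undeclared exception from the Service Impl
  public UnexpectedServerException(Throwable cause) { super(cause.toString()); }
}
public class HadoopIllegalArgumentException extends IllegalArgumentException {
  public HadoopIllegalArgumentException(String msg) { super(msg); }
}
public class HadoopInterruptedException extends RuntimeException {
  public HadoopInterruptedException(InterruptedException cause) { super(cause); }
}
{noformat}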

Rationale:
# declared exceptions remain subclasses of IOException as before - no changes 
here.
# group the RPC exceptions, categorized by client side and server side (see the 
example after this list).
# use a runtime exception for InterruptedException - simplifies migration to 
FileContext. A subclass of IOException is not used as applications might have 
catch-and-ignore code.
# HadoopIllegalArgumentException instead of the java IllegalArgumentException - 
helps differentiate exceptions in the Hadoop implementation from exceptions 
thrown by the java libraries. Applications can choose to catch 
IllegalArgumentException.
# unsupported operations are indicated by the unchecked UnsupportedOperationException 
- a subclass of IOException is not used as applications might have catch-and-ignore 
code. A RuntimeException is used since applications cannot recover from this 
condition.
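
To illustrate how an application might use these categories for retry decisions 
(exception names as proposed above; the fileContext variable, path, and retry 
helper are illustrative assumptions only):
{noformat}
// Illustration only: branch on the proposed exception categories.
try {
  FSDataInputStream in = fileContext.open(path);
} catch (RPCClientException e) {
  // the request likely never reached the server - safe to retry
  scheduleRetry(path);                       // hypothetical helper
} catch (RPCServerException e) {
  // the RPC server side failed handling the call - retry or fail over
  scheduleRetry(path);
} catch (UnexpectedServerException e) {
  // undeclared failure inside the Service Impl - surface it, do not retry blindly
  throw e;
} catch (FileNotFoundException e) {
  // a declared Service Impl exception propagated end to end - retrying will not help
  throw e;
}
{noformat}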

Implementation details:
InterruptedException handling:
# Client side changes
#* The client library (both the API interface and the RPC client) and the 
InputStream and OutputStream returned by FileContext throw the unchecked 
HadoopInterruptedException on InterruptedException (see the sketch after this 
list).
# Server changes:
#* InterruptedException is currently ignored in the Service Impl layer. With 
this change the Service Impl will throw the exception. Methods in protocol 
classes such as ClientProtocol will specify InterruptedException in the throws 
clause.
#* On InterruptedException, RPC handlers close the socket connection to the 
client. The client handles this failure the same as a loss of connection.
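
A small sketch of the client-side wrapping described above (the rpcClient field 
and the method body are illustrative assumptions, not from a patch):
{noformat}
// Client library sketch: turn the checked InterruptedException into the
// proposed unchecked HadoopInterruptedException.
public FSDataInputStream open(Path path) throws IOException {
  try {
    return rpcClient.open(path);            // may block in the RPC layer
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();     // preserve the thread's interrupt status
    throw new HadoopInterruptedException(ie);
  }
}
{noformat}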

RPC layer changes
# The RPC layer marshals HadoopInterruptedException, 
HadoopIllegalArgumentException, and UnsupportedOperationException from the 
Service Impl all the way to the client.
# The RPC layer throws RPCClientException, RPCServerException and 
UnexpectedServerException.

FileContext, AbstractFileSystem, and protocol changes:
# Methods in FileContext declare IOException and the relevant subclasses of 
IOException. This helps document the specific exceptions thrown and helps in 
marshalling the exception from the server to the application over RPC. 
RPCExceptions are not declared as thrown in FileContext and AbstractFileSystem, 
as some implementations might not use an RPC layer (e.g. the local file system).
example:
{noformat}
public FSDataInputStream open(Path path) throws IOException, 
FileNotFoundException, AccessDeniedException;
{noformat}
# Protocol methods (such as those in ClientProtocol) will throw exceptions similar 
to FileContext, along with InterruptedException.
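For example, an illustrative protocol signature (not an actual declaration from 
the code) might look like:
{noformat}
public LocatedBlocks getBlockLocations(String src, long offset, long length)
    throws IOException, FileNotFoundException, AccessControlException,
           InterruptedException;
{noformat}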

Finally, FileContext will throw the following exceptions. The exception 
hierarchy is flattened; the semantics remain as defined in the earlier 
comments.
# IOException
#* ServerNotReadyException (NameNode safemode etc)
#* OutOfSpaceException for write operations
#* AccessControlException
#* InvalidPathNameException
#* FileNotFoundException
#* FileAlreadyExistsException
#* DirectoryNotEmptyException
#* NotDirectoryException
#* DirectoryNotAllowedException


 Proposal for exceptions thrown by FileContext and Abstract File System
 

[jira] Commented: (HDFS-853) The HDFS webUI should show a metric that summarizes whether the cluster is balanced regarding disk space usage

2010-01-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796795#action_12796795
 ] 

Konstantin Shvachko commented on HDFS-853:
--

Maybe we should use the mean and standard deviation of _utilization_ rather 
than raw disk space. This would work for heterogeneous clusters as well. 
By utilization I mean the percentage of disk space used for blocks on a 
data-node. We should also make sure this is consistent with the Balancer: 
balancing should improve the metrics.
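
A minimal sketch (illustrative only; not code from the issue) of computing such a 
summary metric as the mean and standard deviation of per-datanode utilization:
{noformat}
// Illustrative sketch: summarize cluster balance as the mean and standard
// deviation of per-datanode utilization (percent of capacity used for blocks).
public class BalanceMetric {
  /** used[i] = bytes used on datanode i, capacity[i] = total bytes on datanode i */
  public static double[] meanAndStdDev(long[] used, long[] capacity) {
    int n = used.length;
    double[] util = new double[n];
    double sum = 0;
    for (int i = 0; i < n; i++) {
      util[i] = 100.0 * used[i] / capacity[i];   // utilization in percent
      sum += util[i];
    }
    double mean = sum / n;
    double sqDiff = 0;
    for (double u : util) {
      sqDiff += (u - mean) * (u - mean);
    }
    return new double[] { mean, Math.sqrt(sqDiff / n) };
  }
}
{noformat}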

 The HDFS webUI should show a metric that summarizes whether the cluster is 
 balanced regarding disk space usage
 --

 Key: HDFS-853
 URL: https://issues.apache.org/jira/browse/HDFS-853
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: dhruba borthakur

 It is desirable to know how much the datanodes vary from one another in terms 
 of space utilization to get a sense of how well an HDFS cluster is balanced.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline

2010-01-05 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796810#action_12796810
 ] 

Konstantin Shvachko commented on HDFS-826:
--

I remember we intended to implement two things related to this issue but haven't 
done so yet.
# The client throws an exception to the application when the write pipeline 
falls _below_ the minimal replication factor.
# A client should be able to close a file even if its last block is not 
complete, with the following semantics: if the last block has at least one 
valid replica it will be fully replicated; otherwise the last block is treated 
as a corrupt block.

It seems the patch proposes a new API to work around the problems rather than 
addressing them directly.

 Allow a mechanism for an application to detect that datanode(s)  have died in 
 the write pipeline
 

 Key: HDFS-826
 URL: https://issues.apache.org/jira/browse/HDFS-826
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: ReplicableHdfs.txt


 HDFS does not replicate the last block of the file that is currently being 
 written to by an application. Every datanode death in the write pipeline 
 decreases the reliability of the last block of the currently-being-written 
 file. This situation can be improved if the application can be notified of a 
 datanode death in the write pipeline. Then, the application can decide what 
 the right course of action is to take on this event.
 In our use-case, the application can close the file on the first datanode 
 death, and start writing to a newly created file. This ensures that the 
 reliability guarantee of a block is close to 3 at all times.
 One idea is to make DFSOutputStream.write() throw an exception if the number 
 of datanodes in the write pipeline falls below the minimum.replication.factor that 
 is set on the client (this is backward compatible).
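 A rough sketch of the client-side pattern described in this use-case (purely 
 illustrative; the exception behaviour shown is the proposal above, not an existing 
 API, and the helper names are made up):
{noformat}
// Hypothetical sketch: roll over to a new file when the write pipeline degrades.
FSDataOutputStream out = fs.create(currentPath);
try {
  out.write(record);
} catch (IOException pipelineDegraded) {
  // the pipeline fell below the client's minimum replication (proposed behaviour):
  // close the current file and keep writing into a freshly created one
  out.close();
  currentPath = nextPath();                 // hypothetical helper
  out = fs.create(currentPath);
  out.write(record);
}
{noformat}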

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-755:
-

Status: Open  (was: Patch Available)

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-755:
-

Status: Patch Available  (was: Open)

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796895#action_12796895
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12427406/alldata-hdfs.tsv
  against trunk revision 895877.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/169/console

This message is automatically generated.

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-755:
-

Status: Patch Available  (was: Open)

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
 hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-755:
-

Status: Open  (was: Patch Available)

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
 hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-755:
-

Attachment: hdfs-755.txt

Reattaching same patch so Hudson doesn't try to apply benchmark results as a 
patch.

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
 hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-870) Topology is permanently cached

2010-01-05 Thread Allen Wittenauer (JIRA)
Topology is permanently cached
--

 Key: HDFS-870
 URL: https://issues.apache.org/jira/browse/HDFS-870
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Allen Wittenauer


Replacing the topology script requires a namenode bounce because the NN caches 
the information permanently. It should really either expire the cached entries 
periodically or expire them on -refreshNodes.
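
A minimal sketch of the periodic-expiry idea (illustrative only; not tied to the 
actual namenode code paths, and the class and method names are assumptions):
{noformat}
// Illustrative sketch: cached node->rack entries expire after a TTL, so a
// replaced topology script is picked up without a namenode restart;
// clear() would be hooked to -refreshNodes.
import java.util.HashMap;
import java.util.Map;

public class ExpiringTopologyCache {
  private static class Entry {
    final String rack;
    final long resolvedAt;
    Entry(String rack, long resolvedAt) { this.rack = rack; this.resolvedAt = resolvedAt; }
  }

  private final Map<String, Entry> cache = new HashMap<String, Entry>();
  private final long ttlMillis;

  public ExpiringTopologyCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

  /** Returns the cached rack, or null if missing/expired (caller re-runs the script). */
  public synchronized String get(String node) {
    Entry e = cache.get(node);
    if (e == null || System.currentTimeMillis() - e.resolvedAt > ttlMillis) {
      cache.remove(node);
      return null;
    }
    return e.rack;
  }

  public synchronized void put(String node, String rack) {
    cache.put(node, new Entry(rack, System.currentTimeMillis()));
  }

  /** Drops everything, e.g. when an admin runs -refreshNodes. */
  public synchronized void clear() { cache.clear(); }
}
{noformat}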

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796947#action_12796947
 ] 

Hadoop QA commented on HDFS-755:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429481/hdfs-755.txt
  against trunk revision 895877.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/console

This message is automatically generated.

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
 hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796952#action_12796952
 ] 

Todd Lipcon commented on HDFS-755:
--

I think these failures are spurious - the same test passes locally:
{noformat}
[junit] Running org.apache.hadoop.hdfs.TestDataTransferProtocol
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.678 sec
{noformat}

 Read multiple checksum chunks at once in DFSInputStream
 ---

 Key: HDFS-755
 URL: https://issues.apache.org/jira/browse/HDFS-755
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
 hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
 hdfs-755.txt


 HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
 checksum chunks in a single call to readChunk. This is the HDFS-side use of 
 that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.