[jira] Commented: (HDFS-867) Add a PowerTopology class to aid replica placement and enhance availability of blocks
[ https://issues.apache.org/jira/browse/HDFS-867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796665#action_12796665 ]

Steve Loughran commented on HDFS-867:
-------------------------------------

How do you plan to use this? In data placement? And balancing?

> Add a PowerTopology class to aid replica placement and enhance availability of blocks
> -------------------------------------------------------------------------------------
>                 Key: HDFS-867
>                 URL: https://issues.apache.org/jira/browse/HDFS-867
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jeff Hammerbacher
>            Priority: Minor
>
> Power outages are a common reason for a DataNode to become unavailable. Having a data structure to represent the power topology of your data center can be used to implement a power-aware replica placement policy.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
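As a sketch of what such a class might look like — the issue only proposes the idea, so every name and the path scheme here are hypothetical — a PowerTopology could map each DataNode to a power path (PDU, then circuit) and let a placement policy check that a set of replicas spans more than one power domain:

```java
// Hypothetical sketch of a PowerTopology: maps each DataNode to a power
// path such as "/pdu1/circuit3", so a replica placement policy can avoid
// putting all replicas behind the same PDU. All names are illustrative.
import java.util.*;

public class PowerTopology {
    // node name -> power path, e.g. "dn1" -> "/pdu1/circuit3"
    private final Map<String, String> powerPath = new HashMap<>();

    public void add(String node, String path) {
        powerPath.put(node, path);
    }

    // top-level power domain (the PDU segment) for a node
    public String powerDomain(String node) {
        String path = powerPath.get(node);
        if (path == null) return null;
        int second = path.indexOf('/', 1);
        return second < 0 ? path : path.substring(0, second);
    }

    // true if the chosen replica nodes span more than one power domain
    public boolean isPowerDiverse(List<String> replicas) {
        Set<String> domains = new HashSet<>();
        for (String r : replicas) domains.add(powerDomain(r));
        return domains.size() > 1;
    }
}
```

A placement policy could then prefer candidate DataNode sets for which `isPowerDiverse` is true, much as the existing rack-aware policy spreads replicas across racks.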
[jira] Commented: (HDFS-853) The HDFS webUI should show a metric that summarizes whether the cluster is balanced regarding disk space usage
[ https://issues.apache.org/jira/browse/HDFS-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796772#action_12796772 ]

Andrew Ryan commented on HDFS-853:
----------------------------------

We're currently graphing both the mean and the standard deviation of datanodes from that mean, using a script that parses the output of 'dfsadmin -report'. Our DFS cluster nodes all have the same amount of disk space, so you'd expect the mean across individual datanodes to equal the overall % DFS full, but it's not quite the same. We haven't yet looked into why this is so. To directly answer Konstantin's question, the one-line metric we're using is the standard deviation.

> The HDFS webUI should show a metric that summarizes whether the cluster is balanced regarding disk space usage
> --------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-853
>                 URL: https://issues.apache.org/jira/browse/HDFS-853
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>            Reporter: dhruba borthakur
>
> It is desirable to know how much the datanodes vary from one another in terms of space utilization, to get a sense of how well an HDFS cluster is balanced.
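The one-line metric described above can be sketched as follows. This assumes the per-node "DFS Used%" values have already been parsed out of the 'dfsadmin -report' output (the parsing itself is omitted), and the class name is illustrative:

```java
// Sketch of the balance metric described above: given per-DataNode
// "DFS Used%" values (already parsed from 'dfsadmin -report'), report
// the mean and the standard deviation of nodes from that mean.
public class BalanceMetric {
    public static double mean(double[] usedPct) {
        double sum = 0;
        for (double v : usedPct) sum += v;
        return sum / usedPct.length;
    }

    public static double stddev(double[] usedPct) {
        double m = mean(usedPct);
        double ss = 0;
        for (double v : usedPct) ss += (v - m) * (v - m);
        return Math.sqrt(ss / usedPct.length); // population std deviation
    }

    public static void main(String[] args) {
        double[] usedPct = {62.1, 60.4, 64.0, 58.9}; // sample values
        System.out.printf("mean=%.2f%% stddev=%.2f%%%n",
                mean(usedPct), stddev(usedPct));     // prints mean=61.35% stddev=1.90%
    }
}
```

A standard deviation near zero means the cluster is well balanced; a growing value is a signal to run the Balancer.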
[jira] Commented: (HDFS-717) Proposal for exceptions thrown by FileContext and Abstract File System
[ https://issues.apache.org/jira/browse/HDFS-717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796790#action_12796790 ]

Suresh Srinivas commented on HDFS-717:
--------------------------------------

The following are the layers between the application and the Service Implementation (such as the NameNode):

Application == Client library == RPC client == Network == RPC server == Service Impl

Key goals:
# InterruptedException in the client library should not be ignored. This will help in clean application shutdown. InterruptedException on the server side should not be ignored either; see below.
# Applications must be able to differentiate RPC-layer exceptions from exceptions in the Service Impl. Applications can choose to retry a request based on the category of exception received.
# Exceptions declared in the API should be propagated end to end over RPC, from the server to the application. All undeclared exceptions from the Service Impl, including InterruptedException, should be handled by the RPC layer.
# Changes needed for applications to move from FileSystem to FileContext should be minimal.

Proposal: Exceptions will be organized as shown below.
# IOException
#* exceptions as declared in the RPC API - note that the detailed method exceptions will be declared even though they are subclasses of IOException
#* RPCException - exceptions in the RPC layer
#** RPCClientException - exception encountered in the RPC client
#** RPCServerException - exception encountered in the RPC server
#** UnexpectedServerException - unexpected exception from the Service Impl to the RPC handlers
# RuntimeException
#* HadoopIllegalArgumentException - subclass of IllegalArgumentException; indicates an illegal or inappropriate argument
#* HadoopInterruptedException - subclass of RuntimeException thrown on encountering InterruptedException
#* UnsupportedOperationException - thrown to indicate the requested operation is not supported
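The proposed hierarchy could be declared roughly as below. Class names follow the proposal; packages, constructors, and bodies are illustrative only:

```java
// Skeleton of the proposed exception hierarchy. Class names are taken
// from the proposal above; everything else is illustrative.
import java.io.IOException;

class RPCException extends IOException {                 // exceptions in the RPC layer
    RPCException(String msg) { super(msg); }
}
class RPCClientException extends RPCException {          // RPC client side
    RPCClientException(String msg) { super(msg); }
}
class RPCServerException extends RPCException {          // RPC server side
    RPCServerException(String msg) { super(msg); }
}
class UnexpectedServerException extends RPCException {   // undeclared exception from Service Impl
    UnexpectedServerException(String msg) { super(msg); }
}

class HadoopIllegalArgumentException extends IllegalArgumentException {
    HadoopIllegalArgumentException(String msg) { super(msg); }
}
class HadoopInterruptedException extends RuntimeException {
    HadoopInterruptedException(InterruptedException cause) { super(cause); }
}
```

Note how the checked branch all roots at IOException while the two Hadoop-specific runtime exceptions stay unchecked, matching goals 3 and 4.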
Rationale:
# Declared exceptions should be subclasses of IOException as before - no changes here.
# Group the RPC exceptions, categorized by client side and server side.
# Use a runtime exception for InterruptedException - simplifies migration to FileContext. A subclass of IOException is not used, as applications might have catch-and-ignore code.
# HadoopIllegalArgumentException instead of the Java IllegalArgumentException - helps differentiate exceptions in the Hadoop implementation from exceptions thrown by the Java libraries. Applications can still choose to catch IllegalArgumentException.
# An unsupported operation is indicated by the unchecked UnsupportedOperationException - a subclass of IOException is not used, as applications might have catch-and-ignore code. A runtime exception is used because applications cannot recover from this condition.

Implementation details:

InterruptedException handling:
# Client-side changes
#* The client library (both the API interface and the RPC client) and the InputStream and OutputStream returned by FileContext throw the unchecked HadoopInterruptedException on InterruptedException.
# Server changes
#* InterruptedException is currently ignored in the Service Impl layer. With this change the Service Impl will throw the exception. Methods in protocol classes such as ClientProtocol will specify InterruptedException in their throws clause.
#* On InterruptedException, RPC handlers close the socket connection to the client. The client handles this failure the same as a loss of connection.

RPC layer changes:
# The RPC layer marshals HadoopInterruptedException, HadoopIllegalArgumentException, and UnsupportedOperationException from the Service Impl all the way to the client.
# The RPC layer throws RPCClientException, RPCServerException, and UnexpectedServerException.

FileContext, AbstractFileSystem, and protocol changes:
# Methods in FileContext declare IOException and the relevant subclasses of IOException. This helps document the specific exceptions thrown, and helps in marshalling the exception from the server to the application over RPC.
RPCExceptions are not declared as thrown in FileContext and AbstractFileSystem, as some implementations might not use an RPC layer (e.g. the local file system). Example:
{noformat}
public FSDataInputStream open(Path path)
    throws IOException, FileNotFoundException, AccessDeniedException;
{noformat}
# Protocol methods (such as those in ClientProtocol) will throw exceptions similar to FileContext, along with InterruptedException.

Finally, FileContext will throw the following exceptions. The exception hierarchy is flattened; the semantics remain as defined in the earlier comments.
# IOException
#* ServerNotReadyException (NameNode safemode, etc.)
#* OutOfSpaceException for write operations
#* AccessControlException
#* InvalidPathNameException
#* FileNotFoundException
#* FileAlreadyExistsException
#* DirectoryNotEmptyException
#* NotDirectoryException
#* DirectoryNotAllowedException

> Proposal for exceptions thrown by FileContext and Abstract File System
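To illustrate goal 2 above — retrying on RPC-layer failures while letting declared filesystem exceptions surface to the caller — here is a minimal sketch. The RPCException nested class is a stand-in for the proposed class, and the retry helper is entirely hypothetical:

```java
// Sketch of the retry decision the goals above call for: an application
// catches the RPC-layer category separately from declared filesystem
// exceptions. RPCException here is a stand-in for the proposed class.
import java.io.IOException;

public class RetryExample {
    static class RPCException extends IOException {      // stand-in for the proposal
        RPCException(String msg) { super(msg); }
    }

    interface Call { void run() throws IOException; }

    static void withRetry(Call call, int maxAttempts) throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                call.run();
                return;
            } catch (RPCException e) {                   // transport problem: retry
                if (attempt == maxAttempts) throw e;
            }
            // FileNotFoundException etc. propagate immediately:
            // retrying cannot help a Service Impl error
        }
    }
}
```

The key point is that the catch clause selects on the exception *category*, which is only possible because the proposal groups all RPC-layer failures under one superclass.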
[jira] Commented: (HDFS-853) The HDFS webUI should show a metric that summarizes whether the cluster is balanced regarding disk space usage
[ https://issues.apache.org/jira/browse/HDFS-853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796795#action_12796795 ]

Konstantin Shvachko commented on HDFS-853:
------------------------------------------

Maybe we should use the mean and standard deviation of _utilization_ rather than of raw disk space. This would work for heterogeneous clusters as well. By utilization I mean the percentage of a data-node's disk space used for blocks. We should also make sure this is consistent with the Balancer: balancing should improve the metric.
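The utilization-based variant suggested above might look like this; the class and method names are illustrative:

```java
// Sketch of the utilization-based metric suggested above: express each
// node's usage as a percentage of its own capacity, so nodes with
// different disk sizes compare fairly, then summarize with a std deviation.
public class UtilizationMetric {
    public static double[] utilizationPct(long[] usedBytes, long[] capacityBytes) {
        double[] pct = new double[usedBytes.length];
        for (int i = 0; i < pct.length; i++) {
            pct[i] = 100.0 * usedBytes[i] / capacityBytes[i];
        }
        return pct;
    }

    public static double stddev(double[] pct) {
        double mean = 0, ss = 0;
        for (double v : pct) mean += v;
        mean /= pct.length;
        for (double v : pct) ss += (v - mean) * (v - mean);
        return Math.sqrt(ss / pct.length); // population std deviation
    }
}
```

On a heterogeneous cluster, a 1 TB node and a 4 TB node holding proportional amounts of data both contribute the same utilization percentage, so the deviation reflects imbalance rather than disk-size differences.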
[jira] Commented: (HDFS-826) Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
[ https://issues.apache.org/jira/browse/HDFS-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796810#action_12796810 ]

Konstantin Shvachko commented on HDFS-826:
------------------------------------------

I remember we intended to implement two things related to this issue but haven't done so yet.
# The client throws an exception to the application when the write pipeline falls _below_ the minimal replication factor.
# A client should be able to close a file even if its last block is not complete, with the following semantics: if the last block has at least one valid replica it will be fully replicated; otherwise the last block is treated as a corrupt block.

It seems the patch proposes a new API to work around the problems rather than addressing them directly.

> Allow a mechanism for an application to detect that datanode(s) have died in the write pipeline
> -----------------------------------------------------------------------------------------------
>                 Key: HDFS-826
>                 URL: https://issues.apache.org/jira/browse/HDFS-826
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: ReplicableHdfs.txt
>
> HDFS does not replicate the last block of a file that is currently being written to by an application. Every datanode death in the write pipeline decreases the reliability of the last block of the currently-being-written file. This situation can be improved if the application can be notified of a datanode death in the write pipeline; the application can then decide the right course of action to take on this event. In our use case, the application can close the file on the first datanode death and start writing to a newly created file. This ensures that the reliability guarantee of a block stays close to 3 at all times. One idea is to make DFSOutputStream.write() throw an exception if the number of datanodes in the write pipeline falls below the minimum.replication.factor that is set on the client (this is backward compatible).
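The idea in the last sentence of the report — failing a write once the pipeline shrinks below a client-side minimum — can be sketched in isolation. This is not the real DFSOutputStream, just an illustrative stand-in with hypothetical names:

```java
// Illustrative sketch of the proposed behavior: a write path that throws
// once the live pipeline falls below a client-side minimum replication.
// Names are hypothetical; the real DFSOutputStream is far more involved.
import java.io.IOException;

public class PipelineCheck {
    private final int minReplication; // client-side minimum.replication.factor
    private int liveDatanodes;

    public PipelineCheck(int minReplication, int initialPipeline) {
        this.minReplication = minReplication;
        this.liveDatanodes = initialPipeline;
    }

    public void datanodeDied() { liveDatanodes--; }

    // called at the top of write(): fail fast so the application can
    // close this file and switch to a freshly created one
    public void checkPipeline() throws IOException {
        if (liveDatanodes < minReplication) {
            throw new IOException("write pipeline has " + liveDatanodes
                    + " datanodes, below minimum replication " + minReplication);
        }
    }
}
```

Because the exception surfaces through the existing write() signature, applications that don't care remain unaffected, which is what makes the idea backward compatible.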
[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:
-----------------------------
    Status: Open  (was: Patch Available)

> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple checksum chunks in a single call to readChunk. This is the HDFS-side use of that new feature.
[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:
-----------------------------
    Status: Patch Available  (was: Open)
[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796895#action_12796895 ]

Hadoop QA commented on HDFS-755:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12427406/alldata-hdfs.tsv
  against trunk revision 895877.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
                       Please justify why no new tests are needed for this patch.
                       Also please list what manual steps were performed to verify this patch.

    -1 patch. The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/169/console

This message is automatically generated.
[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:
-----------------------------
    Status: Patch Available  (was: Open)
[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:
-----------------------------
    Status: Open  (was: Patch Available)
[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-755:
-----------------------------
    Attachment: hdfs-755.txt

Reattaching the same patch so Hudson doesn't try to apply the benchmark results as a patch.
[jira] Created: (HDFS-870) Topology is permanently cached
Topology is permanently cached
------------------------------
                Key: HDFS-870
                URL: https://issues.apache.org/jira/browse/HDFS-870
            Project: Hadoop HDFS
         Issue Type: Bug
           Reporter: Allen Wittenauer

Replacing the topology script requires a namenode bounce because the NN caches the information permanently. It should really either expire entries periodically or expire them on -refreshNodes.
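One possible shape for the suggested fix — a rack-mapping cache with a time-to-live plus an explicit flush hook that -refreshNodes could call — sketched with an illustrative resolver interface (the real NameNode code is different):

```java
// Sketch of the fix suggested above: cache script-resolved rack mappings
// with a time-to-live instead of forever, and allow an explicit flush.
// The resolver stands in for running the topology script; all names are
// illustrative, not the actual NameNode implementation.
import java.util.*;
import java.util.function.Function;

public class ExpiringTopologyCache {
    private static final class Entry {
        final String rack;
        final long expiresAt;
        Entry(String rack, long expiresAt) { this.rack = rack; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, String> resolver; // e.g. invokes the topology script
    private final long ttlMillis;

    public ExpiringTopologyCache(Function<String, String> resolver, long ttlMillis) {
        this.resolver = resolver;
        this.ttlMillis = ttlMillis;
    }

    public synchronized String resolve(String host) {
        long now = System.currentTimeMillis();
        Entry e = cache.get(host);
        if (e == null || e.expiresAt <= now) {           // miss or expired: re-run script
            e = new Entry(resolver.apply(host), now + ttlMillis);
            cache.put(host, e);
        }
        return e.rack;
    }

    // what -refreshNodes could do: drop everything and re-resolve lazily
    public synchronized void invalidateAll() { cache.clear(); }
}
```

Either mechanism — TTL expiry or an explicit flush — would let operators swap the topology script without bouncing the namenode.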
[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796947#action_12796947 ]

Hadoop QA commented on HDFS-755:
--------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12429481/hdfs-755.txt
  against trunk revision 895877.

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 3 new or modified tests.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/170/console

This message is automatically generated.
[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream
[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796952#action_12796952 ]

Todd Lipcon commented on HDFS-755:
----------------------------------

I think these failures are spurious - the same test passes locally:
{noformat}
    [junit] Running org.apache.hadoop.hdfs.TestDataTransferProtocol
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 5.678 sec
{noformat}