[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844943#action_12844943 ]

Zlatin Balevsky commented on HDFS-918:
--

If the unbounded thread-pool model is used, the maximum and current sizes of the pool really should be exported as metrics.

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: Jay Booth
Fix For: 0.22.0
Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, hdfs-multiplex.patch

Currently, on read requests, the DataXCeiver server allocates a new thread per request; each thread must allocate its own buffers, which leads to higher-than-optimal CPU and memory usage by the sending threads. If we had a single selector and a small threadpool multiplexing request packets, we could theoretically achieve higher performance while taking up fewer resources, leaving more CPU on datanodes available for mapred, hbase or whatever. This can be done without changing any wire protocols.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
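Both gauges the comment asks for are already exposed by java.util.concurrent.ThreadPoolExecutor. A minimal standalone sketch of reading them (this uses plain JDK classes, not Hadoop's metrics framework; the class name is hypothetical):

```java
import java.util.concurrent.*;

public class PoolMetrics {
    // Snapshot of the two gauges worth exporting: current and peak pool size.
    static int[] snapshot(ThreadPoolExecutor pool) {
        return new int[] { pool.getPoolSize(), pool.getLargestPoolSize() };
    }

    public static void main(String[] args) throws Exception {
        // Unbounded-style pool: grows with demand, like the "unlimited" model.
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        CountDownLatch started = new CountDownLatch(3);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> {
                started.countDown();
                try { release.await(); } catch (InterruptedException ignored) {}
            });
        }
        started.await(); // all three workers are now live
        int[] s = snapshot(pool);
        System.out.println("current=" + s[0] + " peak=" + s[1]);
        release.countDown();
        pool.shutdown();
    }
}
```

A metrics exporter would simply publish `snapshot()` on each update cycle.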
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844485#action_12844485 ]

Zlatin Balevsky commented on HDFS-918:
--

bq. I think it is very important to have separate pools for each partition

+1

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
[jira] Commented: (HDFS-1034) Enhance datanode to read data and checksum file in parallel
[ https://issues.apache.org/jira/browse/HDFS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844269#action_12844269 ]

Zlatin Balevsky commented on HDFS-1034:
--

How complicated would it be to store the checksum file on a separate mount point? In JBOD configurations that would let both reads happen simultaneously.

Enhance datanode to read data and checksum file in parallel

Key: HDFS-1034
URL: https://issues.apache.org/jira/browse/HDFS-1034
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur

In the current HDFS implementation, a read of a block issued to the datanode results in a disk access to the checksum file followed by a disk access to the data file. It would be nice to be able to do these two IOs in parallel to reduce read latency.
[jira] Commented: (HDFS-1034) Enhance datanode to read data and checksum file in parallel
[ https://issues.apache.org/jira/browse/HDFS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844272#action_12844272 ]

Zlatin Balevsky commented on HDFS-1034:
--

The only likely bottleneck is the extra disk seek, which may or may not be a big deal; it probably is for HBase-type workloads. There are many ways around that, including but not limited to:
a) prepending a copy of the checksum file to the block file, while keeping the separate copy intact for off-thread verification after the transfer starts
b) using some ext4-extents JNI magic
...?

Enhance datanode to read data and checksum file in parallel

Key: HDFS-1034
URL: https://issues.apache.org/jira/browse/HDFS-1034
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843559#action_12843559 ]

Zlatin Balevsky commented on HDFS-918:
--

bq. Current DFS : 92MB/s over 60 runs
bq. Multiplex : 97 MB/s over 60 runs
bq. Either random variation, or maybe larger packet size helps

A Student's t-test (http://en.wikipedia.org/wiki/Student's_t-test) will help you figure out whether this difference is statistically significant or can be attributed to random variation. It is an essential tool when benchmarking modifications, and the R project distribution makes it trivial to perform.

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
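For two sets of benchmark runs, the t statistic itself is only a few lines. A self-contained sketch of Welch's variant (the sample arrays below are made-up illustrative numbers, not the actual 60-run data from the benchmark):

```java
public class TTest {
    // Welch's t statistic for two independent samples.
    static double tStatistic(double[] a, double[] b) {
        double ma = mean(a), mb = mean(b);
        double va = variance(a, ma), vb = variance(b, mb);
        return (ma - mb) / Math.sqrt(va / a.length + vb / b.length);
    }

    static double mean(double[] x) {
        double s = 0;
        for (double v : x) s += v;
        return s / x.length;
    }

    static double variance(double[] x, double m) {
        double s = 0;
        for (double v : x) s += (v - m) * (v - m);
        return s / (x.length - 1); // unbiased sample variance
    }

    public static void main(String[] args) {
        // Made-up MB/s samples, merely shaped like the numbers quoted above.
        double[] dfs = {91, 93, 92, 92, 91, 93};
        double[] mux = {96, 98, 97, 97, 96, 98};
        System.out.println("t = " + tStatistic(mux, dfs));
    }
}
```

The statistic is then compared against the t distribution's critical value for the chosen significance level; with 60 runs per configuration even a small real difference should be detectable.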
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834274#action_12834274 ]

Zlatin Balevsky commented on HDFS-918:
--

Jay, the selector thread is likely busy-looping because select() will return immediately if any channels are writable. Cancelling a key takes a select() call, and you cannot re-register the channel until the key has been properly cancelled and removed from the selector's key sets. It is easier to turn write interest off before passing the writable channel to the threadpool. When the threadpool is done with transferTo(), pass the channel back to the select()-ing thread and instruct it to turn write interest back on. (Do not change the interest set outside the selecting thread.) Hope this helps.

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
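The handoff described in the comment can be sketched with a java.nio.channels.Pipe standing in for a client socket. This is a minimal hypothetical demo of the interest-toggling protocol, not the patch's actual code; the class name and the one-byte "packet" are made up:

```java
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.concurrent.*;

public class WriteInterestToggle {
    static int run(int packets) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        pipe.sink().register(selector, SelectionKey.OP_WRITE);

        ExecutorService pool = Executors.newFixedThreadPool(1);
        BlockingQueue<SelectionKey> reenable = new LinkedBlockingQueue<>();

        int dispatched = 0;
        while (dispatched < packets) {
            selector.select();
            for (SelectionKey k : selector.selectedKeys()) {
                if (k.isWritable()) {
                    k.interestOps(0); // step 1: drop write interest BEFORE handing off,
                                      // so select() stops reporting this channel
                    dispatched++;
                    final SelectionKey handed = k;
                    pool.execute(() -> {
                        try { // step 2: worker does the write (transferTo in the DataNode)
                            ((Pipe.SinkChannel) handed.channel())
                                .write(ByteBuffer.wrap(new byte[] {1}));
                        } catch (Exception ignored) {}
                        reenable.add(handed);        // step 3: hand the key back
                        handed.selector().wakeup();  // and wake the selecting thread
                    });
                }
            }
            selector.selectedKeys().clear();
            // step 4: ONLY the selecting thread flips write interest back on
            SelectionKey k;
            while ((k = reenable.poll()) != null) k.interestOps(SelectionKey.OP_WRITE);
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        selector.close();
        pipe.sink().close();
        pipe.source().close();
        return dispatched;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("dispatched=" + run(3));
    }
}
```

Because interest ops are mutated only on the selecting thread, there is no race against the selector's key sets and no need to cancel and re-register keys.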
[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads
[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833579#action_12833579 ]

Zlatin Balevsky commented on HDFS-918:
--

I see a problem with doing the disk read on the same thread that is doing the select()-ing: round-robining several selector threads does not prevent a situation where a channel is writable but its selecting thread is stuck in a transferTo() call on another channel, even when other selector threads in handlers[] are available. With an architecture like this you will always perform worse than a thread-per-stream approach. Instead you could have a single selector thread that blocks only on select() and never does any disk I/O (including the creation of RandomAccessFile objects); it simply dispatches the writable channels to a threadpool that makes the actual transferTo() calls.

Use single Selector and small thread pool to replace many instances of BlockSender for reads

Key: HDFS-918
URL: https://issues.apache.org/jira/browse/HDFS-918
[jira] Commented: (HDFS-945) Make NameNode resilient to DoS attacks (malicious or otherwise)
[ https://issues.apache.org/jira/browse/HDFS-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829190#action_12829190 ]

Zlatin Balevsky commented on HDFS-945:
--

Any kind of rate limiting should be either optional or configurable on a per-application basis.

Make NameNode resilient to DoS attacks (malicious or otherwise)

Key: HDFS-945
URL: https://issues.apache.org/jira/browse/HDFS-945
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Arun C Murthy

We've seen defective applications cause havoc on the NameNode, e.g. by doing 100k+ 'listStatus' calls on very large directories (60k files). I'd like to start a discussion around how we prevent such applications, and possibly malicious ones in the future, from taking down the NameNode. Thoughts?
[jira] Commented: (HDFS-928) Ability to provide custom DatanodeProtocol implementation
[ https://issues.apache.org/jira/browse/HDFS-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805920#action_12805920 ]

Zlatin Balevsky commented on HDFS-928:
--

We may be talking about the same thing. It will be easier to unit-test the DataNode class by providing mock implementations of DatanodeProtocol and then registering expectations on them. But to plug in a mocked implementation today you have to explicitly set the reference after the DataNode object is constructed. It is cleaner to use the mocked implementation from the get-go: you avoid executing a lot of code, which makes your unit tests more focused and self-contained, and it ensures that any code inside the constructor uses the mocked object, so you can observe and test those interactions as well.

Ability to provide custom DatanodeProtocol implementation

Key: HDFS-928
URL: https://issues.apache.org/jira/browse/HDFS-928
Project: Hadoop HDFS
Issue Type: Wish
Components: data-node
Reporter: Zlatin Balevsky
Priority: Trivial

This should make testing easier as well as allow users to provide their own RPC/namenode implementations. It's pretty straightforward:
1. add interface DatanodeProtocolProvider { DatanodeProtocol getNameNode(Configuration conf); }
2. add a config setting like dfs.datanode.protocol.impl
3. create a default implementation and copy/paste the RPC initialization code there
[jira] Commented: (HDFS-928) Ability to provide custom DatanodeProtocol implementation
[ https://issues.apache.org/jira/browse/HDFS-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806012#action_12806012 ]

Zlatin Balevsky commented on HDFS-928:
--

From a purist QE point of view, right now you cannot write true unit tests for the Datanode component, because you inevitably end up testing the RPC code and the Namenode code as well. What you have now are integration tests; you absolutely need those, but they are not as helpful as unit tests for pinpointing problems. For example, right now a bug in the RPC code will cause the Datanode tests to fail; if all you know is that a bunch of tests failed, you will find the bug faster when there are fewer places to start looking. Going back to the reason for this wish: I don't want to reinvent dependency injection, but the easier it is to swap things in and out, the easier it is to write tests and to develop. More importantly, it makes it easier for third parties (i.e. myself) to modify the source code for their specific needs, and the project as a whole only benefits from that.

Ability to provide custom DatanodeProtocol implementation

Key: HDFS-928
URL: https://issues.apache.org/jira/browse/HDFS-928
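A toy illustration of the constructor-injection idea being argued for here. All names are hypothetical stand-ins (NameNodeProtocol for DatanodeProtocol, MiniDataNode for the real DataNode class); the point is only that a hand-rolled mock can observe an interaction that happens inside the constructor:

```java
// Hypothetical stand-in for DatanodeProtocol.
interface NameNodeProtocol {
    void register(String datanodeId);
}

// Hypothetical stand-in for the DataNode class.
class MiniDataNode {
    private final NameNodeProtocol nameNode;

    // Constructor injection: the protocol implementation is swappable,
    // so a test never touches RPC or a live NameNode.
    MiniDataNode(NameNodeProtocol nameNode, String id) {
        this.nameNode = nameNode;
        nameNode.register(id); // constructor-time interaction, observable by the mock
    }
}

public class InjectionDemo {
    static String lastRegistered;

    public static void main(String[] args) {
        NameNodeProtocol mock = id -> lastRegistered = id; // hand-rolled mock
        new MiniDataNode(mock, "dn-1");
        System.out.println("registered=" + lastRegistered);
    }
}
```

Had the registration call been wired to a concrete RPC proxy inside the constructor, this interaction would be untestable without a running Namenode.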
[jira] Created: (HDFS-926) BufferedDFSInputStream
BufferedDFSInputStream

Key: HDFS-926
URL: https://issues.apache.org/jira/browse/HDFS-926
Project: Hadoop HDFS
Issue Type: Wish
Components: hdfs client
Reporter: Zlatin Balevsky
Priority: Minor

Self-explanatory. The buffer size can be specified as a number of blocks. This could be implemented trivially with heap storage and several BlockReaders, or it could have more advanced features like:
* logic to ensure that blocks are not pulled from the same Datanode(s)
* a local filesystem store for buffered blocks
* adaptive parallelism
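A minimal read-ahead sketch of the "trivial" variant: a background thread keeps up to `capacity` chunks buffered ahead of the reader. In a real BufferedDFSInputStream the chunks would be whole HDFS blocks filled by BlockReaders; here a chunk is just a byte[] and all names are illustrative:

```java
import java.io.*;
import java.util.concurrent.*;

public class PrefetchingInputStream extends InputStream {
    private final BlockingQueue<byte[]> buffered;
    private byte[] current = new byte[0];
    private int pos = 0;
    private volatile boolean eof = false;

    public PrefetchingInputStream(InputStream in, int chunkSize, int capacity) {
        buffered = new ArrayBlockingQueue<>(capacity);
        Thread prefetcher = new Thread(() -> {
            try {
                byte[] chunk = new byte[chunkSize];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    byte[] copy = new byte[n];
                    System.arraycopy(chunk, 0, copy, 0, n);
                    buffered.put(copy); // blocks when the read-ahead window is full
                }
            } catch (Exception ignored) {
            } finally {
                try { buffered.put(new byte[0]); } // empty array = EOF sentinel
                catch (InterruptedException ignored) {}
            }
        });
        prefetcher.setDaemon(true);
        prefetcher.start();
    }

    @Override
    public int read() throws IOException {
        while (pos == current.length) {
            if (eof) return -1;
            try { current = buffered.take(); }
            catch (InterruptedException e) { throw new IOException(e); }
            pos = 0;
            if (current.length == 0) { eof = true; return -1; }
        }
        return current[pos++] & 0xff;
    }
}
```

The advanced features in the list above would slot into the prefetcher: several prefetch threads for adaptive parallelism, spilling chunks to local disk instead of the heap, and choosing which replica to pull each block from.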
[jira] Created: (HDFS-928) Ability to provide custom DatanodeProtocol implementation
Ability to provide custom DatanodeProtocol implementation

Key: HDFS-928
URL: https://issues.apache.org/jira/browse/HDFS-928
Project: Hadoop HDFS
Issue Type: Wish
Components: data-node
Reporter: Zlatin Balevsky
Priority: Trivial
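The interface-plus-config-lookup proposal in HDFS-928 could look roughly like this sketch. A Map stands in for Hadoop's Configuration, the protocol has a single toy method, and the provider class names are hypothetical; only the property name dfs.datanode.protocol.impl comes from the issue:

```java
import java.util.*;

// Toy stand-in for the real DatanodeProtocol.
interface DatanodeProtocol {
    String versionRequest();
}

// Step 1: the provider interface from the issue description.
interface DatanodeProtocolProvider {
    DatanodeProtocol getNameNode(Map<String, String> conf);
}

// Step 3: a default implementation; in the real patch this is where the
// existing RPC initialization code would move.
class RpcProvider implements DatanodeProtocolProvider {
    public DatanodeProtocol getNameNode(Map<String, String> conf) {
        return () -> "rpc-namenode";
    }
}

public class ProviderDemo {
    // Step 2: look the provider class up from config and instantiate it.
    static DatanodeProtocol connect(Map<String, String> conf) throws Exception {
        String impl = conf.getOrDefault("dfs.datanode.protocol.impl",
                                        RpcProvider.class.getName());
        DatanodeProtocolProvider p =
            (DatanodeProtocolProvider) Class.forName(impl)
                .getDeclaredConstructor().newInstance();
        return p.getNameNode(conf);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(connect(new HashMap<>()).versionRequest());
    }
}
```

A test would set dfs.datanode.protocol.impl to a mock provider's class name and never open a socket.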
[jira] Commented: (HDFS-912) sed in build.xml fails
[ https://issues.apache.org/jira/browse/HDFS-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803962#action_12803962 ]

Zlatin Balevsky commented on HDFS-912:
--

It fails for me on Windows under Eclipse. It works fine under cygwin, though.

sed in build.xml fails

Key: HDFS-912
URL: https://issues.apache.org/jira/browse/HDFS-912
Project: Hadoop HDFS
Issue Type: Bug
Environment: ant 1.7.1, Solaris
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Minor
Attachments: HDFS-912.txt

This is the HDFS version of HADOOP-6505.
[jira] Commented: (HDFS-854) Datanode should scan devices in parallel to generate block report
[ https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803480#action_12803480 ]

Zlatin Balevsky commented on HDFS-854:
--

If it is not possible to move the I/O operations listFiles() and length() outside of the lock, it would make sense to set a flag indicating that a block report is in progress, so that the rest of the datanode doesn't just hang. My 2c.

Datanode should scan devices in parallel to generate block report

Key: HDFS-854
URL: https://issues.apache.org/jira/browse/HDFS-854
Project: Hadoop HDFS
Issue Type: Improvement
Components: data-node
Reporter: dhruba borthakur

A Datanode should scan its disk devices in parallel so that the time to generate a block report is reduced; this will reduce the startup time of a cluster. A datanode with 12 disks (1 TB each) storing a total of 150K blocks can take up to 20 minutes to scan those devices and generate the first block report.
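The parallel scan itself is a straightforward fan-out/fan-in. A sketch with in-memory lists standing in for the per-volume directory walks (all names hypothetical; in the DataNode each "volume" would be one of the dfs.data.dir directories and the task body would do the listFiles()/length() calls):

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelScan {
    // Scans each volume on its own thread and merges the per-volume results.
    static List<String> blockReport(List<List<String>> volumes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
        List<Future<List<String>>> futures = new ArrayList<>();
        for (List<String> vol : volumes) {
            // Stands in for walking one device: listFiles() + length() per block.
            futures.add(pool.submit(() -> new ArrayList<>(vol)));
        }
        List<String> report = new ArrayList<>();
        for (Future<List<String>> f : futures) report.addAll(f.get());
        pool.shutdown();
        return report;
    }

    public static void main(String[] args) throws Exception {
        List<String> report = blockReport(Arrays.asList(
            Arrays.asList("blk_1", "blk_2"),
            Arrays.asList("blk_3")));
        System.out.println(report.size() + " blocks");
    }
}
```

With one thread per device the scan time drops from the sum of the per-device walks to roughly the slowest single device; the in-progress flag suggested in the comment would be set before the fan-out and cleared after the merge.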