[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-13 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844943#action_12844943
 ] 

Zlatin Balevsky commented on HDFS-918:
--

The max and current sizes of the threadpool really should be exported as 
metrics if the unlimited model is used.

 Use single Selector and small thread pool to replace many instances of 
 BlockSender for reads
 

 Key: HDFS-918
 URL: https://issues.apache.org/jira/browse/HDFS-918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jay Booth
 Fix For: 0.22.0

 Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
 hdfs-918-20100211.patch, hdfs-918-20100228.patch, hdfs-918-20100309.patch, 
 hdfs-multiplex.patch


 Currently, on read requests, the DataXCeiver server allocates a new thread 
 per request, which must allocate its own buffers and leads to 
 higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
 single selector and a small threadpool to multiplex request packets, we could 
 theoretically achieve higher performance while taking up fewer resources and 
 leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
 can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-12 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844485#action_12844485
 ] 

Zlatin Balevsky commented on HDFS-918:
--

bq. I think it is very important to have separate pools for each partition

+1




[jira] Commented: (HDFS-1034) Enhance datanode to read data and checksum file in parallel

2010-03-11 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844269#action_12844269
 ] 

Zlatin Balevsky commented on HDFS-1034:
---

How complicated would it be to store the checksum file on a separate mount 
point?  In JBOD configurations this will enable both reads to happen 
simultaneously.  

 Enhance datanode to read data and checksum file in parallel
 ---

 Key: HDFS-1034
 URL: https://issues.apache.org/jira/browse/HDFS-1034
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur

 In the current HDFS implementation, a read of a block issued to the datanode 
 results in a disk access to the checksum file followed by a disk access to 
 the data file. It would be nice to be able to do these two IOs in 
 parallel to reduce read latency.




[jira] Commented: (HDFS-1034) Enhance datanode to read data and checksum file in parallel

2010-03-11 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12844272#action_12844272
 ] 

Zlatin Balevsky commented on HDFS-1034:
---

The only possible bottleneck is the extra disk seek, which may or may not be a 
big deal; it probably matters most for HBase-type workloads.  There are many 
ways around that, including but not limited to: 

a) prepending a copy of the checksum file to the block file while keeping the 
separate copy intact for off-thread verification after the transfer starts
b) using some ext4-extents jni magic
... ?





[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-03-10 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12843559#action_12843559
 ] 

Zlatin Balevsky commented on HDFS-918:
--


bq. Current DFS : 92MB/s over 60 runs
bq. Multiplex : 97 MB/s over 60 runs
bq. Either random variation, or maybe larger packet size helps 

A Student's t-test (http://en.wikipedia.org/wiki/Student's_t-test) will help 
you figure out whether this difference is statistically significant or can be 
attributed to random variation.  It is an essential tool when benchmarking 
modifications.  The R project distribution makes it trivial to perform.
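For illustration only, here is a hand-rolled Welch's t statistic in Java (R or any stats package is the easier route in practice); the throughput samples below are made-up numbers, not the actual benchmark runs:

```java
// Welch's t statistic for two independent samples -- a minimal sketch.
// The MB/s samples below are illustrative, not the real HDFS-918 runs.
public class TTest {
    public static double mean(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.length;
    }

    public static double variance(double[] xs) {
        double m = mean(xs), s = 0;
        for (double x : xs) s += (x - m) * (x - m);
        return s / (xs.length - 1);  // unbiased sample variance
    }

    public static double welchT(double[] a, double[] b) {
        return (mean(a) - mean(b))
             / Math.sqrt(variance(a) / a.length + variance(b) / b.length);
    }

    public static void main(String[] args) {
        double[] current   = {91, 93, 92, 90, 94};  // hypothetical MB/s runs
        double[] multiplex = {96, 98, 97, 95, 99};
        System.out.println(welchT(current, multiplex));  // prints -5.0
    }
}
```

A |t| this large at these sample sizes would indicate a real difference; 60 runs per configuration, as in the benchmark above, gives the test far more power than this toy example.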




[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-02-16 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834274#action_12834274
 ] 

Zlatin Balevsky commented on HDFS-918:
--

Jay,

the selector thread is likely busylooping because select() will return 
immediately if any channels are writable.  Cancelling takes a select() call and 
you cannot re-register the channel until the key has been properly cancelled 
and removed from the selector key sets.  It is easier to turn write interest 
off before passing the writable channel to the threadpool.  When the threadpool 
is done with transferTo(), pass the channel back to the select()-ing thread and 
instruct it to turn write interest back on.  (Do not change the interest 
outside the selecting thread.)

Hope this helps.
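To make the busy-loop and the interest toggling concrete, here is a small self-contained sketch using a Pipe as a stand-in for a client socket (this is not code from any attached patch):

```java
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class InterestToggle {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        Pipe pipe = Pipe.open();
        pipe.sink().configureBlocking(false);
        SelectionKey key = pipe.sink().register(selector, SelectionKey.OP_WRITE);

        // An empty pipe's sink is always writable, so select() returns
        // immediately every time -- this is the busy-loop described above.
        System.out.println(selector.selectNow());   // prints 1
        selector.selectedKeys().clear();

        // Turn write interest off (from the selecting thread) before handing
        // the channel to a worker; the selector no longer spins on this key.
        key.interestOps(0);
        System.out.println(selector.selectNow());   // prints 0

        // Worker finished transferTo(): the selecting thread restores interest.
        key.interestOps(SelectionKey.OP_WRITE);
        System.out.println(selector.selectNow());   // prints 1

        selector.close();
        pipe.sink().close();
        pipe.source().close();
    }
}
```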





[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-02-14 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833579#action_12833579
 ] 

Zlatin Balevsky commented on HDFS-918:
--

I see a problem with doing the disk read on the same thread that is doing the 
select()-ing: round-robining across several selector threads doesn't help you 
avoid a situation where a channel is writable but its selecting thread is 
stuck in a transferTo call on another channel, even if other selector threads 
in handlers[] are available.  With an architecture like this you will always 
perform worse than a thread-per-stream approach.

Instead you could have a single selector thread that blocks only on select() 
and never does any disk I/O (including creation of RandomAccessFile objects).  
It simply dispatches the writable channels to a threadpool that does the 
actual transferTo calls. 
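A sketch of that split, with illustrative names that are not from any attached patch:

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

// The selecting thread never performs disk I/O; it only hands writable
// channels to a pool, with write interest turned off to avoid re-selection.
public class ReadDispatcher {
    private final Selector selector;
    private final ExecutorService pool;

    public ReadDispatcher(Selector selector, ExecutorService pool) {
        this.selector = selector;
        this.pool = pool;
    }

    // One pass of the selecting thread's loop (the real loop would block
    // in select() rather than call selectNow()).
    public int dispatchOnce(Consumer<SelectableChannel> transfer) throws IOException {
        int n = selector.selectNow();
        for (SelectionKey key : selector.selectedKeys()) {
            key.interestOps(0);  // stop selecting this key until the worker is done
            pool.submit(() -> transfer.accept(key.channel()));  // transferTo happens here
        }
        selector.selectedKeys().clear();
        return n;
    }
}
```

When the worker finishes its transferTo, it would hand the channel back to the selecting thread (e.g. via a queue plus Selector.wakeup()) so that thread can restore OP_WRITE interest, per the comment above about never changing interest ops outside the selecting thread.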








[jira] Commented: (HDFS-945) Make NameNode resilient to DoS attacks (malicious or otherwise)

2010-02-03 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829190#action_12829190
 ] 

Zlatin Balevsky commented on HDFS-945:
--

Any type of rate-limiting should be either optional or configurable on a 
per-application basis.

 Make NameNode resilient to DoS attacks (malicious or otherwise)
 ---

 Key: HDFS-945
 URL: https://issues.apache.org/jira/browse/HDFS-945
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Arun C Murthy

 We've seen defective applications cause havoc on the NameNode, e.g. by 
 doing 100k+ 'listStatus' calls on very large directories (60k files) etc.
 I'd like to start a discussion around how we can prevent such applications, 
 and possibly malicious ones in the future, from taking down the NameNode.
 Thoughts?




[jira] Commented: (HDFS-928) Ability to provide custom DatanodeProtocol implementation

2010-01-28 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805920#action_12805920
 ] 

Zlatin Balevsky commented on HDFS-928:
--

We may be talking about the same thing.  It will be easier to unit-test the 
DataNode class by providing mock implementations of the DatanodeProtocol and 
then registering expectations and whatnot.  But to plug in the mocked 
implementation you have to explicitly set the reference after the DataNode 
object is constructed.

It is cleaner to use the mocked implementation from the get-go as you avoid 
executing a lot of code and that will make your unit tests more focused and 
self-contained.  It will also make sure that any code inside the constructor is 
using the mocked object and you will be able to observe/test any interactions 
there.
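A simplified sketch of what constructor injection buys here; all types below are stand-ins, not the real Hadoop DatanodeProtocol or DataNode classes:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the real protocol interface.
interface DatanodeProtocol {
    void registerDatanode(String id);
}

// Stand-in DataNode: the protocol is injected through the constructor,
// so even calls made during construction hit the mock and can be observed.
class DataNode {
    private final DatanodeProtocol namenode;

    DataNode(DatanodeProtocol namenode, String id) {
        this.namenode = namenode;
        namenode.registerDatanode(id);  // constructor-time interaction
    }
}

// A hand-rolled recording mock (a framework like Mockito would also work).
class RecordingProtocol implements DatanodeProtocol {
    final List<String> registrations = new ArrayList<>();
    public void registerDatanode(String id) { registrations.add(id); }
}

public class MockInjectionDemo {
    public static void main(String[] args) {
        RecordingProtocol mock = new RecordingProtocol();
        new DataNode(mock, "dn-1");
        System.out.println(mock.registrations);  // prints [dn-1]
    }
}
```

Because the mock is in place from the get-go, the test observes the registration made inside the constructor, which setting the reference after construction could never capture.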

 Ability to provide custom DatanodeProtocol implementation
 -

 Key: HDFS-928
 URL: https://issues.apache.org/jira/browse/HDFS-928
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: data-node
Reporter: Zlatin Balevsky
Priority: Trivial

 This should make testing easier as well as allow users to provide their own 
 RPC/namenode implementations.  It's pretty straightforward:
 1. add 
 interface DatanodeProtocolProvider {
   DatanodeProtocol getNameNode(Configuration conf);
 }
 2. add a config setting like dfs.datanode.protocol.impl
 3. create a default implementation and copy/paste the RPC initialization code 
 there




[jira] Commented: (HDFS-928) Ability to provide custom DatanodeProtocol implementation

2010-01-28 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12806012#action_12806012
 ] 

Zlatin Balevsky commented on HDFS-928:
--

From a purist QE point of view, right now you cannot do any unit tests on 
the Datanode component because you inevitably end up testing the RPC code and 
the Namenode code.  What you have now are integration tests; you absolutely 
need those, but they are not as helpful as unit tests for pinpointing 
problems.  For example, right now a bug in the RPC code will cause the Datanode 
tests to fail.  If all you know is that a bunch of tests failed, it will take 
you less time to find the bug if there are fewer places to start looking.  

Going back to the reason for this wish, I don't want to reinvent dependency 
injection but the easier it is to swap things in and out, the easier it is to 
write tests and to develop.  More importantly, it makes it easier for third 
parties (i.e. myself) to modify the source code for their specific needs and 
the project as a whole only benefits from this.




[jira] Created: (HDFS-926) BufferedDFSInputStream

2010-01-26 Thread Zlatin Balevsky (JIRA)
BufferedDFSInputStream
--

 Key: HDFS-926
 URL: https://issues.apache.org/jira/browse/HDFS-926
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: hdfs client
Reporter: Zlatin Balevsky
Priority: Minor


Self-explanatory.  Buffer size can be provided in number of blocks.  Could be 
implemented trivially with heap storage and several BlockReaders or could have 
more advanced features like:

* logic to ensure that blocks are not pulled from the same Datanode(s).
* local filesystem store for buffered blocks
* adaptive parallelism 
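A trivial heap-backed version of the idea might look like the sketch below; the class name, the fixed block size, and the use of a plain InputStream in place of BlockReaders are all illustrative assumptions, not DFSClient code:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Prefetches the next fixed-size "block" on a background thread while the
// caller consumes the current one. Sketch of the buffering idea only.
public class ReadAheadInputStream extends InputStream {
    private final InputStream in;
    private final int blockSize;
    private final ExecutorService prefetcher = Executors.newSingleThreadExecutor();
    private byte[] current = new byte[0];
    private int pos = 0;
    private Future<byte[]> next;

    public ReadAheadInputStream(InputStream in, int blockSize) {
        this.in = in;
        this.blockSize = blockSize;
        this.next = prefetcher.submit(this::readBlock);  // start fetching ahead
    }

    // Reads one block from the underlying stream; empty array means EOF.
    private byte[] readBlock() throws IOException {
        byte[] buf = new byte[blockSize];
        int n = in.read(buf, 0, blockSize);
        if (n < 0) return new byte[0];
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    @Override
    public int read() throws IOException {
        if (pos >= current.length) {
            try {
                current = next.get();  // wait for the prefetched block
            } catch (Exception e) {
                throw new IOException(e);
            }
            pos = 0;
            if (current.length == 0) return -1;  // end of stream
            next = prefetcher.submit(this::readBlock);  // prefetch the next block
        }
        return current[pos++] & 0xff;
    }

    @Override
    public void close() throws IOException {
        prefetcher.shutdownNow();
        in.close();
    }
}
```

A real implementation would fetch several blocks in parallel from distinct datanodes, which is where the anti-affinity and adaptive-parallelism points above come in.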





[jira] Created: (HDFS-928) Ability to provide custom DatanodeProtocol implementation

2010-01-26 Thread Zlatin Balevsky (JIRA)
Ability to provide custom DatanodeProtocol implementation
-

 Key: HDFS-928
 URL: https://issues.apache.org/jira/browse/HDFS-928
 Project: Hadoop HDFS
  Issue Type: Wish
  Components: data-node
Reporter: Zlatin Balevsky
Priority: Trivial


This should make testing easier as well as allow users to provide their own 
RPC/namenode implementations.  It's pretty straightforward:

1. add 
interface DatanodeProtocolProvider {
  DatanodeProtocol getNameNode(Configuration conf);
}

2. add a config setting like dfs.datanode.protocol.impl

3. create a default implementation and copy/paste the RPC initialization code 
there
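Steps 1-3 could be wired together roughly as follows; every type here is a simplified stand-in, not the real Hadoop class, and the config key is only the example name suggested above:

```java
// Hypothetical stand-ins for the real Hadoop types.
interface DatanodeProtocol { }
interface Configuration { }

// Step 1: the provider interface.
interface DatanodeProtocolProvider {
    DatanodeProtocol getNameNode(Configuration conf);
}

// Step 3: a default provider; the real one would hold the RPC setup code.
class DefaultProvider implements DatanodeProtocolProvider {
    public DatanodeProtocol getNameNode(Configuration conf) {
        return new DatanodeProtocol() { };
    }
}

public class ProtocolProviders {
    // Step 2: the class name would be read from a config setting such as
    // dfs.datanode.protocol.impl and resolved reflectively.
    public static DatanodeProtocolProvider load(String className) throws Exception {
        return (DatanodeProtocolProvider)
                Class.forName(className).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        DatanodeProtocolProvider p = load(DefaultProvider.class.getName());
        System.out.println(p.getNameNode(null) != null);  // prints true
    }
}
```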




[jira] Commented: (HDFS-912) sed in build.xml fails

2010-01-22 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803962#action_12803962
 ] 

Zlatin Balevsky commented on HDFS-912:
--

It fails for me on Windows under Eclipse.  Works fine under Cygwin though.

  sed in build.xml fails
 ---

 Key: HDFS-912
 URL: https://issues.apache.org/jira/browse/HDFS-912
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: ant 1.7.1
 Solaris
Reporter: Allen Wittenauer
Assignee: Allen Wittenauer
Priority: Minor
 Attachments: HDFS-912.txt


 This is the HDFS version of HADOOP-6505.




[jira] Commented: (HDFS-854) Datanode should scan devices in parallel to generate block report

2010-01-21 Thread Zlatin Balevsky (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803480#action_12803480
 ] 

Zlatin Balevsky commented on HDFS-854:
--

If it is not possible to move the I/O operations listFiles() and length() 
outside of the lock, it would make sense to set a flag indicating that a block 
report is in progress so that the rest of the datanode doesn't just hang.  My 2c.


 Datanode should scan devices in parallel to generate block report
 -

 Key: HDFS-854
 URL: https://issues.apache.org/jira/browse/HDFS-854
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: dhruba borthakur

 A Datanode should scan its disk devices in parallel so that the time to 
 generate a block report is reduced. This will reduce the startup time of a 
 cluster.
 A datanode has 12 disks (each of 1 TB) to store HDFS blocks. There is a total 
 of 150K blocks on these 12 disks. It takes the datanode up to 20 minutes to 
 scan these devices to generate the first block report.
