[jira] Updated: (HDFS-245) Create symbolic links in HDFS

2010-01-31 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-245:
-

Attachment: symlink35-hdfs.patch

Hey Sanjay,

Thanks for the review. Latest patch attached; it addresses your feedback and 
merges in trunk.

bq. File Jira to resolve dot relative and volume-root relative names locally in 
HDFS (this Jira can be completed later)

Filed HDFS-932.

bq. explicitly declare the UnresolvedLinkException in methods that throw them - 
we are moving towards declaring all exceptions (see HDFS-717): ClientProtocol, 
Namenode, FSNamesystem, etc.

Done. I had to modify quite a few functions in both hdfs and common since the 
UnresolvedLinkException flows all the way up to FileContext. I left the tests 
as is, and I didn't modify FileSystem.
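
For reference, the shape of the change is roughly the following - an 
illustrative sketch, not the actual signatures from the patch:

{code}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.UnresolvedLinkException;

interface ResolvingOps {
  // UnresolvedLinkException is declared explicitly rather than hidden
  // behind the blanket IOException (the direction HDFS-717 is taking).
  FileStatus getFileInfo(String src)
      throws FileNotFoundException, UnresolvedLinkException, IOException;
}
{code}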

bq. UnresolvedPathException, suggestion: adding a method called 
getUnresolvedPath for completeness.

Done.

bq. createSymlinkXXX - don't pass in permission object as parameter - it 
suggests that permissions matter for symlinks

Done. I left the permissions as the default since INodes do not support a null 
PermissionStatus, or a PermissionStatus created with a null FsPermission. 
Currently links are rwx for everyone, so access is checked on the link target; 
and, like files, access to the links themselves should be subject to the 
containing directory's permissions. I started to modify TestDFSPermission to 
cover this, which requires converting the test to use FileContext, but just 
converting it caused some of the tests to fail, which I'm looking into; if you 
don't mind I'll address that in a follow-on jira. I filed HDFS-876 for porting 
tests over to FileContext a while back, and HDFS-934 to extend those for 
additional coverage after they've been ported. 

bq. FSDirectory#addToParent - if not a symlink, then pass null instead of a 
null string.

The symlink value needs to be a string since we serialize it to the image/edit 
log, i.e. readString and writeString don't support nulls.
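
To illustrate, here's a minimal sketch (not the actual FSImage code) of why the 
empty string has to serve as the sentinel for "no link" with a 
writeUTF/readUTF-style codec:

{code}
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

class SymlinkFieldSketch {
  // null has no encoding in writeUTF-style string codecs, so write "".
  static void writeSymlinkField(DataOutput out, String symlink) throws IOException {
    out.writeUTF(symlink == null ? "" : symlink);
  }

  // Map the empty-string sentinel back to "not a symlink".
  static String readSymlinkField(DataInput in) throws IOException {
    String s = in.readUTF();
    return s.isEmpty() ? null : s;
  }
}
{code}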

bq. getExistingPathINodes() typo: javadoc for param resolveLink is missing the 
param name

Fixed.

bq. Add comment that the exception is thrown for symlinks that occur in the 
middle of the path.

Done.

bq. The multiple if statements with same or negation of condition - can be 
confusing and error prone in future

I rewrote this to use just two ifs that read like the logic: if the current 
node is a symlink then throw an UnresolvedPathException, unconditionally if we 
are not on the last path component, and conditionally if we are on the last 
path component and we want to resolve it. I think this is clearer.

{code}
if (curNode.isLink() && (!lastComp || (lastComp && resolveLink))) {
  throw new UnresolvedPathException(...);
}
if (lastComp || !curNode.isDirectory()) {
  break;
}
{code}

bq. FSDirectory#getFileBlocks add: if (symlink) return null.

Fixed, though it shouldn't be executed in practice since we always complete a 
resolved path.

bq. FSDirectory#getListing() calls getNode(srcs, false) // shouldn't 
resolveLink be true

Fixed. FileContext#listStatus was not resolving links, so I fixed that as well, 
and added FileContextSymlinkBaseTest#testListStatusUsingLink to cover this case.
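
The test looks roughly like the following sketch; the setup and paths here are 
illustrative (it assumes a FileContext over a symlink-capable file system), not 
a copy of the patch:

{code}
import static org.junit.Assert.assertEquals;

import java.util.EnumSet;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.junit.Test;

public class ListStatusUsingLinkSketch {
  @Test
  public void testListStatusUsingLink() throws Exception {
    FileContext fc = FileContext.getFileContext(new Configuration());
    Path dir  = new Path("/test");
    Path file = new Path(dir, "file");
    Path link = new Path("/linkToTest");

    fc.mkdir(dir, FsPermission.getDefault(), true);
    fc.create(file, EnumSet.of(CreateFlag.CREATE)).close();
    fc.createSymlink(dir, link, false);   // link -> /test

    // Listing through the link must resolve it to the target directory.
    FileStatus[] stats = fc.util().listStatus(link);
    assertEquals(1, stats.length);
    assertEquals(file.getName(), stats[0].getPath().getName());
  }
}
{code}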

bq. FSDirectory#getPreferredBlockSize calls getNode(filename, false) // 
shouldn't resolveLink be true

Fixed. It looks like DFSClient#getBlockSize is the only caller, and it isn't 
covered by any unit tests (you can remove the method and hdfs still compiles). 
Filed HDFS-929 to either remove it or add a test.

I removed the throw of UnresolvedPathException from FSNamesystem; it wasn't 
necessary since the call to getFileINode resolves the links. There's now only a 
single place where we throw UnresolvedPathException, which was my original intent.

Latest patch attached here and to HADOOP-6421.

Thanks,
Eli 

 Create symbolic links in HDFS
 -

 Key: HDFS-245
 URL: https://issues.apache.org/jira/browse/HDFS-245
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dhruba borthakur
Assignee: Eli Collins
 Attachments: 4044_20081030spi.java, designdocv1.txt, designdocv2.txt, 
 designdocv3.txt, HADOOP-4044-strawman.patch, symlink-0.20.0.patch, 
 symlink-25-hdfs.patch, symlink-26-hdfs.patch, symlink-26-hdfs.patch, 
 symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, 
 symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, 
 symlink16-common.patch, symlink16-hdfs.patch, symlink16-mr.patch, 
 symlink17-common.txt, symlink17-hdfs.txt, symlink18-common.txt, 
 symlink19-common-delta.patch, symlink19-common.txt, symlink19-common.txt, 
 symlink19-hdfs-delta.patch, symlink19-hdfs.txt, symlink20-common.patch, 
 symlink20-hdfs.patch, symlink21-common.patch, symlink21-hdfs.patch, 
 symlink22-common.patch, symlink22-hdfs.patch, symlink23-common.patch, 
 symlink23-hdfs.patch, symlink24-hdfs.patch, symlink27-hdfs.patch, 
 

[jira] Updated: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-01-31 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-918:
---

Attachment: hdfs-918-20100201.patch

New patch.

* new configuration params: dfs.datanode.multiplexBlockSender=true, 
dfs.datanode.multiplex.packetSize=32k, dfs.datanode.multiplex.numWorkers=3 
(see the sketch after this list)

* Packet size is tuneable, possibly allowing better performance with larger TCP 
buffers enabled

* Workers only wake up when a connection is writable

* 3 new class files, minor changes to DataXceiverServer and DataXceiver, 2 
utility classes added to DataTransferProtocol (one stolen from HDFS-881)

* Passes the tests from my earlier comment, plus a new one for files whose 
lengths don't line up with the checksum chunk size, and holds up under some 
load from TestDFSIO

* Still fails all tests relying on SimulatedFSDataset

* Has a large amount of TRACE-level logging in MultiplexedBlockSender in case 
anybody wants to watch the output

* Adds dependencies for commons-pool and commons-math (for benchmarking code)

* Doesn't yet have benchmarks, but those should be easy now that the 
configuration is all in place
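
For anyone trying it out, here's a hedged example of setting the new params 
from a test or client Configuration (names from the list above, values are the 
defaults I mentioned):

{code}
import org.apache.hadoop.conf.Configuration;

public class MultiplexConfigExample {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.datanode.multiplexBlockSender", true);  // switch senders
    conf.setInt("dfs.datanode.multiplex.packetSize", 32 * 1024); // 32k packets
    conf.setInt("dfs.datanode.multiplex.numWorkers", 3);         // worker threads
    return conf;
  }
}
{code}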

 Use single Selector and small thread pool to replace many instances of 
 BlockSender for reads
 

 Key: HDFS-918
 URL: https://issues.apache.org/jira/browse/HDFS-918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jay Booth
 Fix For: 0.22.0

 Attachments: hdfs-918-20100201.patch, hdfs-multiplex.patch


 Currently, on read requests, the DataXCeiver server allocates a new thread 
 per request, which must allocate its own buffers and leads to 
 higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
 single selector and a small threadpool to multiplex request packets, we could 
 theoretically achieve higher performance while taking up fewer resources and 
 leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
 can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-918) Use single Selector and small thread pool to replace many instances of BlockSender for reads

2010-01-31 Thread Jay Booth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12828009#action_12828009
 ] 

Jay Booth commented on HDFS-918:


I haven't had a chance to run benchmarks yet, but I think that under lots of 
connections the thread-per-connection model will spend more time context 
switching than getting work done, plus it has a few places where it busy-waits 
by doing while (buff.hasRemaining()) { write(); }. Only selecting the 
connections that are currently writable and scheduling them sidesteps both 
issues with a smaller resource footprint - assuming it delivers on the 
performance. As soon as I get a chance, I'll write some benchmarks.
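
To make the idea concrete, here's a much-simplified sketch of the 
select-then-dispatch pattern (plain NIO, not the patch's actual classes): only 
connections the selector reports writable get handed to the worker pool, so no 
thread ever spins on hasRemaining().

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WritableSelectSketch {
  private final Selector selector;
  private final ExecutorService workers = Executors.newFixedThreadPool(3);

  public WritableSelectSketch() throws IOException {
    selector = Selector.open();
  }

  // Queue one packet for a connection; the select loop picks it up from here.
  public void register(SocketChannel ch, ByteBuffer packet) throws IOException {
    ch.configureBlocking(false);
    ch.register(selector, SelectionKey.OP_WRITE, packet);
    selector.wakeup();
  }

  public void selectLoop() throws IOException {
    while (!Thread.currentThread().isInterrupted()) {
      selector.select();  // block until some connection can take more bytes
      for (final SelectionKey key : selector.selectedKeys()) {
        key.interestOps(0);  // park the key while a worker owns it
        final SocketChannel ch = (SocketChannel) key.channel();
        final ByteBuffer buf = (ByteBuffer) key.attachment();
        workers.submit(() -> {
          try {
            ch.write(buf);  // a single write(), no hasRemaining() spin
            if (buf.hasRemaining()) {
              key.interestOps(SelectionKey.OP_WRITE);  // resume when writable
              key.selector().wakeup();
            } else {
              key.cancel();  // packet done; real code would fetch the next one
            }
          } catch (IOException e) {
            key.cancel();
          }
        });
      }
      selector.selectedKeys().clear();
    }
  }
}
{code}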

If anyone wants to take a look at the code in the meantime, I think this patch 
is pretty easy to set up -- just enable MultiplexBlockSender.LOG at TRACE, run 
the tests, and you can see how each packet is built and sent. 'ant compile 
eclipse-files' will set up the extra dependencies on commons-pool and 
commons-math.
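
For the record, the usual Hadoop test idiom for raising a logger to TRACE 
programmatically (assuming a commons-logging Log backed by log4j 1.2) is 
something like:

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.impl.Log4JLogger;
import org.apache.log4j.Level;

public final class TraceEnabler {
  // Bump a commons-logging logger backed by log4j 1.2 up to TRACE.
  public static void enableTrace(Log log) {
    ((Log4JLogger) log).getLogger().setLevel(Level.TRACE);
  }
}

// usage, with the class name from the comment above:
//   TraceEnabler.enableTrace(MultiplexBlockSender.LOG);
{code}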

 Use single Selector and small thread pool to replace many instances of 
 BlockSender for reads
 

 Key: HDFS-918
 URL: https://issues.apache.org/jira/browse/HDFS-918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Jay Booth
 Fix For: 0.22.0

 Attachments: hdfs-918-20100201.patch, hdfs-multiplex.patch


 Currently, on read requests, the DataXCeiver server allocates a new thread 
 per request, which must allocate its own buffers and leads to 
 higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
 single selector and a small threadpool to multiplex request packets, we could 
 theoretically achieve higher performance while taking up fewer resources and 
 leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
 can be done without changing any wire protocols.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.