[jira] Commented: (HDFS-881) Refactor DataNode Packet header into DataTransferProtocol

2010-01-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797939#action_12797939
 ] 

Todd Lipcon commented on HDFS-881:
--

Hey Konstantin,

Sorry, didn't notice HDFS-608 before - I just did this as I was working on 
HDFS-877.

I can probably get a chance to run some benchmarks tomorrow. I don't anticipate 
this will be a problem, as it's adding only one method-call indirection per 
packet, and each packet contains a reasonably large amount of data. Compared to 
other operations like checksumming, this is just noise.

> Refactor DataNode Packet header into DataTransferProtocol
> -
>
> Key: HDFS-881
> URL: https://issues.apache.org/jira/browse/HDFS-881
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-881.txt
>
>
> The Packet Header format is used ad-hoc in various places. This JIRA is to 
> refactor it into a class inside DataTransferProtocol (like was done with 
> PipelineAck)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-880) TestNNLeaseRecovery fails on windows

2010-01-07 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-880:


Attachment: HDFS-880.patch

The updated version of Konstantin's patch makes sure that no unclosed instance 
of FSDirectory is left lying around: I'm simply closing it before replacing the 
reference.

The reason it has to be a mocked reference to FSDirectory is that a mocked 
INodeFileUnderConstruction has to be returned in order to have properly set 
blocks and so on.

In fact, with the _real_ FSDirectory instance properly closed, the whole 
business of removing {{NAME_DIR}} isn't needed any more, but I'll leave it in 
for sanity's sake.
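
For readers following along, here is a minimal sketch of the close-before-swap 
idea; it assumes Mockito (which the test already uses), same-package access to 
{{FSNamesystem.dir}}, and a hypothetical helper name rather than the patch's 
actual code:

{code:java}
import static org.mockito.Mockito.mock;

import java.io.IOException;

import org.apache.hadoop.hdfs.server.namenode.FSDirectory;
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

class MockFileBlocksSketch {
  // Hypothetical helper mirroring mockFileBlocks(): close the real
  // FSDirectory first (releasing the image/edits files that FSImage
  // opened), and only then replace the reference with a mock.
  static void swapInMockedDir(FSNamesystem fsn) throws IOException {
    fsn.dir.close();                             // release image/edits files
    FSDirectory mockDir = mock(FSDirectory.class);
    // The mock exists so that a mocked INodeFileUnderConstruction (with
    // blocks set up the way the test needs) can be returned from it.
    fsn.dir = mockDir;
  }
}
{code}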

> TestNNLeaseRecovery fails on windows
> 
>
> Key: HDFS-880
> URL: https://issues.apache.org/jira/browse/HDFS-880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: HDFS-880.patch, testNNLeaseRecovery.patch
>
>
> TestNNLeaseRecovery fails on windows trying to delete name-node storage 
> directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-881) Refactor DataNode Packet header into DataTransferProtocol

2010-01-07 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797916#action_12797916
 ] 

Konstantin Boudnik commented on HDFS-881:
-

Oh, great! You've gone a step further and done what I had started in HDFS-608 :-)

I think Dhruba's comment about performance measurement makes sense; it is 
something that needs to be tested.

> Refactor DataNode Packet header into DataTransferProtocol
> -
>
> Key: HDFS-881
> URL: https://issues.apache.org/jira/browse/HDFS-881
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-881.txt
>
>
> The Packet Header format is used ad-hoc in various places. This JIRA is to 
> refactor it into a class inside DataTransferProtocol (like was done with 
> PipelineAck)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-881) Refactor DataNode Packet header into DataTransferProtocol

2010-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797912#action_12797912
 ] 

Hadoop QA commented on HDFS-881:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429704/hdfs-881.txt
  against trunk revision 897068.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/88/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/88/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/88/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/88/console

This message is automatically generated.

> Refactor DataNode Packet header into DataTransferProtocol
> -
>
> Key: HDFS-881
> URL: https://issues.apache.org/jira/browse/HDFS-881
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-881.txt
>
>
> The Packet Header format is used ad-hoc in various places. This JIRA is to 
> refactor it into a class inside DataTransferProtocol (like was done with 
> PipelineAck)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-877) Client-driven block verification not functioning

2010-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797910#action_12797910
 ] 

Hadoop QA commented on HDFS-877:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429705/hdfs-877.txt
  against trunk revision 897068.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/174/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/174/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/174/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/174/console

This message is automatically generated.

> Client-driven block verification not functioning
> 
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-877.txt
>
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-734) TestDatanodeBlockScanner times out in branch 0.20

2010-01-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797885#action_12797885
 ] 

Todd Lipcon commented on HDFS-734:
--

Patch available on HDFS-877. Once that's committed there, I'll backport to 
branch-20.

> TestDatanodeBlockScanner times out in branch 0.20
> -
>
> Key: HDFS-734
> URL: https://issues.apache.org/jira/browse/HDFS-734
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Reporter: Hairong Kuang
>Priority: Blocker
> Fix For: 0.20.2
>
>
> When I test HDFS-723 on branch 0.20, TestDatanodeBlockScanner always times 
> out with or without my patch to HDFS-723.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-877) Client-driven block verification not functioning

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-877:
-

Component/s: test
 hdfs client

> Client-driven block verification not functioning
> 
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-877.txt
>
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-877) Client-driven block verification not functioning

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-877:
-

Status: Patch Available  (was: Open)

> Client-driven block verification not functioning
> 
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-877.txt
>
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-877) Client-driven block verification not functioning

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-877:
-

Attachment: hdfs-877.txt

I took the following route to fixing this:
- BlockReader now knows how many bytes it's expected to read.
- After it has read that many bytes, it reads ahead one packet, expecting an 
empty "end of stream" packet from the DN. If that packet doesn't arrive (or 
isn't the special end-of-stream packet), it throws an IOException; if it does 
arrive, it sets gotEOS.

The current patch is a little ugly since it duplicates the 
packet-header-reading logic. I have another version that depends on HDFS-881 to 
clean that up, if 881 gets committed in the meantime.
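
As a rough illustration (not the attached patch), the read-ahead check might 
look like the sketch below; the five-field header layout follows the existing 
packet format, and the class/method names are invented:

{code:java}
import java.io.DataInputStream;
import java.io.IOException;

class EosCheckSketch {
  // Called once bytesRead == bytesExpected: consume exactly one more packet,
  // which must be the DN's empty "end of stream" packet.
  static void expectTrailingEmptyPacket(DataInputStream in) throws IOException {
    int packetLen = in.readInt();            // total packet length
    long offsetInBlock = in.readLong();
    long seqno = in.readLong();
    boolean lastPacketInBlock = in.readBoolean();
    int dataLen = in.readInt();
    if (!lastPacketInBlock || dataLen != 0) {
      throw new IOException("Expected empty end-of-stream packet, got seqno="
          + seqno + " dataLen=" + dataLen + " at offset " + offsetInBlock);
    }
    // the caller sets gotEOS = true here, enabling the checksumOk() report
  }
}
{code}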

> Client-driven block verification not functioning
> 
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
> Attachments: hdfs-877.txt
>
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-874) TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts

2010-01-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797881#action_12797881
 ] 

Todd Lipcon commented on HDFS-874:
--

Test failures are the bizarre new "ClassDefNotFound" problems seen on other 
recent hudson builds. The same tests passed locally for me.

> TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts
> ---
>
> Key: HDFS-874
> URL: https://issues.apache.org/jira/browse/HDFS-874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-874.txt
>
>
> On an internal build machine I see exceptions like this:
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:47262/data/1/scratch/patchqueue/patch-worker-20518/patch_21/svnrepo/build/test/data/test/test/testRenameWithQuota/srcdir,
>  expected: hdfs://localhost.localdomain:47262
> "hostname" and "hostname -f" both show the machine's FQDN (not localhost). 
> /etc/hosts is stock after CentOS 5 install. "host 127.0.0.1" reverses to 
> "localhost"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-874) TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts

2010-01-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797880#action_12797880
 ] 

Hadoop QA commented on HDFS-874:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429689/hdfs-874.txt
  against trunk revision 896735.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/173/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/173/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/173/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/173/console

This message is automatically generated.

> TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts
> ---
>
> Key: HDFS-874
> URL: https://issues.apache.org/jira/browse/HDFS-874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-874.txt
>
>
> On an internal build machine I see exceptions like this:
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:47262/data/1/scratch/patchqueue/patch-worker-20518/patch_21/svnrepo/build/test/data/test/test/testRenameWithQuota/srcdir,
>  expected: hdfs://localhost.localdomain:47262
> "hostname" and "hostname -f" both show the machine's FQDN (not localhost). 
> /etc/hosts is stock after CentOS 5 install. "host 127.0.0.1" reverses to 
> "localhost"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-881) Refactor DataNode Packet header into DataTransferProtocol

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-881:
-

Status: Patch Available  (was: Open)

> Refactor DataNode Packet header into DataTransferProtocol
> -
>
> Key: HDFS-881
> URL: https://issues.apache.org/jira/browse/HDFS-881
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-881.txt
>
>
> The Packet Header format is used ad-hoc in various places. This JIRA is to 
> refactor it into a class inside DataTransferProtocol (like was done with 
> PipelineAck)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-881) Refactor DataNode Packet header into DataTransferProtocol

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-881:
-

Attachment: hdfs-881.txt

This implements the proposed refactor.

A couple of notes:
- I had to implement the read/write fields both for OutputStream and for 
ByteBuffer in/out. It's a bit ugly, but I wasn't able to find a way of avoiding 
one or the other without adding extra double buffering or writing some kind of 
ByteBuffer<->DataOutput translation class in Common util. If we think that's 
necessary, let's do it in another JIRA.
- I changed the meaning of PKT_HEADER_LEN when moving it into the new class. 
Nearly every use of it also added SIZE_OF_INTEGER, so I incorporated that into 
the constant to simplify the code everywhere else.
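
For the curious, the general shape such a header class can take is sketched 
below; the field set follows the existing wire format, while the class name and 
visibility are guesses rather than the attached patch:

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class PacketHeaderSketch {
  // Folds in the extra SIZE_OF_INTEGER that nearly every caller of the old
  // constant added by hand: int + long + long + boolean + int.
  static final int PKT_HEADER_LEN = 4 + 8 + 8 + 1 + 4;  // = 25 bytes

  int packetLen; long offsetInBlock; long seqno;
  boolean lastPacketInBlock; int dataLen;

  // Stream flavor, for the write path...
  void write(DataOutputStream out) throws IOException {
    out.writeInt(packetLen);
    out.writeLong(offsetInBlock);
    out.writeLong(seqno);
    out.writeBoolean(lastPacketInBlock);
    out.writeInt(dataLen);
  }

  // ...and a ByteBuffer flavor, to avoid the double buffering mentioned above.
  void putInBuffer(ByteBuffer buf) {
    buf.putInt(packetLen);
    buf.putLong(offsetInBlock);
    buf.putLong(seqno);
    buf.put((byte) (lastPacketInBlock ? 1 : 0));
    buf.putInt(dataLen);
  }
}
{code}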

> Refactor DataNode Packet header into DataTransferProtocol
> -
>
> Key: HDFS-881
> URL: https://issues.apache.org/jira/browse/HDFS-881
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-881.txt
>
>
> The Packet Header format is used ad-hoc in various places. This JIRA is to 
> refactor it into a class inside DataTransferProtocol (like was done with 
> PipelineAck)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-881) Refactor DataNode Packet header into DataTransferProtocol

2010-01-07 Thread Todd Lipcon (JIRA)
Refactor DataNode Packet header into DataTransferProtocol
-

 Key: HDFS-881
 URL: https://issues.apache.org/jira/browse/HDFS-881
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon


The Packet Header format is used ad-hoc in various places. This JIRA is to 
refactor it into a class inside DataTransferProtocol (like was done with 
PipelineAck)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-786) Implement getContentSummary(..) in HftpFileSystem

2010-01-07 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated HDFS-786:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Nicholas points out that hftpURI is used elsewhere, so the code isn't actually 
dead and restructuring {{TestListPathServlet}} is of questionable value.

I committed this. Thanks, Nicholas!

> Implement getContentSummary(..) in HftpFileSystem
> -
>
> Key: HDFS-786
> URL: https://issues.apache.org/jira/browse/HDFS-786
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.22.0
>
> Attachments: h786_20091223.patch, h786_20091224.patch, 
> h786_20100104.patch, h786_20100106.patch
>
>
> HftpFileSystem does not override getContentSummary(..).  As a result, it uses 
> FileSystem's default implementation, which computes content summary on the 
> client side by calling listStatus(..) recursively.  In contrast, 
> DistributedFileSystem has overridden getContentSummary(..) and does the 
> computation on the NameNode.
> As a result, running "fs -dus" on hftp is much slower than running it on hdfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-878) FileStatus should have a field "isUnderConstruction"

2010-01-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797848#action_12797848
 ] 

Hairong Kuang commented on HDFS-878:


For 1, as I commented on HDFS-879, I do not think HDFS-879 is a good idea.
For 2, it might be better to expose the isUnderConstruction information through 
DFSOutputStream. Currently listPath is a very memory-expensive operation for a 
large directory, so it might be a good idea to avoid adding a field to 
FileStatus.

> FileStatus should have a field "isUnderConstruction"
> 
>
> Key: HDFS-878
> URL: https://issues.apache.org/jira/browse/HDFS-878
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>
> Currently DFSClient has no way to know whether a file is under construction 
> or not, unless we open the file and get locatedBlocks (which is much more 
> costly).
> However, the namenode knows whether each INode is under construction or not.
> We should expose that information from NameNode.getListing(), to 
> DFSClient.listPaths(), to DistributedFileSystem.listStatus().
> We should also expose that information through DFSInputStream and 
> DFSDataInputStream if not there yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-879) FileStatus should have the visible length of the file

2010-01-07 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797845#action_12797845
 ] 

Hairong Kuang commented on HDFS-879:


I am not sure what your definition of the latest file length is. If you mean 
the visible length, I do not think this is a good idea, because in many cases 
getFileStatus does not care about the visible length of a file under 
construction. If by default we force a fetch of the visible length, it is going 
to double the cost of getFileStatus for a file under construction. With the 
append design, we deliberately defined the getFileStatus semantics so that 
there is no guarantee on the length of a file under construction.

> FileStatus should have the visible length of the file
> -
>
> Key: HDFS-879
> URL: https://issues.apache.org/jira/browse/HDFS-879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>
> Currently, {{FileStatus}} returned by {{DistributedFileSystem.listStatus()}} 
> (which goes through {{DFSClient.listPath()}} then {{NameNode.getListing()}}) 
> does not have the latest file length, if the file is still open for write.
> We should make changes in {{DFSClient.listPath()}} to override the length of 
> the file, if the file is under construction.
> This depends on adding a {{isUnderConstruction}} field in {{FileStatus}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-07 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797844#action_12797844
 ] 

Hudson commented on HDFS-755:
-

Integrated in Hadoop-Hdfs-trunk-Commit #161 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/161/])
HDFS-755. Read multiple checksum chunks at once in DFSInputStream. Contributed 
by Todd Lipcon.


> Read multiple checksum chunks at once in DFSInputStream
> ---
>
> Key: HDFS-755
> URL: https://issues.apache.org/jira/browse/HDFS-755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
> hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
> hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-145) FSNameSystem#addStoredBlock does not handle inconsistent block length correctly

2010-01-07 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-145:
---

Attachment: corruptionDetect.patch

This patch removes all of the corrupt replica detection from addStoredBlock and 
moves the disk space consumption code to commitBlock.

> FSNameSystem#addStoredBlock does not handle inconsistent block length 
> correctly
> ---
>
> Key: HDFS-145
> URL: https://issues.apache.org/jira/browse/HDFS-145
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.21.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.21.0, 0.22.0
>
> Attachments: corruptionDetect.patch, inconsistentLen.patch, 
> inconsistentLen1.patch, inconsistentLen2.patch
>
>
> Currently NameNode treats either the new replica or existing replicas as 
> corrupt if the new replica's length is inconsistent with NN recorded block 
> length. The correct behavior should be
> 1. For a block that is not under construction, the new replica should be 
> marked as corrupt if its length is inconsistent (no matter shorter or longer) 
> with the NN recorded block length;
> 2. For an under construction block, if the new replica's length is shorter 
> than the NN recorded block length, the new replica could be marked as 
> corrupt; if the new replica's length is longer, NN should update its recorded 
> block length. But it should not mark existing replicas as corrupt. This is 
> because NN recorded length for an under construction block does not 
> accurately match the block length on datanode disk. NN should not judge an 
> under construction replica to be corrupt by looking at the inaccurate 
> information:  its recorded block length.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-755) Read multiple checksum chunks at once in DFSInputStream

2010-01-07 Thread Tom White (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tom White updated HDFS-755:
---

   Resolution: Fixed
Fix Version/s: 0.22.0
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Todd!

> Read multiple checksum chunks at once in DFSInputStream
> ---
>
> Key: HDFS-755
> URL: https://issues.apache.org/jira/browse/HDFS-755
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.22.0
>
> Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
> hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, 
> hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-874) TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-874:
-

Status: Patch Available  (was: Open)

> TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts
> ---
>
> Key: HDFS-874
> URL: https://issues.apache.org/jira/browse/HDFS-874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-874.txt
>
>
> On an internal build machine I see exceptions like this:
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:47262/data/1/scratch/patchqueue/patch-worker-20518/patch_21/svnrepo/build/test/data/test/test/testRenameWithQuota/srcdir,
>  expected: hdfs://localhost.localdomain:47262
> "hostname" and "hostname -f" both show the machine's FQDN (not localhost). 
> /etc/hosts is stock after CentOS 5 install. "host 127.0.0.1" reverses to 
> "localhost"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-874) TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-874:
-

Attachment: hdfs-874.txt

> TestHDFSFileContextMainOperations fails on weirdly configured DNS hosts
> ---
>
> Key: HDFS-874
> URL: https://issues.apache.org/jira/browse/HDFS-874
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, test
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-874.txt
>
>
> On an internal build machine I see exceptions like this:
> java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://localhost:47262/data/1/scratch/patchqueue/patch-worker-20518/patch_21/svnrepo/build/test/data/test/test/testRenameWithQuota/srcdir,
>  expected: hdfs://localhost.localdomain:47262
> "hostname" and "hostname -f" both show the machine's FQDN (not localhost). 
> /etc/hosts is stock after CentOS 5 install. "host 127.0.0.1" reverses to 
> "localhost"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-245) Create symbolic links in HDFS

2010-01-07 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-245:
-

Attachment: symlink30-hdfs.patch

Attached a minor update to the last patch to address javadoc and checkstyle.

> Create symbolic links in HDFS
> -
>
> Key: HDFS-245
> URL: https://issues.apache.org/jira/browse/HDFS-245
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dhruba borthakur
>Assignee: Eli Collins
> Attachments: 4044_20081030spi.java, designdocv1.txt, designdocv2.txt, 
> designdocv3.txt, HADOOP-4044-strawman.patch, symlink-0.20.0.patch, 
> symlink-25-hdfs.patch, symlink-26-hdfs.patch, symlink-26-hdfs.patch, 
> symLink1.patch, symLink1.patch, symLink11.patch, symLink12.patch, 
> symLink13.patch, symLink14.patch, symLink15.txt, symLink15.txt, 
> symlink16-common.patch, symlink16-hdfs.patch, symlink16-mr.patch, 
> symlink17-common.txt, symlink17-hdfs.txt, symlink18-common.txt, 
> symlink19-common-delta.patch, symlink19-common.txt, symlink19-common.txt, 
> symlink19-hdfs-delta.patch, symlink19-hdfs.txt, symlink20-common.patch, 
> symlink20-hdfs.patch, symlink21-common.patch, symlink21-hdfs.patch, 
> symlink22-common.patch, symlink22-hdfs.patch, symlink23-common.patch, 
> symlink23-hdfs.patch, symlink24-hdfs.patch, symlink27-hdfs.patch, 
> symlink28-hdfs.patch, symlink29-hdfs.patch, symlink29-hdfs.patch, 
> symlink30-hdfs.patch, symLink4.patch, symLink5.patch, symLink6.patch, 
> symLink8.patch, symLink9.patch
>
>
> HDFS should support symbolic links. A symbolic link is a special type of file 
> that contains a reference to another file or directory in the form of an 
> absolute or relative path and that affects pathname resolution. Programs 
> which read or write to files named by a symbolic link will behave as if 
> operating directly on the target file. However, archiving utilities can 
> handle symbolic links specially and manipulate them directly.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-878) FileStatus should have a field "isUnderConstruction"

2010-01-07 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797819#action_12797819
 ] 

Zheng Shao commented on HDFS-878:
-

There are 2 use cases:

1. {{DistributedFileSystem.listStatus()}} will be able to know whether a file 
is under construction, and if it is, {{DistributedFileSystem}} should try to 
connect to the data node to fetch the latest length of the file. This helps 
HDFS-879.

2. A continuous file copier copies a file from one location to another while 
the file is still being written. With this information, the copier can know 
when it has fully copied the file.


> FileStatus should have a field "isUnderConstruction"
> 
>
> Key: HDFS-878
> URL: https://issues.apache.org/jira/browse/HDFS-878
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>
> Currently DFSClient has no way to know whether a file is under construction 
> or not, unless we open the file and get locatedBlocks (which is much more 
> costly).
> However, the namenode knows whether each INode is under construction or not.
> We should expose that information from NameNode.getListing(), to 
> DFSClient.listPaths(), to DistributedFileSystem.listStatus().
> We should also expose that information through DFSInputStream and 
> DFSDataInputStream if not there yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-880) TestNNLeaseRecovery fails on windows

2010-01-07 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797806#action_12797806
 ] 

Konstantin Shvachko commented on HDFS-880:
--

Looks like the problem is in {{TestNNLeaseRecovery.mockFileBlocks()}}. The 
FSNamesystem constructor creates an FSImage, which opens the image and edits 
files. Then {{mockFileBlocks()}} creates a mocked {{FSDirectory}}, whose 
FSImage is null, and assigns it to {{FSNamesystem.dir}}. Now when FSNamesystem 
closes, it will keep the original files open, since it no longer has access to 
them (the original FSImage was replaced by null).
This does not affect the test on Unix, because Unix allows removing open 
files. NTFS does not allow it.
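
A condensed, hypothetical rendering of that sequence (the constructor signature 
and the package-private field access are simplified for illustration, not 
copied from the test):

{code:java}
import static org.mockito.Mockito.mock;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.server.namenode.FSDirectory;
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

class LeakSketch {
  static void reproduce() throws Exception {
    // Constructor creates FSImage, which opens the image and edits files.
    FSNamesystem fsn = new FSNamesystem(new Configuration());
    // mockFileBlocks() then swaps in a mock whose FSImage is null...
    fsn.dir = mock(FSDirectory.class);
    // ...so close() can no longer reach the files opened above. They stay
    // open; Unix will still delete the storage dir, NTFS will not.
    fsn.close();
  }
}
{code}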

> TestNNLeaseRecovery fails on windows
> 
>
> Key: HDFS-880
> URL: https://issues.apache.org/jira/browse/HDFS-880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: testNNLeaseRecovery.patch
>
>
> TestNNLeaseRecovery fails on windows trying to delete name-node storage 
> directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-877) Client-driven block verification not functioning

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-877:
-

Priority: Blocker  (was: Critical)

> Client-driven block verification not functioning
> 
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Blocker
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-880) TestNNLeaseRecovery fails on windows

2010-01-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-880:
-

Attachment: testNNLeaseRecovery.patch

The attached patch is not a fix for the problem, but it helps to show where 
the problem is.
- In the patch I explicitly set the name-node storage directories. The original 
code assumes the directories are in /tmp/hadoop-shv/dfs, but if you run the 
test in Eclipse that may not be (and usually is not) the case.
- There is no need to close the FSImage and then the FSNamesystem separately; 
FSNamesystem.close() will also close the FSImage.
- Use FSUtil.fullyDelete(); it is the standard way (see the sketch below).
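
A minimal sketch of the setup/teardown along those lines; the directory path 
and config keys are 0.20-era assumptions, and {{FSUtil.fullyDelete()}} above 
presumably refers to Common's FileUtil.fullyDelete():

{code:java}
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

class TestSetupSketch {
  static void run() throws Exception {
    // Pin the storage directories explicitly instead of relying on
    // /tmp/hadoop-${user.name}/dfs, which Eclipse runs may not use.
    File nameDir = new File(
        System.getProperty("test.build.data", "build/test/data"), "dfs/name");
    Configuration conf = new Configuration();
    conf.set("dfs.name.dir", nameDir.getPath());       // 0.20-era key, assumed
    conf.set("dfs.name.edits.dir", nameDir.getPath()); // likewise

    FSNamesystem fsn = new FSNamesystem(conf);
    try {
      // ... test body ...
    } finally {
      fsn.close();                    // also closes the underlying FSImage
      FileUtil.fullyDelete(nameDir);  // standard cleanup helper
    }
  }
}
{code}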


> TestNNLeaseRecovery fails on windows
> 
>
> Key: HDFS-880
> URL: https://issues.apache.org/jira/browse/HDFS-880
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: testNNLeaseRecovery.patch
>
>
> TestNNLeaseRecovery fails on windows trying to delete name-node storage 
> directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-877) Client-driven block verification not functioning

2010-01-07 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-877:
-

Description: 
This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing out). 
The issue is that DFSInputStream relies on readChunk being called one last time 
at the end of the file in order to receive the lastPacketInBlock=true packet 
from the DN. However, DFSInputStream.read checks pos < getFileLength() before 
issuing the read. Thus gotEOS never shifts to true and checksumOk() is never 
called.



  was:This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
out). The issue is that DFSInputStream relies on readChunk being called one 
last time at the end of the file in order to receive the lastPacketInBlock=true 
packet from the DN. However, DFSInputStream.read checks pos < getFileLength() 
before issuing the read. Thus gotEOS never shifts to true and checksumOk() is 
never called.

   Priority: Critical  (was: Major)
Summary: Client-driven block verification not functioning  (was: 
Client-driven checksum verification not functioning)

Upgrading this to blocker since it's a cause for a test failure.

Worth noting that this bug only affects the proactive "checksum OK" marking 
that the client does after reading an entire block (thus avoiding the periodic 
scan on the DN). If the checksum is found to be invalid on the client, it still 
reports the bad block to the NN just fine. So, this isn't a dataloss bug, it's 
just a broken optimization and a failing test.

> Client-driven block verification not functioning
> 
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Critical
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-880) TestNNLeaseRecovery fails on windows

2010-01-07 Thread Konstantin Shvachko (JIRA)
TestNNLeaseRecovery fails on windows


 Key: HDFS-880
 URL: https://issues.apache.org/jira/browse/HDFS-880
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.21.0
Reporter: Konstantin Shvachko
 Fix For: 0.21.0


TestNNLeaseRecovery fails on windows trying to delete name-node storage 
directory.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-877) Client-driven checksum verification not functioning

2010-01-07 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797796#action_12797796
 ] 

Todd Lipcon commented on HDFS-877:
--

This may turn out to be reasonably tricky to solve. The issue is that the 
packet with lastPacketInBlock=true comes as an empty packet after the data has 
been read. Consider the following scenario:

# The block is exactly N bytes.
# The client determines (or knows) the file length and thus reads exactly up 
to byte N, but not past it. This is the case for MapReduce jobs when an input 
split doesn't cross block boundaries (eg any input file <1 block).
# In this case, the server will still send the empty "lastPacketInBlock" 
packet, but the client will never read it (since it doesn't read ahead in any 
way).

Point 2 above is currently enforced by DFSInputStream, since it calls 
getFileLength() before passing a read() call down into the BlockReader.
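
A simplified rendering of that guard (shape only; the real read path is more 
involved, and the names here are placeholders):

{code:java}
import java.io.IOException;

abstract class ReadGuardSketch {
  long pos;
  abstract long getFileLength();
  abstract int readBuffer(byte[] buf, int off, int len) throws IOException;

  // DFSInputStream.read, reduced to the relevant check: once pos reaches the
  // file length, no further read is issued, so the BlockReader underneath
  // never gets to consume the DN's empty lastPacketInBlock packet.
  int read(byte[] buf, int off, int len) throws IOException {
    if (pos >= getFileLength()) {
      return -1;  // client-side EOF; gotEOS stays false, checksumOk() never fires
    }
    return readBuffer(buf, off, len);
  }
}
{code}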

A couple of things to investigate:
# Is the check currently done by DFSInputStream important for limiting the 
length visible to a reader of an in-progress block? Or can that limit be 
satisfied by passing only the visible length to the OP_READ_BLOCK call? If the 
length limitation can be dropped from the DFSInputStream layer, I think that 
would solve the issue fairly trivially.
# Alternatively, can we invert BlockReader.readChunk so that it reads ahead a 
packet? That is to say, if a read empties the internal buffer, can we read the 
*next* packet at that point? I don't really like this solution...

> Client-driven checksum verification not functioning
> ---
>
> Key: HDFS-877
> URL: https://issues.apache.org/jira/browse/HDFS-877
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.21.0, 0.22.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> This is actually the reason for HDFS-734 (TestDatanodeBlockScanner timing 
> out). The issue is that DFSInputStream relies on readChunk being called one 
> last time at the end of the file in order to receive the 
> lastPacketInBlock=true packet from the DN. However, DFSInputStream.read 
> checks pos < getFileLength() before issuing the read. Thus gotEOS never 
> shifts to true and checksumOk() is never called.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-878) FileStatus should have a field "isUnderConstruction"

2010-01-07 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HDFS-878:


Description: 
Currently DFSClient has no way to know whether a file is under construction or 
not, unless we open the file and get locatedBlocks (which is much more costly).
However, the namenode knows whether each INode is under construction or not.

We should expose that information from NameNode.getListing(), to 
DFSClient.listPaths(), to DistributedFileSystem.listStatus().

We should also expose that information through DFSInputStream and 
DFSDataInputStream if not there yet.


  was:
Currently DFSClient has no way to know whether a file is under construction or 
not, unless we open the file and get locatedBlocks (which is much more costly).
However, the namenode knows whether each INode is under construction or not.

We should expose that information from NameNode.getListing(), to 
DFSClient.listPaths(), to DistributedFileSystem.listStatus().



> FileStatus should have a field "isUnderConstruction"
> 
>
> Key: HDFS-878
> URL: https://issues.apache.org/jira/browse/HDFS-878
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>
> Currently DFSClient has no way to know whether a file is under construction 
> or not, unless we open the file and get locatedBlocks (which is much more 
> costly).
> However, the namenode knows whether each INode is under construction or not.
> We should expose that information from NameNode.getListing(), to 
> DFSClient.listPaths(), to DistributedFileSystem.listStatus().
> We should also expose that information through DFSInputStream and 
> DFSDataInputStream if not there yet.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-879) FileStatus should have the visible length of the file

2010-01-07 Thread Zheng Shao (JIRA)
FileStatus should have the visible length of the file
-

 Key: HDFS-879
 URL: https://issues.apache.org/jira/browse/HDFS-879
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Zheng Shao
Assignee: Zheng Shao


Currently, {{FileStatus}} returned by {{DistributedFileSystem.listStatus()}} 
(which goes through {{DFSClient.listPath()}} then {{NameNode.getListing()}}) 
does not have the latest file length, if the file is still open for write.

We should make changes in {{DFSClient.listPath()}} to override the length of 
the file, if the file is under construction.

This depends on adding a {{isUnderConstruction}} field in {{FileStatus}}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-878) FileStatus should have a field "isUnderConstruction"

2010-01-07 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797761#action_12797761
 ] 

Eli Collins commented on HDFS-878:
--

Why does the client need to know whether a file is under construction? I'm not 
sure FileStatus would be the right place to expose that type of information.

> FileStatus should have a field "isUnderConstruction"
> 
>
> Key: HDFS-878
> URL: https://issues.apache.org/jira/browse/HDFS-878
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Zheng Shao
>Assignee: Zheng Shao
>
> Currently DFSClient has no way to know whether a file is under construction 
> or not, unless we open the file and get locatedBlocks (which is much more 
> costly).
> However, the namenode knows whether each INode is under construction or not.
> We should expose that information from NameNode.getListing(), to 
> DFSClient.listPaths(), to DistributedFileSystem.listStatus().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-878) FileStatus should have a field "isUnderConstruction"

2010-01-07 Thread Zheng Shao (JIRA)
FileStatus should have a field "isUnderConstruction"


 Key: HDFS-878
 URL: https://issues.apache.org/jira/browse/HDFS-878
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Zheng Shao
Assignee: Zheng Shao


Currently DFSClient has no way to know whether a file is under construction or 
not, unless we open the file and get locatedBlocks (which is much more costly).
However, the namenode knows whether each INode is under construction or not.

We should expose that information from NameNode.getListing(), to 
DFSClient.listPaths(), to DistributedFileSystem.listStatus().


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers

2010-01-07 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE resolved HDFS-200.
-

Resolution: Duplicate

This duplicates HDFS-659.

> In HDFS, sync() not yet guarantees data available to the new readers
> 
>
> Key: HDFS-200
> URL: https://issues.apache.org/jira/browse/HDFS-200
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: dhruba borthakur
>Priority: Blocker
> Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders11_20.txt, fsyncConcurrentReaders12_20.txt, 
> fsyncConcurrentReaders13_20.txt, fsyncConcurrentReaders14_20.txt, 
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
> fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, 
> fsyncConcurrentReaders9.patch, 
> hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, 
> hdfs-200-ryan-existing-file-fail.txt, hypertable-namenode.log.gz, 
> namenode.log, namenode.log, Reader.java, Reader.java, reopen_test.sh, 
> ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-870) Topology is permanently cached

2010-01-07 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797681#action_12797681
 ] 

Allen Wittenauer commented on HDFS-870:
---

No, but there should be a way to flush it when it does change, other than a 
restart (== grid down time), especially considering that we can dynamically 
add/remove nodes (to some degree).

> Topology is permanently cached
> --
>
> Key: HDFS-870
> URL: https://issues.apache.org/jira/browse/HDFS-870
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>
> Replacing the topology script requires a namenode bounce because the NN 
> caches the information permanently.  It should really either expire it 
> periodically or expire on -refreshNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-870) Topology is permanently cached

2010-01-07 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797641#action_12797641
 ] 

dhruba borthakur commented on HDFS-870:
---

Does your network topology change so frequently?


> Topology is permanently cached
> --
>
> Key: HDFS-870
> URL: https://issues.apache.org/jira/browse/HDFS-870
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Allen Wittenauer
>
> Replacing the topology script requires a namenode bounce because the NN 
> caches the information permanently.  It should really either expire it 
> periodically or expire on -refreshNodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.