[jira] Commented: (HDFS-606) ConcurrentModificationException in invalidateCorruptReplicas()

2009-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754532#action_12754532
 ] 

Hudson commented on HDFS-606:
-

Integrated in Hadoop-Hdfs-trunk #81 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/81/]).
Fix ConcurrentModificationException in invalidateCorruptReplicas().
Contributed by Konstantin Shvachko.


> ConcurrentModificationException in invalidateCorruptReplicas()
> --
>
> Key: HDFS-606
> URL: https://issues.apache.org/jira/browse/HDFS-606
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: CMEinCorruptReplicas.patch
>
>
> {{BlockManager.invalidateCorruptReplicas()}} iterates over 
> DatanodeDescriptor-s while removing corrupt replicas from the descriptors. 
> This causes {{ConcurrentModificationException}} if there is more than one
> replica of the block. I ran into this exception while debugging different
> append scenarios, but it should be fixed in trunk too.
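
The failure described above is Java's fail-fast iterator behavior: an element is removed from a collection while a for-each loop is walking that same collection. A minimal sketch, assuming a java.util.TreeSet of data-node names as a stand-in for the descriptors (CorruptReplicaDemo and its methods are hypothetical, not the actual BlockManager code or the committed CMEinCorruptReplicas.patch), contrasts the buggy loop with iterating over a snapshot copy:

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration: the set stands in for the data-nodes holding
// corrupt replicas of one block. Not the actual BlockManager code.
public class CorruptReplicaDemo {

    // Buggy pattern: invalidating a replica removes its node from the same
    // set the for-each loop is iterating over.
    static void invalidateWhileIterating(Set<String> nodes) {
        for (String node : nodes) {
            nodes.remove(node); // structural modification during iteration
        }
    }

    // Fixed pattern: iterate over a snapshot copy, mutate the original set.
    static void invalidateFromSnapshot(Set<String> nodes) {
        String[] snapshot = nodes.toArray(new String[0]);
        for (String node : snapshot) {
            nodes.remove(node);
        }
    }

    // Returns true if the buggy loop throws ConcurrentModificationException.
    static boolean buggyLoopThrows(String... names) {
        Set<String> nodes = new TreeSet<>(Arrays.asList(names));
        try {
            invalidateWhileIterating(nodes);
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(buggyLoopThrows("dn1"));        // one replica
        System.out.println(buggyLoopThrows("dn1", "dn2")); // two replicas
        Set<String> nodes = new TreeSet<>(Arrays.asList("dn1", "dn2", "dn3"));
        invalidateFromSnapshot(nodes);
        System.out.println(nodes.isEmpty());
    }
}
```

main prints false, true, true. With a TreeSet, the single-replica loop happens to finish cleanly because the iterator has already moved past the only element when it is removed, which is consistent with the report that the exception appears only with more than one replica. Copying to an array first costs one allocation per call but decouples the loop from the removals.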

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-602) Attempt to make a directory under an existing file on DistributedFileSystem should throw a FileAlreadyExistsException instead of FileNotFoundException

2009-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754533#action_12754533
 ] 

Hudson commented on HDFS-602:
-

Integrated in Hadoop-Hdfs-trunk #81 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/81/]).
DistributedFileSystem mkdirs command throws FileAlreadyExistsException
instead of FileNotFoundException. Contributed by Boris Shkolnik.


> Attempt to make a directory under an existing file on DistributedFileSystem
> should throw a FileAlreadyExistsException instead of FileNotFoundException
> --
>
> Key: HDFS-602
> URL: https://issues.apache.org/jira/browse/HDFS-602
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client, name-node
>Reporter: Boris Shkolnik
>Assignee: Boris Shkolnik
> Fix For: 0.21.0
>
> Attachments: HDFS-602.patch
>
>
> Attempt to make a directory under an existing file on DistributedFileSystem
> should throw a FileAlreadyExistsException instead of FileNotFoundException.
> Also, we should unwrap this exception from RemoteException.
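
Hadoop's RPC client reports server-side failures as org.apache.hadoop.ipc.RemoteException, which carries the original exception's class name, and RemoteException.unwrapRemoteException(Class<?>... lookupTypes) re-creates the original exception when it is one of the listed types. The unwrapping requested above can be sketched without Hadoop on the classpath; UnwrapDemo, WrappedRemoteException and the local FileAlreadyExistsException are hypothetical stand-ins, and this is not the code from HDFS-602.patch:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.lang.reflect.Constructor;

// Stdlib-only stand-ins (hypothetical); not Hadoop's RemoteException and
// not the HDFS-602 patch.
public class UnwrapDemo {

    // Stand-in for the exception mkdirs should surface to the caller.
    public static class FileAlreadyExistsException extends IOException {
        public FileAlreadyExistsException(String msg) {
            super(msg);
        }
    }

    // Stand-in for an RPC error: it only carries the server-side
    // exception's class name and message.
    static class WrappedRemoteException extends IOException {
        final String className;

        WrappedRemoteException(String className, String message) {
            super(message);
            this.className = className;
        }

        // Re-creates the original exception if its class is one of the
        // expected lookup types; otherwise returns this wrapper unchanged.
        IOException unwrap(Class<?>... lookupTypes) {
            for (Class<?> type : lookupTypes) {
                if (!type.getName().equals(className)
                        || !IOException.class.isAssignableFrom(type)) {
                    continue;
                }
                try {
                    Constructor<?> ctor = type.getConstructor(String.class);
                    return (IOException) ctor.newInstance(getMessage());
                } catch (ReflectiveOperationException e) {
                    return this; // no usable public (String) constructor
                }
            }
            return this;
        }
    }

    public static void main(String[] args) {
        // Hypothetical message for a mkdirs under an existing file.
        WrappedRemoteException re = new WrappedRemoteException(
                FileAlreadyExistsException.class.getName(),
                "/user/a.txt is a file");
        IOException unwrapped = re.unwrap(FileAlreadyExistsException.class);
        System.out.println(unwrapped.getClass().getSimpleName());
        // A type that was not listed as expected leaves the wrapper as-is.
        System.out.println(re.unwrap(FileNotFoundException.class) == re);
    }
}
```

Running main prints FileAlreadyExistsException and then true. Listing the expected types explicitly, rather than re-creating whatever class name arrives over the wire, keeps the client from instantiating arbitrary classes.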



[jira] Commented: (HDFS-601) TestBlockReport should obtain data directories from MiniHDFSCluster

2009-09-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754575#action_12754575
 ] 

Konstantin Shvachko commented on HDFS-601:
--

+1

> TestBlockReport should obtain data directories from MiniHDFSCluster
> ---
>
> Key: HDFS-601
> URL: https://issues.apache.org/jira/browse/HDFS-601
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, Append Branch
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Boudnik
> Fix For: 0.21.0, Append Branch
>
> Attachments: HDFS-601.patch, HDFS-601.patch
>
>
> TestBlockReport relies on the "test.build.data" property being set in the
> configuration, which is not always the case, e.g. when you run the test from
> Eclipse. It would be better to get data directories directly from the
> mini-cluster.



[jira] Updated: (HDFS-601) TestBlockReport should obtain data directories from MiniHDFSCluster

2009-09-12 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-601:
-

   Resolution: Fixed
Fix Version/s: (was: Append Branch)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I just committed this to trunk. It will get into the Append branch during the merge.
Thank you, Konstantin.

> TestBlockReport should obtain data directories from MiniHDFSCluster
> ---
>
> Key: HDFS-601
> URL: https://issues.apache.org/jira/browse/HDFS-601
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0, Append Branch
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Boudnik
> Fix For: 0.21.0
>
> Attachments: HDFS-601.patch, HDFS-601.patch
>
>
> TestBlockReport relies on the "test.build.data" property being set in the
> configuration, which is not always the case, e.g. when you run the test from
> Eclipse. It would be better to get data directories directly from the
> mini-cluster.



[jira] Commented: (HDFS-614) TestDatanodeBlockScanner should data-node directories directly from MiniDFSCluster

2009-09-12 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754579#action_12754579
 ] 

Konstantin Shvachko commented on HDFS-614:
--

TestBalancer timed out. Unrelated to the patch.

> TestDatanodeBlockScanner should data-node directories directly from 
> MiniDFSCluster
> --
>
> Key: HDFS-614
> URL: https://issues.apache.org/jira/browse/HDFS-614
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: TestDNBlockScanner.patch
>
>
> TestDatanodeBlockScanner relies on data-node directories being listed in
> {{test.build.data}}, which is not true if the test is run from Eclipse. It
> should get the directories directly from {{MiniDFSCluster}}.



[jira] Updated: (HDFS-614) TestDatanodeBlockScanner obtain should data-node directories directly from MiniDFSCluster

2009-09-12 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-614:
-

Summary: TestDatanodeBlockScanner obtain should data-node directories 
directly from MiniDFSCluster  (was: TestDatanodeBlockScanner should data-node 
directories directly from MiniDFSCluster)

> TestDatanodeBlockScanner obtain should data-node directories directly from 
> MiniDFSCluster
> -
>
> Key: HDFS-614
> URL: https://issues.apache.org/jira/browse/HDFS-614
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.21.0
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: 0.21.0
>
> Attachments: TestDNBlockScanner.patch
>
>
> TestDatanodeBlockScanner relies on data-node directories being listed in
> {{test.build.data}}, which is not true if the test is run from Eclipse. It
> should get the directories directly from {{MiniDFSCluster}}.



[jira] Updated: (HDFS-614) TestDatanodeBlockScanner obtain should data-node directories directly from MiniDFSCluster

2009-09-12 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated HDFS-614:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

I just committed this.



[jira] Updated: (HDFS-612) FSDataset should not use org.mortbay.log.Log

2009-09-12 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-612:


Hadoop Flags: [Reviewed]
  Status: Patch Available  (was: Open)

> FSDataset should not use org.mortbay.log.Log
> 
>
> Key: HDFS-612
> URL: https://issues.apache.org/jira/browse/HDFS-612
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.21.0
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.21.0
>
> Attachments: h612_20090911.patch, h612_20090911b.patch
>
>
> Some code in FSDataset uses org.mortbay.log.Log.



[jira] Commented: (HDFS-601) TestBlockReport should obtain data directories from MiniHDFSCluster

2009-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754594#action_12754594
 ] 

Hudson commented on HDFS-601:
-

Integrated in Hadoop-Hdfs-trunk-Commit #30 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/30/]).
TestBlockReport obtains data directories directly from MiniHDFSCluster.
Contributed by Konstantin Boudnik.



[jira] Updated: (HDFS-516) Low Latency distributed reads

2009-09-12 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-516:
---

Attachment: (was: hdfs-516-20090831.patch)

> Low Latency distributed reads
> -
>
> Key: HDFS-516
> URL: https://issues.apache.org/jira/browse/HDFS-516
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jay Booth
>Priority: Minor
> Attachments: hdfs-516-20090912.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low-latency random reads using NIO on the server side
> and simulated OS paging with LRU caching and lookahead on the client side.
> Possible applications include Lucene searching (term->doc and doc->offset
> mappings are likely to be in the local cache, making lookups much faster than
> Nutch's current FsDirectory implementation) and binary search through record
> files (bytes at the 1/2, 1/4, 1/8 marks are likely to be cached).
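
The client-side idea in this description (LRU caching with lookahead over remote blocks) can be illustrated with java.util.LinkedHashMap in access order. A minimal sketch under stated assumptions: fixed-size blocks, one byte read per call, and a caller-supplied LongFunction standing in for the NIO-based remote read; LruBlockCache, the block size and the prefetch policy are hypothetical and not taken from the HDFS-516 patches:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical client-side block cache; not code from the HDFS-516 patch.
public class LruBlockCache {
    private final int blockSize;
    private final int lookahead;                 // blocks prefetched after a miss
    private final LongFunction<byte[]> fetcher;  // stands in for a remote read
    private final Map<Long, byte[]> cache;
    int remoteFetches = 0;

    LruBlockCache(int capacityBlocks, int blockSize, int lookahead,
                  LongFunction<byte[]> fetcher) {
        this.blockSize = blockSize;
        this.lookahead = lookahead;
        this.fetcher = fetcher;
        // accessOrder = true keeps entries in least-recently-used order;
        // removeEldestEntry evicts the LRU block once capacity is exceeded.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacityBlocks;
            }
        };
    }

    // Returns the byte at a file offset, fetching the enclosing block on a miss.
    byte read(long offset) {
        long index = offset / blockSize;
        byte[] block = cache.get(index);
        if (block == null) {
            // Prefetch the following blocks first, so the requested block is
            // inserted last and becomes the most recently used entry.
            for (long i = index + 1; i <= index + lookahead; i++) {
                if (!cache.containsKey(i)) {
                    cache.put(i, fetcher.apply(i));
                    remoteFetches++;
                }
            }
            block = fetcher.apply(index);
            cache.put(index, block);
            remoteFetches++;
        }
        return block[(int) (offset % blockSize)];
    }

    public static void main(String[] args) {
        int blockSize = 4096;
        // Synthetic remote file: every byte of block i holds the value i.
        LongFunction<byte[]> fakeRemote = i -> {
            byte[] b = new byte[blockSize];
            Arrays.fill(b, (byte) i);
            return b;
        };
        LruBlockCache cache = new LruBlockCache(8, blockSize, 2, fakeRemote);
        cache.read(0);                   // miss: fetches blocks 1, 2, then 0
        cache.read(2L * blockSize + 7);  // block 2 was prefetched: hit
        System.out.println(cache.remoteFetches);
    }
}
```

main prints 3: the first read costs three remote fetches and the second is served from the prefetched block. The same effect is what makes repeated binary searches over a record file cheap: every search probes the same first offsets (the 1/2, 1/4, 1/8 marks), so after the first search those blocks are already cached.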



[jira] Updated: (HDFS-516) Low Latency distributed reads

2009-09-12 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-516:
---

Attachment: hdfs-516-20090912.patch

New patch:
* README file with instructions for Eclipse and for running with Hadoop
* Javadoc and JUnit 4-style test cases, some new test cases
* Benchmarks for random reads, binary search, and streaming
* Showed a nearly 2x speedup in the streaming case, somehow: streaming 1GB from
a remote HDFS file went from 213 seconds to 112 seconds. Reproduced a couple of
times, using 16MB of cache with the lookahead mechanism. I suspect it uses a lot
more CPU than conventional streaming, but still, that's a lot faster.
* No longer requires any change to HDFS code; the module is now entirely in contrib
* Cleans up file handles better
* Handles remote disconnects better on the client side

What are people's thoughts on getting this into 0.21? It shows a lot of
promise performance-wise but hasn't been tested on larger clusters; I'd be
confident up to 200 nodes or so, and beyond that I'd start getting nervous.

Given that it lives entirely in contrib and has to be explicitly configured to
turn it on, could we include it in 0.21? Does anyone want to try running the
benchmarks?

I'll run the benchmarks one last time tomorrow to sanity check the latest patch
(I changed a couple of things since the last time I ran it on a cluster), then
maybe we could consider committing?



[jira] Updated: (HDFS-516) Low Latency distributed reads

2009-09-12 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-516:
---

Attachment: (was: radfs.patch)



[jira] Updated: (HDFS-516) Low Latency distributed reads

2009-09-12 Thread Jay Booth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Booth updated HDFS-516:
---

Attachment: (was: hdfs-516-20090824.patch)



[jira] Commented: (HDFS-614) TestDatanodeBlockScanner obtain should data-node directories directly from MiniDFSCluster

2009-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754601#action_12754601
 ] 

Hudson commented on HDFS-614:
-

Integrated in Hadoop-Hdfs-trunk-Commit #31 (See 
[http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/31/]).
TestDatanodeBlockScanner obtains data directories directly from
MiniHDFSCluster. Contributed by Konstantin Shvachko.



[jira] Commented: (HDFS-612) FSDataset should not use org.mortbay.log.Log

2009-09-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754618#action_12754618
 ] 

Hadoop QA commented on HDFS-612:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419353/h612_20090911b.patch
  against trunk revision 814221.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/23/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/23/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/23/artifact/trunk/build/test/checkstyle-errors.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/23/console

This message is automatically generated.
