[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-27 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Status: Open  (was: Patch Available)

 Separate Storage from FSImage
 -

 Key: HDFS-1557
 URL: https://issues.apache.org/jira/browse/HDFS-1557
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.21.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
 HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
 HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff


 FSImage currently derives from Storage and FSEditLog has to call methods 
 directly on FSImage to access the filesystem. This JIRA is to separate the 
 Storage class out into NNStorage so that FSEditLog is less dependent on 
 FSImage. From this point, the other parts of the circular dependency should 
 be easy to fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-27 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Attachment: HDFS-1557.diff

 Separate Storage from FSImage
 -

 Key: HDFS-1557
 URL: https://issues.apache.org/jira/browse/HDFS-1557
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.21.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
 HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
 HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff


 FSImage currently derives from Storage and FSEditLog has to call methods 
 directly on FSImage to access the filesystem. This JIRA is to separate the 
 Storage class out into NNStorage so that FSEditLog is less dependent on 
 FSImage. From this point, the other parts of the circular dependency should 
 be easy to fix.




[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-27 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Status: Patch Available  (was: Open)

Addressed Suresh's comments.
{quote}
NNStorage uses the synchronized method errorDirectory() to notify listeners of
an error. The listeners implement a synchronized method to handle the error. Is
it possible for a listener (say FSImage) to call, from its synchronized
section, a synchronized method on NNStorage? This could cause deadlocks.
{quote}
NNStorage now uses CopyOnWriteArrayLists, so errorDirectory() is no longer
synchronized. (See below.)

{quote}
format(), registerListener(): should these be synchronized, as they manipulate
listeners?
{quote}
listeners is now a CopyOnWriteArrayList.

{quote}
Storage#storageDirs is manipulated in NNStorage and Storage. The way it is
done is not thread safe. Perhaps the existing code is thread safe itself. This
could be addressed in a separate jira.
{quote}
Well spotted. I don't think any of this is thread safe, given that storageDirs
is modified in numerous places and is constantly being iterated over, which
could trigger a concurrent modification exception.

I've made storageDirs and removedStorageDirs CopyOnWriteArrayLists now.

{quote}
Consider making the following methods package private: isPreUpgradableLayout(),
setRestoreFailedStorage() (both variants), attemptRestoreRemovedStorage()...
There are other methods that could only be used within the package. This makes
sure this is not a class for outside consumption. I would further consider
making NNStorage a non-public class.
{quote}
I have tightened the access privileges on that class to package private
wherever I could. Unfortunately, NNStorage itself must remain public because
UpgradeUtilities, which is used in testing, is in a different package.
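
For readers following the locking discussion, here is a minimal sketch of why a CopyOnWriteArrayList removes the need to synchronize the notification path. It is not the patch itself: the class StorageListenerSketch, the ErrorListener interface and the directory names are hypothetical stand-ins, and only errorDirectory(), registerListener(), listeners and storageDirs are names taken from the comment.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the listener handling discussed above;
// this is not the actual NNStorage code.
class StorageListenerSketch {
  /** Hypothetical listener type, used only for this illustration. */
  interface ErrorListener {
    void errorOccurred(String dir);
  }

  // Iteration over a CopyOnWriteArrayList works on a snapshot of the array,
  // so the lists can be modified while another caller is iterating.
  private final List<ErrorListener> listeners = new CopyOnWriteArrayList<>();
  private final List<String> storageDirs = new CopyOnWriteArrayList<>();

  void registerListener(ErrorListener l) {
    listeners.add(l);
  }

  void addStorageDir(String dir) {
    storageDirs.add(dir);
  }

  List<String> getStorageDirs() {
    return storageDirs;
  }

  // Deliberately not synchronized: no monitor is held while listeners run,
  // and a listener that registers another listener during the callback does
  // not cause a ConcurrentModificationException.
  void errorDirectory(String dir) {
    storageDirs.remove(dir);
    for (ErrorListener l : listeners) {
      l.errorOccurred(dir);
    }
  }

  public static void main(String[] args) {
    StorageListenerSketch storage = new StorageListenerSketch();
    storage.addStorageDir("/data/1");
    storage.addStorageDir("/data/2");
    // This listener calls back into the storage object while being notified.
    storage.registerListener(dir -> storage.registerListener(d -> { }));
    storage.errorDirectory("/data/1");
    System.out.println(storage.getStorageDirs());
  }
}
```

The trade-off is that every mutation copies the backing array, which is usually acceptable for small, rarely modified lists such as a set of storage directories.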


 Separate Storage from FSImage
 -

 Key: HDFS-1557
 URL: https://issues.apache.org/jira/browse/HDFS-1557
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.21.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
 HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
 HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff


 FSImage currently derives from Storage and FSEditLog has to call methods 
 directly on FSImage to access the filesystem. This JIRA is to separate the 
 Storage class out into NNStorage so that FSEditLog is less dependent on 
 FSImage. From this point, the other parts of the circular dependency should 
 be easy to fix.




[jira] Assigned: (HDFS-1596) Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml

2011-01-27 Thread Harsh J Chouraria (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J Chouraria reassigned HDFS-1596:
---

Assignee: Harsh J Chouraria

 Move secondary namenode checkpoint configs from core-default.xml to 
 hdfs-default.xml
 

 Key: HDFS-1596
 URL: https://issues.apache.org/jira/browse/HDFS-1596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Patrick Angeles
Assignee: Harsh J Chouraria
 Attachments: HDFS-7117.r1.diff


 The following configs are in core-default.xml, but are really read by the 
 Secondary Namenode. These should be moved to hdfs-default.xml for consistency.
 <property>
   <name>fs.checkpoint.dir</name>
   <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
   <description>Determines where on the local filesystem the DFS secondary
   name node should store the temporary images to merge.
   If this is a comma-delimited list of directories then the image is
   replicated in all of the directories for redundancy.
   </description>
 </property>
 <property>
   <name>fs.checkpoint.edits.dir</name>
   <value>${fs.checkpoint.dir}</value>
   <description>Determines where on the local filesystem the DFS secondary
   name node should store the temporary edits to merge.
   If this is a comma-delimited list of directories then the edits are
   replicated in all of the directories for redundancy.
   Default value is same as fs.checkpoint.dir
   </description>
 </property>
 <property>
   <name>fs.checkpoint.period</name>
   <value>3600</value>
   <description>The number of seconds between two periodic checkpoints.
   </description>
 </property>
 <property>
   <name>fs.checkpoint.size</name>
   <value>67108864</value>
   <description>The size of the current edit log (in bytes) that triggers
   a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
   </description>
 </property>




[jira] Updated: (HDFS-1596) Move secondary namenode checkpoint configs from core-default.xml to hdfs-default.xml

2011-01-27 Thread Harsh J Chouraria (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J Chouraria updated HDFS-1596:


Attachment: HDFS-7117.r1.diff

Patch that updates all references of fs.checkpoint.* to their newer 
dfs.namenode.checkpoint.* keys.
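
As a rough illustration of the rename, the sketch below maps the four keys listed in the issue description from the fs.checkpoint.* prefix to the dfs.namenode.checkpoint.* prefix. It assumes the rename is a pure prefix substitution and uses plain maps rather than Hadoop's configuration classes; the class and method names are hypothetical, and the attached patch may handle things differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: plain maps stand in for Hadoop's configuration objects,
// and the prefix-substitution rule is an assumption of this sketch.
class CheckpointKeyRenameSketch {
  static final String OLD_PREFIX = "fs.checkpoint.";
  static final String NEW_PREFIX = "dfs.namenode.checkpoint.";

  /** Renames one key; keys outside the old prefix are returned unchanged. */
  static String renameKey(String key) {
    return key.startsWith(OLD_PREFIX)
        ? NEW_PREFIX + key.substring(OLD_PREFIX.length())
        : key;
  }

  /** Renames keys and rewrites ${fs.checkpoint.*} references inside values. */
  static Map<String, String> migrate(Map<String, String> conf) {
    Map<String, String> out = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : conf.entrySet()) {
      String value = e.getValue().replace("${" + OLD_PREFIX, "${" + NEW_PREFIX);
      out.put(renameKey(e.getKey()), value);
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put("fs.checkpoint.dir", "${hadoop.tmp.dir}/dfs/namesecondary");
    conf.put("fs.checkpoint.edits.dir", "${fs.checkpoint.dir}");
    conf.put("fs.checkpoint.period", "3600");
    conf.put("fs.checkpoint.size", "67108864");
    migrate(conf).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

Note that the default value of fs.checkpoint.edits.dir refers to another checkpoint key, so a rename has to update values as well as key names.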

 Move secondary namenode checkpoint configs from core-default.xml to 
 hdfs-default.xml
 

 Key: HDFS-1596
 URL: https://issues.apache.org/jira/browse/HDFS-1596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Patrick Angeles
 Attachments: HDFS-7117.r1.diff


 The following configs are in core-default.xml, but are really read by the 
 Secondary Namenode. These should be moved to hdfs-default.xml for consistency.
 <property>
   <name>fs.checkpoint.dir</name>
   <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
   <description>Determines where on the local filesystem the DFS secondary
   name node should store the temporary images to merge.
   If this is a comma-delimited list of directories then the image is
   replicated in all of the directories for redundancy.
   </description>
 </property>
 <property>
   <name>fs.checkpoint.edits.dir</name>
   <value>${fs.checkpoint.dir}</value>
   <description>Determines where on the local filesystem the DFS secondary
   name node should store the temporary edits to merge.
   If this is a comma-delimited list of directories then the edits are
   replicated in all of the directories for redundancy.
   Default value is same as fs.checkpoint.dir
   </description>
 </property>
 <property>
   <name>fs.checkpoint.period</name>
   <value>3600</value>
   <description>The number of seconds between two periodic checkpoints.
   </description>
 </property>
 <property>
   <name>fs.checkpoint.size</name>
   <value>67108864</value>
   <description>The size of the current edit log (in bytes) that triggers
   a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
   </description>
 </property>




[jira] Commented: (HDFS-1600) editsStored.xml cause release audit warning

2011-01-27 Thread Erik Steffl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987759#action_12987759
 ] 

Erik Steffl commented on HDFS-1600:
---

As explained in HDFS-1448 (test-patch comment), editsStored.xml should not be
changed: it is a reference file for tests (test results are compared to this
file). I did not know where to add it so that it would be ignored (I asked
around, but it seemed that the test-patch warning can simply be ignored; there
are other files that also lack the license).

 editsStored.xml cause release audit warning
 ---

 Key: HDFS-1600
 URL: https://issues.apache.org/jira/browse/HDFS-1600
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build, test
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Erik Steffl
 Attachments: h1600_20110126.patch


 The file 
 {{src/test/hdfs/org/apache/hadoop/hdfs/tools/offlineEditsViewer/editsStored.xml}}
  causes a release audit warning for any new patch.




[jira] Updated: (HDFS-1335) HDFS side of HADOOP-6904: first step towards inter-version communications between dfs client and NameNode

2011-01-27 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1335:


Attachment: hdfsRPC.patch

Here is the patch that makes HDFS work with the new method-based RPC 
compatibility protocol.

 HDFS side of HADOOP-6904: first step towards inter-version communications 
 between dfs client and NameNode
 -

 Key: HDFS-1335
 URL: https://issues.apache.org/jira/browse/HDFS-1335
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs client, name-node
Affects Versions: 0.22.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Attachments: hdfsRPC.patch, hdfsRpcVersion.patch


 The idea is that for getProtocolVersion, the NameNode checks whether the client 
 and server versions are compatible if the server version is greater than the 
 client version. If they are not, it throws a VersionIncompatible exception; 
 otherwise, it returns the server version.
 On the dfs client side, when creating a NameNode proxy, the client catches the 
 VersionMismatch exception and then checks whether the client version and the 
 server version are compatible if the client version is greater than the 
 server version. If they are not compatible, it throws a VersionIncompatible 
 exception; otherwise, it records the server version and continues.
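
The two-sided check described above can be sketched roughly as follows. This is a simplified model, not the attached patch: the compatibility rule (the newer side declares the oldest peer version it still supports), the class name, the method signatures and the exception class are all assumptions made for illustration, and the client-side handling of the VersionMismatch exception is reduced to a direct comparison.

```java
// Simplified model of the version negotiation; every name here is hypothetical.
class VersionCheckSketch {
  /** Hypothetical exception standing in for the one named in the description. */
  static class VersionIncompatibleException extends RuntimeException {
    VersionIncompatibleException(String msg) {
      super(msg);
    }
  }

  // Assumed rule: the newer side knows the oldest peer version it supports.
  static boolean isCompatible(long newer, long minSupportedByNewer, long older) {
    return older >= minSupportedByNewer && older <= newer;
  }

  // Server side: only checks when the server is the newer party.
  static long getProtocolVersion(long clientVersion, long serverVersion,
                                 long serverMinSupported) {
    if (serverVersion > clientVersion
        && !isCompatible(serverVersion, serverMinSupported, clientVersion)) {
      throw new VersionIncompatibleException(
          "client " + clientVersion + " too old for server " + serverVersion);
    }
    return serverVersion;
  }

  // Client side: only checks when the client is the newer party, then
  // returns the server version so the caller can record it.
  static long negotiate(long clientVersion, long clientMinSupported,
                        long serverVersion) {
    if (clientVersion > serverVersion
        && !isCompatible(clientVersion, clientMinSupported, serverVersion)) {
      throw new VersionIncompatibleException(
          "server " + serverVersion + " too old for client " + clientVersion);
    }
    return serverVersion;
  }

  public static void main(String[] args) {
    // Newer server accepts an older client within its supported range.
    System.out.println(getProtocolVersion(5L, 7L, 4L));
    // Newer client accepts an older server within its supported range.
    System.out.println(negotiate(7L, 4L, 5L));
  }
}
```

In this model each side is responsible only for the case in which it is the newer party, which matches the split of responsibilities in the description.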




[jira] Commented: (HDFS-1557) Separate Storage from FSImage

2011-01-27 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987865#action_12987865
 ] 

Jitendra Nath Pandey commented on HDFS-1557:


I ran test-patch on the latest patch.
There is a Findbugs error:
Synchronization performed on java.util.concurrent.CopyOnWriteArrayList in 
org.apache.hadoop.hdfs.server.namenode.NNStorage.attemptRestoreRemovedStorage(boolean)
at the line
synchronized (this.removedStorageDirs)
in NNStorage.

Overall test-patch results were
  
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 42 new or 
modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]
 [exec] +1 system tests framework.  The patch passed system tests 
framework compile.
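
For context on the Findbugs warning above: a java.util.concurrent collection does not use its own monitor internally, so synchronizing on it does not coordinate with its other users. One common alternative, sketched below under stated assumptions, is a dedicated lock object for compound operations. The class, fields and method names are hypothetical, and this is not necessarily how the warning was resolved in the patch.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration of guarding a compound operation with a
// dedicated lock instead of synchronizing on the concurrent list itself.
class RestoreLockSketch {
  private final List<String> storageDirs = new CopyOnWriteArrayList<>();
  private final List<String> removedStorageDirs = new CopyOnWriteArrayList<>();

  // Private monitor used only for the "move between lists" steps.
  private final Object restoreLock = new Object();

  void addRemoved(String dir) {
    synchronized (restoreLock) {
      removedStorageDirs.add(dir);
    }
  }

  List<String> getStorageDirs() {
    return storageDirs;
  }

  List<String> getRemovedStorageDirs() {
    return removedStorageDirs;
  }

  /** Moves every removed directory back to the active list; returns the count. */
  int attemptRestore() {
    synchronized (restoreLock) {
      int restored = 0;
      // The for-each loop iterates over a snapshot, so removing from the
      // list inside the loop is safe for a CopyOnWriteArrayList.
      for (String dir : removedStorageDirs) {
        storageDirs.add(dir);
        removedStorageDirs.remove(dir);
        restored++;
      }
      return restored;
    }
  }

  public static void main(String[] args) {
    RestoreLockSketch sketch = new RestoreLockSketch();
    sketch.addRemoved("/data/1");
    sketch.addRemoved("/data/2");
    System.out.println(sketch.attemptRestore() + " " + sketch.getStorageDirs());
  }
}
```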


 Separate Storage from FSImage
 -

 Key: HDFS-1557
 URL: https://issues.apache.org/jira/browse/HDFS-1557
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.21.0
Reporter: Ivan Kelly
Assignee: Ivan Kelly
 Fix For: 0.23.0

 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
 HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
 HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff, HDFS-1557.diff


 FSImage currently derives from Storage and FSEditLog has to call methods 
 directly on FSImage to access the filesystem. This JIRA is to separate the 
 Storage class out into NNStorage so that FSEditLog is less dependent on 
 FSImage. From this point, the other parts of the circular dependency should 
 be easy to fix.




[jira] Commented: (HDFS-1598) ListPathsServlet excludes .*.crc files

2011-01-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987894#action_12987894
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1598:
--

{noformat}
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] -1 release audit.  The applied patch generated 1 release audit 
warnings (more than the trunk's current 0 warnings).
 [exec] 
 [exec] +1 system test framework.  The patch passed system test 
framework compile.
{noformat}
The release warning is not related to this.  See HDFS-1600.

 ListPathsServlet excludes .*.crc files
 --

 Key: HDFS-1598
 URL: https://issues.apache.org/jira/browse/HDFS-1598
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1598_20110126.patch


 The {{.*.crc}} files are excluded by default.




[jira] Commented: (HDFS-1084) TestDFSShell fails in trunk.

2011-01-27 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987897#action_12987897
 ] 

Konstantin Shvachko commented on HDFS-1084:
---

We can fix it by using {{FileUtil.makeShellPath(File file, boolean 
makeCanonicalPath)}} here instead of {{getCanonicalPath()}}.
A side note - the entire {{RawLocalFileSystem}} should probably invoke FileUtil 
methods rather than Shell. This would be a larger code cleanup, not for 0.22.

 TestDFSShell fails in trunk.
 

 Key: HDFS-1084
 URL: https://issues.apache.org/jira/browse/HDFS-1084
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0
Reporter: Konstantin Shvachko
Assignee: Po Cheung
Priority: Blocker
 Fix For: 0.22.0


 {{TestDFSShell.testFilePermissions()}} fails on an assert attached below. I 
 see it on my Linux box. Don't see it failing with Hudson, and the same test 
 runs fine in 0.21 branch.




[jira] Created: (HDFS-1602) Revert HADOOP-4885 for it is doesn't work as expected.

2011-01-27 Thread Konstantin Boudnik (JIRA)
Revert HADOOP-4885 for it is doesn't work as expected.
--

 Key: HDFS-1602
 URL: https://issues.apache.org/jira/browse/HDFS-1602
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik


NameNode storage restore functionality doesn't work (as HDFS-903 demonstrated). 
This needs to be either disabled, or removed, or fixed. This feature also fails 
HDFS-1496




[jira] Updated: (HDFS-1598) ListPathsServlet excludes .*.crc files

2011-01-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1598:
-

Attachment: h1598_20110126_0.20.patch

h1598_20110126_0.20.patch: for 0.20

 ListPathsServlet excludes .*.crc files
 --

 Key: HDFS-1598
 URL: https://issues.apache.org/jira/browse/HDFS-1598
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1598_20110126.patch, h1598_20110126_0.20.patch


 The {{.*.crc}} files are excluded by default.




[jira] Created: (HDFS-1603) Namenode gets sticky if one of namenode storage volumes disappears (removed, unmounted, etc.)

2011-01-27 Thread Konstantin Boudnik (JIRA)
Namenode gets sticky if one of namenode storage volumes disappears (removed, 
unmounted, etc.)
-

 Key: HDFS-1603
 URL: https://issues.apache.org/jira/browse/HDFS-1603
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik


While investigating failures on HDFS-1602 it became apparent that once a 
namenode storage volume is pulled out, the NN becomes completely sticky until 
{{FSImage:processIOError: removing storage}} removes the storage from the active 
set. During this time no normal NN operations are possible (e.g. creating 
a directory on HDFS eventually times out).

In the case of NFS this can be worked around with the soft,intr,timeo,retrans 
mount settings. However, better handling of the situation is apparently 
possible and needs to be implemented.




[jira] Commented: (HDFS-1598) ListPathsServlet excludes .*.crc files

2011-01-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987904#action_12987904
 ] 

Hadoop QA commented on HDFS-1598:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12469623/h1598_20110126_0.20.patch
  against trunk revision 1062052.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/134//console

This message is automatically generated.

 ListPathsServlet excludes .*.crc files
 --

 Key: HDFS-1598
 URL: https://issues.apache.org/jira/browse/HDFS-1598
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Attachments: h1598_20110126.patch, h1598_20110126_0.20.patch


 The {{.*.crc}} files are excluded by default.




[jira] Updated: (HDFS-1602) Fix HADOOP-4885 for it is doesn't work as expected.

2011-01-27 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1602:
-

Summary: Fix HADOOP-4885 for it is doesn't work as expected.  (was: Revert 
HADOOP-4885 for it is doesn't work as expected.)

 Fix HADOOP-4885 for it is doesn't work as expected.
 ---

 Key: HDFS-1602
 URL: https://issues.apache.org/jira/browse/HDFS-1602
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Konstantin Boudnik

 NameNode storage restore functionality doesn't work (as HDFS-903 
 demonstrated). This needs to be either disabled, or removed, or fixed. This 
 feature also fails HDFS-1496




[jira] Updated: (HDFS-1598) ListPathsServlet excludes .*.crc files

2011-01-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-1598:
-

   Resolution: Fixed
Fix Version/s: 0.23.0
   0.22.0
   0.21.1
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

I have committed this.

 ListPathsServlet excludes .*.crc files
 --

 Key: HDFS-1598
 URL: https://issues.apache.org/jira/browse/HDFS-1598
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20.2
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE
 Fix For: 0.21.1, 0.22.0, 0.23.0

 Attachments: h1598_20110126.patch, h1598_20110126_0.20.patch


 The {{.*.crc}} files are excluded by default.




[jira] Updated: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix

2011-01-27 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1496:
-

Attachment: HDFS-1496.sh

This system-level test reproduces the same issue with a local NFS server and 
storage volumes mounted over NFS.

 TestStorageRestore is failing after HDFS-903 fix
 

 Key: HDFS-1496
 URL: https://issues.apache.org/jira/browse/HDFS-1496
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0, 0.23.0
Reporter: Konstantin Boudnik
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-1496.sh


 TestStorageRestore seems to be failing after HDFS-903 commit. Running git 
 bisect confirms it.




[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure

2011-01-27 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987916#action_12987916
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1595:
--

Todd, would you like to work on this with your idea?

 DFSClient may incorrectly detect datanode failure
 -

 Key: HDFS-1595
 URL: https://issues.apache.org/jira/browse/HDFS-1595
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.20.4
Reporter: Tsz Wo (Nicholas), SZE
Priority: Critical
 Attachments: hdfs-1595-idea.txt


 Suppose a source datanode S is writing to a destination datanode D in a write 
 pipeline.  We have an implicit assumption that _if S catches an exception 
 when it is writing to D, then D is faulty and S is fine._  As a result, 
 DFSClient will take D out of the pipeline, reconstruct the write pipeline 
 with the remaining datanodes and then continue writing.
 However, we found a case in which the faulty machine F is in fact S, not D.  In 
 the case we found, F has a faulty network interface (or a faulty switch port) 
 such that the interface works fine when transferring 
 a small amount of data, say 1MB, but often fails when transferring a large 
 amount of data, say 100MB.
 It is even worse if F is the first datanode in the pipeline.  Consider the 
 following:
 # DFSClient creates a pipeline with three datanodes.  The first datanode is F.
 # F catches an IOException when writing to the second datanode. Then, F 
 reports that the second datanode has an error.
 # DFSClient removes the second datanode from the pipeline and continues 
 writing with the remaining datanode(s).
 # The pipeline now has two datanodes but (2) and (3) repeat.
 # Now, only F remains in the pipeline.  DFSClient continues writing with one 
 replica in F.
 # The write succeeds and DFSClient is able to *close the file successfully*.
 # The block is under-replicated.  The NameNode schedules replication from F 
 to some other datanode D.
 # The replication fails for the same reason.  D reports to the NameNode that 
 the replica in F is corrupted.
 # The NameNode marks the replica in F as corrupted.
 # The block is corrupted since no replica is available.
 We were able to manually divide the replicas into small files and copy them 
 out from F without fixing the hardware.  The replicas seem uncorrupted.  
 This is a *data availability problem*.
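
The pipeline-shrinking steps above can be reproduced with a toy simulation. The model below is an assumption made for illustration, not DFSClient code: a transfer between two adjacent datanodes fails whenever either endpoint is the faulty machine, and recovery always blames the receiver, as in the implicit assumption described above. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of write-pipeline recovery; not the actual DFSClient logic.
class PipelineFailureSketch {
  /**
   * Repeatedly applies the "blame the receiver" rule until no transfer in
   * the pipeline involves the faulty node, and returns the surviving nodes.
   */
  static List<String> writeWithRecovery(List<String> pipeline, String faulty) {
    List<String> remaining = new ArrayList<>(pipeline);
    while (true) {
      int failingSender = -1;
      for (int i = 0; i + 1 < remaining.size(); i++) {
        // A transfer fails if either endpoint is the faulty machine.
        if (remaining.get(i).equals(faulty)
            || remaining.get(i + 1).equals(faulty)) {
          failingSender = i;
          break;
        }
      }
      if (failingSender < 0) {
        return remaining; // every remaining transfer succeeds
      }
      // The sender reports the receiver as bad, so the receiver is dropped.
      remaining.remove(failingSender + 1);
    }
  }

  public static void main(String[] args) {
    // Faulty node first in the pipeline: the healthy nodes are dropped.
    System.out.println(writeWithRecovery(Arrays.asList("F", "dn2", "dn3"), "F"));
    // Faulty node in the middle: the rule happens to remove the right node.
    System.out.println(writeWithRecovery(Arrays.asList("dn1", "F", "dn3"), "F"));
  }
}
```

When the faulty node is the first in the pipeline, the rule removes every healthy datanode and leaves only the faulty one, which is the situation the numbered steps describe.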




[jira] Commented: (HDFS-1595) DFSClient may incorrectly detect datanode failure

2011-01-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987920#action_12987920
 ] 

Todd Lipcon commented on HDFS-1595:
---

Yea, I can take this, but it may be a bit before I can get to it - mostly 
focusing on bug fixing at the moment (I would classify this as a missing 
feature more than a bug).

 DFSClient may incorrectly detect datanode failure
 -

 Key: HDFS-1595
 URL: https://issues.apache.org/jira/browse/HDFS-1595
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 0.20.4
Reporter: Tsz Wo (Nicholas), SZE
Priority: Critical
 Attachments: hdfs-1595-idea.txt


 Suppose a source datanode S is writing to a destination datanode D in a write 
 pipeline.  We have an implicit assumption that _if S catches an exception 
 when it is writing to D, then D is faulty and S is fine._  As a result, 
 DFSClient will take D out of the pipeline, reconstruct the write pipeline 
 with the remaining datanodes and then continue writing.
 However, we found a case in which the faulty machine F is in fact S, not D.  In 
 the case we found, F has a faulty network interface (or a faulty switch port) 
 such that the interface works fine when transferring 
 a small amount of data, say 1MB, but often fails when transferring a large 
 amount of data, say 100MB.
 It is even worse if F is the first datanode in the pipeline.  Consider the 
 following:
 # DFSClient creates a pipeline with three datanodes.  The first datanode is F.
 # F catches an IOException when writing to the second datanode. Then, F 
 reports that the second datanode has an error.
 # DFSClient removes the second datanode from the pipeline and continues 
 writing with the remaining datanode(s).
 # The pipeline now has two datanodes but (2) and (3) repeat.
 # Now, only F remains in the pipeline.  DFSClient continues writing with one 
 replica in F.
 # The write succeeds and DFSClient is able to *close the file successfully*.
 # The block is under-replicated.  The NameNode schedules replication from F 
 to some other datanode D.
 # The replication fails for the same reason.  D reports to the NameNode that 
 the replica in F is corrupted.
 # The NameNode marks the replica in F as corrupted.
 # The block is corrupted since no replica is available.
 We were able to manually divide the replicas into small files and copy them 
 out from F without fixing the hardware.  The replicas seem uncorrupted.  
 This is a *data availability problem*.




[jira] Updated: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix

2011-01-27 Thread Konstantin Boudnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Boudnik updated HDFS-1496:
-

Attachment: HDFS-1496.sh

fixing missed extra NN ops after NFS mount is restored.

 TestStorageRestore is failing after HDFS-903 fix
 

 Key: HDFS-1496
 URL: https://issues.apache.org/jira/browse/HDFS-1496
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0, 0.23.0
Reporter: Konstantin Boudnik
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.22.0

 Attachments: HDFS-1496.sh, HDFS-1496.sh


 TestStorageRestore seems to be failing after HDFS-903 commit. Running git 
 bisect confirms it.




[jira] Created: (HDFS-1604) Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

2011-01-27 Thread Alejandro Abdelnur (JIRA)
Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
--

 Key: HDFS-1604
 URL: https://issues.apache.org/jira/browse/HDFS-1604
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


This JIRA is for the HDFS portion of HADOOP-7119




[jira] Work started: (HDFS-1604) Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

2011-01-27 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-1604 started by Alejandro Abdelnur.

 Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles
 --

 Key: HDFS-1604
 URL: https://issues.apache.org/jira/browse/HDFS-1604
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 This JIRA is for the HDFS portion of HADOOP-7119




[jira] Updated: (HDFS-1604) add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT web-consoles

2011-01-27 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-1604:
-

Summary: add Kerberos HTTP SPNEGO authentication support to Hadoop 
JT/NN/DN/TT web-consoles  (was: Kerberos HTTP SPNEGO authentication support to 
Hadoop JT/NN/DN/TT web-consoles)

 add Kerberos HTTP SPNEGO authentication support to Hadoop JT/NN/DN/TT 
 web-consoles
 --

 Key: HDFS-1604
 URL: https://issues.apache.org/jira/browse/HDFS-1604
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur

 This JIRA is for the HDFS portion of HADOOP-7119




[jira] Commented: (HDFS-1594) When the disk becomes full Namenode is getting shutdown and not able to recover

2011-01-27 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987960#action_12987960
 ] 

Konstantin Boudnik commented on HDFS-1594:
--

bq. I am finding it hard to review this diff because it has lots of diffs that 
are not inherently connected with the attempted fix.
That was exactly the point of asking for another round-trip ;)

 When the disk becomes full Namenode is getting shutdown and not able to 
 recover
 ---

 Key: HDFS-1594
 URL: https://issues.apache.org/jira/browse/HDFS-1594
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0, 0.21.1, 0.22.0
 Environment: Linux linux124 2.6.27.19-5-default #1 SMP 2009-02-28 
 04:40:21 +0100 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Devaraj K
 Attachments: hadoop-root-namenode-linux124.log, HDFS-1594.patch


 When the disk becomes full, the name node shuts down. If we try to start it 
 again after making space available, it does not start and throws the 
 exception below.
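 The trace for this failure ends in DataInputStream.readFully, which throws 
 EOFException when the stream ends before the requested number of bytes has 
 been read. The following is a minimal sketch of how that can happen, not the 
 HDFS code path itself. It assumes a hypothetical length-prefixed record format 
 (not the real edit log layout) and assumes the record was cut short, for 
 example by a write that ran out of disk space. The class and method names are 
 illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TruncatedRecordDemo {

    // Hypothetical record format: a 2-byte length followed by UTF-8 bytes.
    // This is an assumption for illustration, not the HDFS edit log layout.
    static byte[] writeRecord(String s) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeShort(bytes.length);
        out.write(bytes);
        out.flush();
        return buf.toByteArray();
    }

    // Reads one record; readFully throws EOFException if fewer than
    // 'len' bytes remain in the stream.
    static String readRecord(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int len = in.readUnsignedShort();
        byte[] bytes = new byte[len];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] full = writeRecord("/user/example/file");
        System.out.println("complete record: " + readRecord(full));

        // Simulate a record whose last bytes never reached the disk.
        byte[] truncated = Arrays.copyOf(full, full.length - 4);
        try {
            readRecord(truncated);
            System.out.println("unexpected: truncated record was read");
        } catch (EOFException e) {
            System.out.println("EOFException on truncated record");
        }
    }
}
```

 In this sketch the reader trusts the length prefix, so a record that is 
 missing its tail fails only when readFully runs out of input. Whether the 
 name node failure reported here has the same cause is an assumption, not 
 something the log alone establishes.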
 {code:xml} 
 2011-01-24 23:23:33,727 ERROR 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
 initialization failed.
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:284)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:577)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:570)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
 2011-01-24 23:23:33,729 ERROR 
 org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at org.apache.hadoop.io.UTF8.readFields(UTF8.java:117)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImageSerialization.readString(FSImageSerialization.java:201)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:185)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:93)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:60)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1089)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:1041)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:487)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:149)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:306)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.init(FSNamesystem.java:284)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:328)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:356)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:577)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:570)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1529)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1538)
 2011-01-24 23:23:33,730 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: 
 SHUTDOWN_MSG: