from:"Colin Patrick McCabe \(Updated\) \(JIRA\)"

[jira] [Updated] (HDFS-3270) run valgrind on fuse-dfs, fix any memory leaks

2012-04-19 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3270:
---

Attachment: HDFS-3270.001.patch

* fix malloc check in hdfs.c

(not actually a memory leak, but still bogus)

 run valgrind on fuse-dfs, fix any memory leaks
 --

 Key: HDFS-3270
 URL: https://issues.apache.org/jira/browse/HDFS-3270
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3270.001.patch


 run valgrind on fuse-dfs, fix any memory leaks

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3270) run valgrind on fuse-dfs, fix any memory leaks

2012-04-19 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3270:
---

Status: Patch Available  (was: Open)

 run valgrind on fuse-dfs, fix any memory leaks
 --

 Key: HDFS-3270
 URL: https://issues.apache.org/jira/browse/HDFS-3270
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3270.001.patch


 run valgrind on fuse-dfs, fix any memory leaks


[jira] [Updated] (HDFS-3270) run valgrind on fuse-dfs, fix any memory leaks

2012-04-19 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3270:
---

Attachment: HDFS-3270.002.patch

* fix another case of the same mistake

 run valgrind on fuse-dfs, fix any memory leaks
 --

 Key: HDFS-3270
 URL: https://issues.apache.org/jira/browse/HDFS-3270
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3270.001.patch, HDFS-3270.002.patch


 run valgrind on fuse-dfs, fix any memory leaks


[jira] [Updated] (HDFS-3306) fuse_dfs: don't lock release operations

2012-04-19 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3306:
---

Attachment: HDFS-3306.001.patch

 fuse_dfs: don't lock release operations
 ---

 Key: HDFS-3306
 URL: https://issues.apache.org/jira/browse/HDFS-3306
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3306.001.patch


 There's no need to lock release operations in FUSE, because release can only 
 be called once on a fuse_file_info structure.


[jira] [Updated] (HDFS-3306) fuse_dfs: don't lock release operations

2012-04-19 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3306:
---

Status: Patch Available  (was: Open)

 fuse_dfs: don't lock release operations
 ---

 Key: HDFS-3306
 URL: https://issues.apache.org/jira/browse/HDFS-3306
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3306.001.patch


 There's no need to lock release operations in FUSE, because release can only 
 be called once on a fuse_file_info structure.


[jira] [Updated] (HDFS-3290) Use a better local directory layout for the datanode

2012-04-18 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3290:
---

Description:
When the HDFS DataNode stores chunks in a local directory, it currently puts
all of the chunk files into either one big directory, or a collection of
directories. However, there is no way to know which directory a given block
will end up in, given its ID. As the number of files increases, this does not
scale well.

Similar to the git version control system, HDFS should create a few different
top-level directories keyed off a few bits in the chunk ID. Git uses 8
bits. This substantially cuts down on the number of chunk files in the same
directory and improves performance, while not compromising O(1) lookup
of chunks.

was:
When the HDFS DataNode stores chunks in a local directory, it currently puts
all of the chunk files into one big directory. As the number of files
increases, this does not work well at all. Local filesystems are not optimized
for the case where there are hundreds of thousands of files in the same
directory. It also makes inspecting directories with standard UNIX tools
difficult.

Use a better local directory layout for the datanode

Key: HDFS-3290
URL: https://issues.apache.org/jira/browse/HDFS-3290
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

When the HDFS DataNode stores chunks in a local directory, it currently puts
all of the chunk files into either one big directory, or a collection of
directories. However, there is no way to know which directory a given block
will end up in, given its ID. As the number of files increases, this does
not scale well.
Similar to the git version control system, HDFS should create a few different
top-level directories keyed off a few bits in the chunk ID. Git uses 8
bits. This substantially cuts down on the number of chunk files in the same
directory and improves performance, while not compromising O(1) lookup
of chunks.
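
The layout proposed above could be sketched roughly as follows. This is only an illustration, not the DataNode's actual code: the class name BlockDirLayout, the "subdirXX" naming, and the choice of the low 8 bits of the ID are all assumptions made for the example.

```java
import java.io.File;

/** Hypothetical sketch: derive a block file's directory from bits of its ID. */
class BlockDirLayout {
    // Assumption: 8 bits, giving 256 top-level directories as git does.
    static final int DIR_BITS = 8;

    /** Returns the subdirectory name for a block ID, e.g. "subdir2a". */
    static String subdirFor(long blockId) {
        // Mask off the low DIR_BITS bits; this also works for negative IDs.
        int bucket = (int) (blockId & ((1L << DIR_BITS) - 1));
        return String.format("subdir%02x", bucket);
    }

    /** Resolves the block file path from the ID alone, with no directory scan. */
    static File blockFile(File root, long blockId) {
        return new File(new File(root, subdirFor(blockId)), "blk_" + blockId);
    }
}
```

Because the directory is a pure function of the ID, lookup stays O(1) while each directory holds roughly 1/256 of the block files.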


[jira] [Updated] (HDFS-3206) oev: miscellaneous xml cleanups

2012-04-18 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3206:
---

Attachment: HDFS-3206.004.patch

* Here's a patch with the binary part included.  Hopefully Jenkins won't choke 
on it...

 oev: miscellaneous xml cleanups
 ---

 Key: HDFS-3206
 URL: https://issues.apache.org/jira/browse/HDFS-3206
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3206.001.patch, HDFS-3206.002.patch, 
 HDFS-3206.003.patch, HDFS-3206.004.patch


 * SetOwner operations can change both the user and group which a file or 
 directory belongs to, or just one of those.  Currently, in the XML 
 serialization/deserialization code, we don't handle the case where just the 
 group is set, not the user.  We should handle this case.
 * consistently serialize generation stamp as GENSTAMP.
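
The first point above might look something like the sketch below, which treats the user and the group as independently optional. It is not the oev code itself: the class SetOwnerXml, the element names, and the naive string handling (no XML escaping, well-formed input assumed) are simplifications chosen for illustration.

```java
/** Hypothetical sketch: SetOwner XML where user and group are each optional. */
class SetOwnerXml {
    /** Serializes only the fields that were set; null means "unchanged". */
    static String toXml(String src, String user, String group) {
        StringBuilder sb = new StringBuilder("<DATA><SRC>").append(src).append("</SRC>");
        if (user != null) {
            sb.append("<USERNAME>").append(user).append("</USERNAME>");
        }
        if (group != null) {
            sb.append("<GROUPNAME>").append(group).append("</GROUPNAME>");
        }
        return sb.append("</DATA>").toString();
    }

    /** Returns the text of an optional element, or null if it is absent. */
    static String optional(String xml, String tag) {
        int start = xml.indexOf("<" + tag + ">");
        if (start < 0) {
            return null; // element missing: the field was not set
        }
        int end = xml.indexOf("</" + tag + ">", start);
        return xml.substring(start + tag.length() + 2, end);
    }
}
```

The key property is that a group-only change round-trips: the reader must not assume a user element is present just because a group element is.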


[jira] [Updated] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt

2012-04-17 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3049:
---

Attachment: HDFS-3049.003.patch

rebase on trunk

During the normal loading NN startup process, fall back on a different
EditLog if we see one that is corrupt

Key: HDFS-3049
URL: https://issues.apache.org/jira/browse/HDFS-3049
Project: Hadoop HDFS
Issue Type: New Feature
Components: name-node
Affects Versions: 0.23.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3049.001.patch, HDFS-3049.002.patch,
HDFS-3049.003.patch

During the NameNode startup process, we load an image, and then apply edit
logs to it until we believe that we have all the latest changes.
Unfortunately, if there is an I/O error while reading any of these files, in
most cases, we simply abort the startup process. We should try harder to
locate a readable edit log and/or image file.
*There are three main use cases for this feature:*
1. If the operating system does not honor fsync (usually due to a
misconfiguration), a file may end up in an inconsistent state.
2. In certain older releases where we did not use fallocate() or similar to
pre-reserve blocks, a disk full condition may cause a truncated log in one
edit directory.
3. There may be a bug in HDFS which results in some of the data directories
receiving corrupt data, but not all. This is the least likely use case.
*Proposed changes to normal NN startup*
* We should try a different FSImage if we can't load the first one we try.
* We should examine other FSEditLogs if we can't load the first one(s) we try.
* We should fail if we can't find EditLogs that would bring us up to what we
believe is the latest transaction ID.
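
As a rough sketch of the normal-startup behavior listed above (not the patch itself): try each copy of the edit log in turn, fall back on I/O errors, and fail only if no copy reaches the expected transaction ID. EditLogFailover and EditLogSource are hypothetical names, and the sketch assumes that a failed load leaves no partially applied state behind.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of falling back to another edit log copy. */
class EditLogFailover {
    /** Hypothetical stand-in for one copy of the edit log. */
    interface EditLogSource {
        /** Applies the edits and returns the last transaction ID applied. */
        long load() throws IOException;
    }

    /**
     * Returns the last txid from the first copy that loads cleanly and reaches
     * expectedLastTxId. Throws only if no copy does.
     */
    static long loadWithFallback(List<? extends EditLogSource> copies,
                                 long expectedLastTxId) throws IOException {
        IOException lastError = null;
        for (EditLogSource copy : copies) {
            try {
                long lastTxId = copy.load();
                if (lastTxId >= expectedLastTxId) {
                    return lastTxId;
                }
                lastError = new IOException("edit log ends at txid " + lastTxId
                        + ", expected " + expectedLastTxId);
            } catch (IOException e) {
                lastError = e; // corrupt or unreadable: try the next copy
            }
        }
        throw new IOException("no usable edit log found", lastError);
    }
}
```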
*Proposed changes to recovery mode NN startup*
We should list all the available storage directories and allow the operator
to select which one to use.
Something like this:
{code}
Multiple storage directories found.
1. /foo/bar
edits__curent__XYZ size:213421345 md5:2345345
image size:213421345 md5:2345345
2. /foo/baz
edits__curent__XYZ size:213421345 md5:2345345345
image size:213421345 md5:2345345
Which one would you like to use? (1/2)
{code}
As usual in recovery mode, we want to be flexible about error handling. In
this case, this means that we should NOT fail if we can't find EditLogs that
would bring us up to what we believe is the latest transaction ID.
*Not addressed by this feature*
This feature will not address the case where an attempt to access the
NameNode name directory or directories hangs because of an I/O error. This
may happen, for example, when trying to load an image from a hard-mounted NFS
directory, when the NFS server has gone away. Just as now, the operator will
have to notice this problem and take steps to correct it.

[jira] [Updated] (HDFS-3206) oev: miscellaneous xml cleanups

2012-04-17 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3206:
---

Attachment: HDFS-3206.003.patch

 oev: miscellaneous xml cleanups
 ---

 Key: HDFS-3206
 URL: https://issues.apache.org/jira/browse/HDFS-3206
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3206.001.patch, HDFS-3206.002.patch, 
 HDFS-3206.003.patch


 * SetOwner operations can change both the user and group which a file or 
 directory belongs to, or just one of those.  Currently, in the XML 
 serialization/deserialization code, we don't handle the case where just the 
 group is set, not the user.  We should handle this case.
 * consistently serialize generation stamp as GENSTAMP.


[jira] [Updated] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt

2012-04-16 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3049:
---

Attachment: HDFS-3049.002.patch

* fix tests

During the normal loading NN startup process, fall back on a different
EditLog if we see one that is corrupt

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Status: Open (was: Patch Available)

harden edit log loader against malformed or malicious input
---

Key: HDFS-3134
URL: https://issues.apache.org/jira/browse/HDFS-3134
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-3134.001.patch, HDFS-3134.002.patch,
HDFS-3134.003.patch, HDFS-3134.004.patch

Currently, the edit log loader does not handle bad or malicious input
sensibly.
We can often cause OutOfMemory exceptions, null pointer exceptions, or other
unchecked exceptions to be thrown by feeding the edit log loader bad input.
In some environments, an out of memory error can cause the JVM process to be
terminated.
It's clear that we want these exceptions to be thrown as IOException instead
of as unchecked exceptions. We also want to avoid out of memory situations.
The main task here is to put a sensible upper limit on the lengths of arrays
and strings we allocate on command. The other task is to try to avoid
creating unchecked exceptions (by dereferencing potentially-NULL pointers,
for example). Instead, we should verify ahead of time and give a more
sensible error message that reflects the problem with the input.
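
A minimal sketch of the length check described above, assuming a length-prefixed byte array and an arbitrary illustrative cap. BoundedReads and MAX_LEN are hypothetical names, not taken from the patch.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical sketch: validate a length before allocating for it. */
class BoundedReads {
    // Assumption: an illustrative cap; the real limit is a design choice.
    static final int MAX_LEN = 1 << 20;

    /** Reads a length-prefixed byte array, rejecting implausible lengths. */
    static byte[] readBytes(DataInputStream in) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > MAX_LEN) {
            // Report bad input as a checked exception instead of letting
            // NegativeArraySizeException or OutOfMemoryError escape.
            throw new IOException("invalid array length " + len
                    + " (allowed range is 0 to " + MAX_LEN + ")");
        }
        byte[] buf = new byte[len];
        in.readFully(buf); // throws EOFException (an IOException) if truncated
        return buf;
    }
}
```

The check runs before the allocation, so a corrupt length field costs one exception rather than a multi-gigabyte array.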

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Attachment: HDFS-3134.005.patch

New patch with the common code split out. Requires HADOOP-8275.

harden edit log loader against malformed or malicious input
---

[jira] [Updated] (HDFS-3049) During the normal loading NN startup process, fall back on a different image or EditLog if we see one that is corrupt

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3049:
---

Attachment: HDFS-3049.001.patch

* implement edit log failover (no image stuff in here)

During the normal loading NN startup process, fall back on a different image
or EditLog if we see one that is corrupt
-

Key: HDFS-3049
URL: https://issues.apache.org/jira/browse/HDFS-3049
Project: Hadoop HDFS
Issue Type: New Feature
Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Fix For: 0.24.0

Attachments: HDFS-3049.001.patch

[jira] [Updated] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3049:
---

Summary: During the normal loading NN startup process, fall back on a
different EditLog if we see one that is corrupt (was: During the normal
loading NN startup process, fall back on a different image or EditLog if we see
one that is corrupt)

remove references to FSImage (there is now a separate JIRA for that)

During the normal loading NN startup process, fall back on a different
EditLog if we see one that is corrupt

Attachments: HDFS-3049.001.patch

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Affects Version/s: 0.23.0
Fix Version/s: 2.0.0

harden edit log loader against malformed or malicious input
---

Attachments: HDFS-3134.001.patch, HDFS-3134.002.patch,
HDFS-3134.003.patch, HDFS-3134.004.patch, HDFS-3134.005.patch

[jira] [Updated] (HDFS-3049) During the normal loading NN startup process, fall back on a different EditLog if we see one that is corrupt

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3049:
---

Status: Patch Available (was: Open)

During the normal loading NN startup process, fall back on a different
EditLog if we see one that is corrupt

Attachments: HDFS-3049.001.patch

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-13 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Target Version/s: 2.0.0
Fix Version/s: (was: 2.0.0)

harden edit log loader against malformed or malicious input
---

Key: HDFS-3134
URL: https://issues.apache.org/jira/browse/HDFS-3134
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.23.0
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-3134.001.patch, HDFS-3134.002.patch,
HDFS-3134.003.patch, HDFS-3134.004.patch, HDFS-3134.005.patch

[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-12 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Release Note: This is a new feature.  It is documented in 
hdfs_user_guide.xml.

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.1.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch, 
 HDFS-3055-b1.006.patch, HDFS-3055-b1.007.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-12 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Release Note: This is a new feature. It is documented in
hdfs_user_guide.xml.

Implement Recovery Mode
---

Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch,
HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch,
HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch,
HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch,
HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch,
HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch,
HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch,
HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch,
HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch,
HDFS-3004.040.patch, HDFS-3004.041.patch, HDFS-3004.042.patch,
HDFS-3004.042.patch, HDFS-3004.042.patch, HDFS-3004.043.patch,
HDFS-3004__namenode_recovery_tool.txt

When the NameNode metadata is corrupt for some reason, we want to be able to
fix it. Obviously, we would prefer never to get into this situation. In a
perfect world, we never would. However, bad data on disk can happen from time to
time, because of hardware errors or misconfigurations. In the past we have
had to correct it manually, which is time-consuming and which can result in
downtime.
Recovery mode is initiated by the system administrator. When the NameNode
starts up in Recovery Mode, it will try to load the FSImage file, apply all
the edits from the edits log, and then write out a new image. Then it will
shut down.
Unlike in the normal startup process, the recovery mode startup process will
be interactive. When the NameNode finds something that is inconsistent, it
will prompt the operator as to what it should do. The operator can also
choose to take the first option for all prompts by starting up with the '-f'
flag, or typing 'a' at one of the prompts.
I have reused as much code as possible from the NameNode in this tool.
Hopefully, the effort that was spent developing this will also make the
NameNode editLog and image processing even more robust than it already is.
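
The prompting behavior described above could be sketched like this. It is an illustration under assumptions, not the NameNode's code: RecoveryPrompt and its ask method are hypothetical, the first option is treated as the default, and 'a' is reserved to mean "always take the first option".

```java
import java.io.BufferedReader;
import java.io.IOException;

/** Hypothetical sketch of an interactive recovery prompt. */
class RecoveryPrompt {
    private boolean alwaysFirst;

    /** @param force true if the operator asked to take the first option everywhere */
    RecoveryPrompt(boolean force) {
        this.alwaysFirst = force;
    }

    /** Returns the index of the chosen option; options[0] is the default. */
    int ask(BufferedReader in, String question, String... options) throws IOException {
        if (alwaysFirst) {
            return 0; // no interaction once "always" is in effect
        }
        while (true) {
            System.out.println(question + " (" + String.join("/", options)
                    + ", or 'a' for always)");
            String line = in.readLine();
            if (line == null) {
                throw new IOException("end of input at prompt");
            }
            String answer = line.trim();
            if (answer.equals("a")) {
                alwaysFirst = true; // applies to this and all later prompts
                return 0;
            }
            for (int i = 0; i < options.length; i++) {
                if (options[i].equals(answer)) {
                    return i;
                }
            }
            // Unrecognized answer: ask again.
        }
    }
}
```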

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-12 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Attachment: HDFS-3134.004.patch

* add unit test

* make range check more flexible (add an upper bound as well as a lower bound,
and make the lower bound configurable)

* fix bug where we might not decode certain DelegationKey objects because we
encoded them with length = -1 (i.e. no key)

harden edit log loader against malformed or malicious input
---

[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-11 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.007.patch

In the docs, refer to -force, not --force

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch, 
 HDFS-3055-b1.006.patch, HDFS-3055-b1.007.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3169) TestFsck should test multiple -move operations in a row

2012-04-11 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3169:
---

Status: Patch Available  (was: Open)

 TestFsck should test multiple -move operations in a row
 ---

 Key: HDFS-3169
 URL: https://issues.apache.org/jira/browse/HDFS-3169
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3169.001.patch


 TestFsck should test multiple -move operations in a row.  Overall, it would 
 be nice to have more coverage on this.


[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-11 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Attachment: HDFS-3134.003.patch

rebase on trunk

harden edit log loader against malformed or malicious input
---

[jira] [Updated] (HDFS-3248) bootstrapStandby repeated twice in hdfs namenode usage message

2012-04-10 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3248:
---

Status: Patch Available  (was: Open)

 bootstrapStandby repeated twice in hdfs namenode usage message
 -

 Key: HDFS-3248
 URL: https://issues.apache.org/jira/browse/HDFS-3248
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3248.002.patch


 The HDFS usage message lists -bootstrapStandby twice.
 {code}
 Usage: java NameNode [-backup] | [-checkpoint] | [-format[-clusterid cid ]] | 
 [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | 
 [-bootstrapStandby] | [-initializeSharedEdits] | [-bootstrapStandby] | 
 [-recover [ -force ] ]
 {code}


[jira] [Updated] (HDFS-3248) bootstrapStandby repeated twice in hdfs namenode usage message

2012-04-10 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3248:
---

Attachment: HDFS-3248.002.patch

 bootstrapStandby repeated twice in hdfs namenode usage message
 -

 Key: HDFS-3248
 URL: https://issues.apache.org/jira/browse/HDFS-3248
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3248.002.patch


 The HDFS usage message lists -bootstrapStandby twice.
 {code}
 Usage: java NameNode [-backup] | [-checkpoint] | [-format[-clusterid cid ]] | 
 [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | 
 [-bootstrapStandby] | [-initializeSharedEdits] | [-bootstrapStandby] | 
 [-recover [ -force ] ]
 {code}


[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-09 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.005.patch

* move askOperator to MetaRecoveryContext::editLogLoaderPrompt

* remove unnecessary toString() call

* warn about losing data from your HDFS filesystem rather than losing data 
from your filesystem

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-09 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.006.patch

* TestNameNodeRecovery: use StringUtils instead of StringWriter to serialize 
exception

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch, HDFS-3055-b1.005.patch, 
 HDFS-3055-b1.006.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: (was: HDFS-3004__namenode_recovery_tool.txt)

Implement Recovery Mode
---

Key: HDFS-3004
URL: https://issues.apache.org/jira/browse/HDFS-3004
Project: Hadoop HDFS
Issue Type: New Feature
Components: tools
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-3004.010.patch, HDFS-3004.011.patch,
HDFS-3004.012.patch, HDFS-3004.013.patch, HDFS-3004.015.patch,
HDFS-3004.016.patch, HDFS-3004.017.patch, HDFS-3004.018.patch,
HDFS-3004.019.patch, HDFS-3004.020.patch, HDFS-3004.022.patch,
HDFS-3004.023.patch, HDFS-3004.024.patch, HDFS-3004.026.patch,
HDFS-3004.027.patch, HDFS-3004.029.patch, HDFS-3004.030.patch,
HDFS-3004.031.patch, HDFS-3004.032.patch, HDFS-3004.033.patch,
HDFS-3004.034.patch, HDFS-3004.035.patch, HDFS-3004.036.patch,
HDFS-3004.037.patch, HDFS-3004.038.patch, HDFS-3004.039.patch,
HDFS-3004.040.patch, HDFS-3004.041.patch,
HDFS-3004__namenode_recovery_tool.txt

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004__namenode_recovery_tool.txt

* update design document

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.042.patch

* always prompt about possible data loss, even when -force is specified

* update hdfs_user_guide.xml so that it talks about -force

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Attachment: HDFS-3134.001.patch

* In the edit log loader, don't try to allocate arrays of negative length.
Instead, throw an IOException.

* When deserializing a variable-length number into a Java integer, do not
ignore problems resulting from truncation; throw an IOException instead.
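
The two checks above could be sketched roughly as follows. This is an illustration only, not the code from HDFS-3134.001.patch: the class name CheckedEditLogReads, both method names, and the MAX_ARRAY_LENGTH bound are assumptions made for the example.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper for illustration; not taken from the actual patch.
final class CheckedEditLogReads {
    // Assumed upper bound, chosen arbitrarily for this sketch.
    static final int MAX_ARRAY_LENGTH = 1 << 20;

    /** Reads a length-prefixed byte array, rejecting negative or oversized lengths. */
    static byte[] readByteArray(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0 || length > MAX_ARRAY_LENGTH) {
            // Fail with an IOException instead of attempting the allocation.
            throw new IOException("invalid array length " + length);
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return data;
    }

    /** Narrows a decoded long to an int, failing instead of silently truncating. */
    static int toIntChecked(long value) throws IOException {
        if (value < Integer.MIN_VALUE || value > Integer.MAX_VALUE) {
            throw new IOException("value " + value + " does not fit in an int");
        }
        return (int) value;
    }
}
```

Throwing IOException in both cases lets the caller treat a malformed length the same way as any other corrupt-log condition.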

harden edit log loader against malformed or malicious input
---

Key: HDFS-3134
URL: https://issues.apache.org/jira/browse/HDFS-3134
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-3134.001.patch

[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Status: Patch Available (was: Open)

harden edit log loader against malformed or malicious input
---

Key: HDFS-3134
URL: https://issues.apache.org/jira/browse/HDFS-3134
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Attachments: HDFS-3134.001.patch

[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.004.patch

* update patch to reflect comments from HDFS-3004

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch, HDFS-3055-b1.004.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3134) harden edit log loader against malformed or malicious input

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3134:
---

Attachment: HDFS-3134.002.patch

harden edit log loader against malformed or malicious input
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-06 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.042.patch

* reposting so jenkins will test

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3206) oev: correctly serialize SetOwner operations in which the user is not changed

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3206:
---

Attachment: HDFS-3206.001.patch

* fix

 oev: correctly serialize SetOwner operations in which the user is not changed
 -

 Key: HDFS-3206
 URL: https://issues.apache.org/jira/browse/HDFS-3206
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3206.001.patch


 SetOwner operations can change both the user and group which a file or 
 directory belongs to, or just one of those.  Currently, in the XML 
 serialization/deserialization code, we don't handle the case where just the 
 group is set, not the user.  We should handle this case.
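
A minimal sketch of the group-only case described above. The class SetOwnerXml and the element names SET_OWNER, SRC, USERNAME and GROUPNAME are assumptions for illustration, not the actual OEV schema; XML escaping is omitted for brevity.

```java
// Hypothetical sketch; element names are assumed, not the real OEV format.
final class SetOwnerXml {
    /** Serializes a SetOwner operation, omitting whichever of user/group is null. */
    static String toXml(String src, String user, String group) {
        StringBuilder sb = new StringBuilder("<SET_OWNER>");
        sb.append("<SRC>").append(src).append("</SRC>");
        if (user != null) {
            sb.append("<USERNAME>").append(user).append("</USERNAME>");
        }
        if (group != null) {
            sb.append("<GROUPNAME>").append(group).append("</GROUPNAME>");
        }
        return sb.append("</SET_OWNER>").toString();
    }

    /** Returns the text of the given element, or null when the element is absent. */
    static String field(String xml, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = xml.indexOf(open);
        if (start < 0) {
            return null; // absent element means "not changed by this operation"
        }
        int end = xml.indexOf(close, start);
        return end < 0 ? null : xml.substring(start + open.length(), end);
    }
}
```

Treating a missing element as null on the way back in keeps a group-only operation distinct from one that sets an empty user name.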


[jira] [Updated] (HDFS-3206) oev: miscellaneous xml cleanups

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3206:
---

Description:
* SetOwner operations can change both the user and group which a file or
directory belongs to, or just one of those. Currently, in the XML
serialization/deserialization code, we don't handle the case where just the
group is set, not the user. We should handle this case.

* consistently serialize generation stamp as GENSTAMP.

was:SetOwner operations can change both the user and group which a file or
directory belongs to, or just one of those. Currently, in the XML
serialization/deserialization code, we don't handle the case where just the
group is set, not the user. We should handle this case.

Summary: oev: miscellaneous xml cleanups (was: oev: correctly
serialize SetOwner operations in which the user is not changed)

oev: miscellaneous xml cleanups
---

Key: HDFS-3206
URL: https://issues.apache.org/jira/browse/HDFS-3206
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3206.001.patch

* SetOwner operations can change both the user and group which a file or
directory belongs to, or just one of those. Currently, in the XML
serialization/deserialization code, we don't handle the case where just the
group is set, not the user. We should handle this case.
* consistently serialize generation stamp as GENSTAMP.

[jira] [Updated] (HDFS-3206) oev: miscellaneous xml cleanups

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3206:
---

Status: Patch Available  (was: Open)

 oev: miscellaneous xml cleanups
 ---

 Key: HDFS-3206
 URL: https://issues.apache.org/jira/browse/HDFS-3206
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3206.001.patch


 * SetOwner operations can change both the user and group which a file or 
 directory belongs to, or just one of those.  Currently, in the XML 
 serialization/deserialization code, we don't handle the case where just the 
 group is set, not the user.  We should handle this case.
 * consistently serialize generation stamp as GENSTAMP.


[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.003.patch

* rebase on branch-1

todd: yes, this is up to date, and I've run the following tests:
TestCheckpoint,
TestEditLog,
TestNameNodeRecovery,
TestEditLogLoading,
TestNameNodeMXBean,
TestSaveNamespace,
TestSecurityTokenEditLog,
TestStorageDirectoryFailure,
TestStorageRestore

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch, 
 HDFS-3055-b1.003.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.039.patch

* rebase on trunk

* rename RecoveryContext to MetaRecoveryContext

* rename -autoChooseDefault to -noPrompt

* EditLogInputException.java: remove pointless whitespace change

* some whitespace and punctuation improvements to the recovery prompt text.

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3206) oev: miscellaneous xml cleanups

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3206:
---

Attachment: HDFS-3206.002.patch

* put OP_END_LOG_SEGMENT at the end of the edit log, where it would be in a 
real edit log.

 oev: miscellaneous xml cleanups
 ---

 Key: HDFS-3206
 URL: https://issues.apache.org/jira/browse/HDFS-3206
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3206.001.patch, HDFS-3206.002.patch


 * SetOwner operations can change both the user and group which a file or 
 directory belongs to, or just one of those.  Currently, in the XML 
 serialization/deserialization code, we don't handle the case where just the 
 group is set, not the user.  We should handle this case.
 * consistently serialize generation stamp as GENSTAMP.


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.040.patch

* update findbugsExcludeFile.xml to reflect the fact that RecoveryContext is
now called MetaRecoveryContext

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-05 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.041.patch

* implement -force

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3050) rework OEV to share more code with the NameNode

2012-04-04 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.019.patch

* fix error handling bug that could lead to open files getting leaked 
(theoretically)

* suppress javadoc warnings resulting from com.sun.* API use

 rework OEV to share more code with the NameNode
 ---

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch, 
 HDFS-3050.015.patch, HDFS-3050.016.patch, HDFS-3050.017.patch, 
 HDFS-3050.018.patch, HDFS-3050.019.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.  By using the existing FSEditLogLoader code to 
 load edits in OEV, we can avoid having to update two places when the format 
 changes.
 We should not put opcode checksums into the XML, because they are a 
 serialization detail, unrelated to the data we're storing.  
 This will also make it possible to modify the XML file and translate this 
 modified file back to a binary edits log file.
 Finally, this change introduces --fix-txids.  When OEV is passed this flag, 
 it will close gaps in the transaction log by modifying the sequence numbers.  
 This is useful if you want to modify the edit log XML (say, by removing a 
 transaction), and transform the modified XML back into a valid binary edit 
 log file.
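
The renumbering that --fix-txids is described as doing can be sketched as follows. This is only an illustration of closing gaps in a sequence of transaction ids; the class TxidRenumbering and the method closeGaps are hypothetical names, and the real tool operates on edit log operations rather than a bare list of numbers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of gap closing; not the OEV implementation.
final class TxidRenumbering {
    /** Returns consecutive txids, starting at the first txid of the input. */
    static List<Long> closeGaps(List<Long> txids) {
        List<Long> fixed = new ArrayList<>(txids.size());
        if (txids.isEmpty()) {
            return fixed;
        }
        long next = txids.get(0);
        for (int i = 0; i < txids.size(); i++) {
            // Each transaction gets the next sequence number, regardless of its old one.
            fixed.add(next++);
        }
        return fixed;
    }
}
```

For example, removing transaction 7 and 8 from a log numbered 5 through 10 leaves 5, 6, 9, 10, which this sketch renumbers to 5, 6, 7, 8.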


[jira] [Updated] (HDFS-3050) rework OEV to share more code with the NameNode

2012-04-04 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.020.patch

* update OK_JAVADOC_WARNINGS

 rework OEV to share more code with the NameNode
 ---

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch, 
 HDFS-3050.015.patch, HDFS-3050.016.patch, HDFS-3050.017.patch, 
 HDFS-3050.018.patch, HDFS-3050.019.patch, HDFS-3050.020.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.  By using the existing FSEditLogLoader code to 
 load edits in OEV, we can avoid having to update two places when the format 
 changes.
 We should not put opcode checksums into the XML, because they are a 
 serialization detail, unrelated to the data we're storing.  
 This will also make it possible to modify the XML file and translate this 
 modified file back to a binary edits log file.
 Finally, this change introduces --fix-txids.  When OEV is passed this flag, 
 it will close gaps in the transaction log by modifying the sequence numbers.  
 This is useful if you want to modify the edit log XML (say, by removing a 
 transaction), and transform the modified XML back into a valid binary edit 
 log file.


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-04 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.038.patch

* rebase

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-04-03 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.002.patch

* add unit test

* some fixes to NN unclean shutdown (to allow unit test to work)

* better error reporting for the branch-1 edit log stuff (print out the offset 
when we encounter a problem)

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch, HDFS-3055-b1.002.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-1378) Edit log replay should track and report file offsets in case of errors

2012-04-03 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-1378:
---

Attachment: HDFS-1378-b1.002.patch

* port to branch-1

 Edit log replay should track and report file offsets in case of errors
 --

 Key: HDFS-1378
 URL: https://issues.apache.org/jira/browse/HDFS-1378
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
 Fix For: 0.23.0

 Attachments: HDFS-1378-b1.002.patch, hdfs-1378-branch20.txt, 
 hdfs-1378.0.patch, hdfs-1378.1.patch, hdfs-1378.2.txt


 Occasionally there are bugs or operational mistakes that result in corrupt 
 edit logs which I end up having to repair by hand. In these cases it would be 
 very handy to have the error message also print out the file offsets of the 
 last several edit log opcodes so it's easier to find the right place to edit 
 in the OP_INVALID marker. We could also use this facility to provide a rough 
 estimate of how far along edit log replay the NN is during startup (handy 
 when a 2NN has died and replay takes a while)
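
One way to keep "the file offsets of the last several edit log opcodes" is a small bounded queue, sketched below. The class RecentOpcodeOffsets and its methods are hypothetical and are not taken from any of the attached patches; the progress estimate simply assumes the total file length is known.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of offset tracking during edit log replay.
final class RecentOpcodeOffsets {
    private final int capacity;
    private final Deque<Long> offsets = new ArrayDeque<>();

    RecentOpcodeOffsets(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Remembers the offset at which an opcode started, dropping the oldest if full. */
    void record(long offset) {
        if (offsets.size() == capacity) {
            offsets.removeFirst();
        }
        offsets.addLast(offset);
    }

    /** Text suitable for appending to an error message. */
    String describe() {
        return "recent opcode offsets: " + offsets;
    }

    /** Rough replay progress as a percentage of the file length. */
    static int percentComplete(long offset, long fileLength) {
        return fileLength <= 0 ? 0 : (int) (100 * offset / fileLength);
    }
}
```

A loader would call record() before decoding each opcode and include describe() in the exception it raises on a decode failure.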


[jira] [Updated] (HDFS-1378) Edit log replay should track and report file offsets in case of errors

2012-04-03 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-1378:
---

Attachment: HDFS-1378-b1.003.patch

* include bug fix from revised patch

* backport unit test as well

 Edit log replay should track and report file offsets in case of errors
 --

 Key: HDFS-1378
 URL: https://issues.apache.org/jira/browse/HDFS-1378
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
 Fix For: 0.23.0

 Attachments: HDFS-1378-b1.002.patch, HDFS-1378-b1.003.patch, 
 hdfs-1378-branch20.txt, hdfs-1378.0.patch, hdfs-1378.1.patch, hdfs-1378.2.txt


 Occasionally there are bugs or operational mistakes that result in corrupt 
 edit logs which I end up having to repair by hand. In these cases it would be 
 very handy to have the error message also print out the file offsets of the 
 last several edit log opcodes so it's easier to find the right place to edit 
 in the OP_INVALID marker. We could also use this facility to provide a rough 
 estimate of how far along edit log replay the NN is during startup (handy 
 when a 2NN has died and replay takes a while)


[jira] [Updated] (HDFS-1378) Edit log replay should track and report file offsets in case of errors

2012-04-03 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-1378:
---

Attachment: HDFS-1378-b1.004.patch

* add unit test

 Edit log replay should track and report file offsets in case of errors
 --

 Key: HDFS-1378
 URL: https://issues.apache.org/jira/browse/HDFS-1378
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Colin Patrick McCabe
 Fix For: 0.23.0

 Attachments: HDFS-1378-b1.002.patch, HDFS-1378-b1.003.patch, 
 HDFS-1378-b1.004.patch, hdfs-1378-branch20.txt, hdfs-1378.0.patch, 
 hdfs-1378.1.patch, hdfs-1378.2.txt


 Occasionally there are bugs or operational mistakes that result in corrupt 
 edit logs which I end up having to repair by hand. In these cases it would be 
 very handy to have the error message also print out the file offsets of the 
 last several edit log opcodes so it's easier to find the right place to edit 
 in the OP_INVALID marker. We could also use this facility to provide a rough 
 estimate of how far along edit log replay the NN is during startup (handy 
 when a 2NN has died and replay takes a while)


[jira] [Updated] (HDFS-3050) rework OEV to share more code with the NameNode

2012-04-03 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.018.patch

* rebase on latest trunk

 rework OEV to share more code with the NameNode
 ---

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch, 
 HDFS-3050.015.patch, HDFS-3050.016.patch, HDFS-3050.017.patch, 
 HDFS-3050.018.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.  By using the existing FSEditLogLoader code to 
 load edits in OEV, we can avoid having to update two places when the format 
 changes.
 We should not put opcode checksums into the XML, because they are a 
 serialization detail, unrelated to the data we're storing.  
 This will also make it possible to modify the XML file and translate this 
 modified file back to a binary edits log file.
 Finally, this change introduces --fix-txids.  When OEV is passed this flag, 
 it will close gaps in the transaction log by modifying the sequence numbers.  
 This is useful if you want to modify the edit log XML (say, by removing a 
 transaction), and transform the modified XML back into a valid binary edit 
 log file.


[jira] [Updated] (HDFS-3050) rework OEV to share more code with the NameNode

2012-04-02 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.017.patch

* rebase on trunk

 rework OEV to share more code with the NameNode
 ---

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch, 
 HDFS-3050.015.patch, HDFS-3050.016.patch, HDFS-3050.017.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.  By using the existing FSEditLogLoader code to 
 load edits in OEV, we can avoid having to update two places when the format 
 changes.
 We should not put opcode checksums into the XML, because they are a 
 serialization detail, unrelated to the data we're storing.  
 This will also make it possible to modify the XML file and translate this 
 modified file back to a binary edits log file.
 Finally, this change introduces --fix-txids.  When OEV is passed this flag, 
 it will close gaps in the transaction log by modifying the sequence numbers.  
 This is useful if you want to modify the edit log XML (say, by removing a 
 transaction), and transform the modified XML back into a valid binary edit 
 log file.


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-04-02 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.037.patch

* style fixes

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3181) org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart fails intermittently

2012-04-02 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3181:
---

Attachment: testOut.txt

exception, standard output, etc. from a failing run

 org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
  fails intermittently
 

 Key: HDFS-3181
 URL: https://issues.apache.org/jira/browse/HDFS-3181
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
 Attachments: testOut.txt


 org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
  seems to be failing intermittently on jenkins.
 {code}
 org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
 Failing for the past 1 build (Since Failed#2163 )
 Took 8.4 sec.
 Error Message
 Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
 Stacktrace
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by DFSClient_NONMAPREDUCE_1147689755_1
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
   at org.apache.hadoop.ipc.Client.call(Client.java:1159)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:185)
   at $Proxy15.getAdditionalDatanode(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
   at $Proxy15.getAdditionalDatanode(Unknown Source)
   at

[jira] [Updated] (HDFS-3181) testHardLeaseRecoveryAfterNameNodeRestart fails intermittently

2012-04-02 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3181:
---

Summary: testHardLeaseRecoveryAfterNameNodeRestart fails intermittently  
(was: 
org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
 fails intermittently)

 testHardLeaseRecoveryAfterNameNodeRestart fails intermittently
 --

 Key: HDFS-3181
 URL: https://issues.apache.org/jira/browse/HDFS-3181
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Colin Patrick McCabe
 Attachments: testOut.txt


 org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
  seems to be failing intermittently on Jenkins.
 {code}
 org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart
 Failing for the past 1 build (Since Failed#2163 )
 Took 8.4 sec.
 Error Message
 Lease mismatch on /hardLeaseRecovery owned by HDFS_NameNode but is accessed 
 by DFSClient_NONMAPREDUCE_1147689755_1
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
 Stacktrace
 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: Lease mismatch 
 on /hardLeaseRecovery owned by HDFS_NameNode but is accessed by 
 DFSClient_NONMAPREDUCE_1147689755_1
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2076)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2051)
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalDatanode(FSNamesystem.java:1983)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getAdditionalDatanode(NameNodeRpcServer.java:492)
   at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getAdditionalDatanode(ClientNamenodeProtocolServerSideTranslatorPB.java:311)
   at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:42604)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:417)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:891)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1661)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1657)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1205)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1655)
   at org.apache.hadoop.ipc.Client.call(Client.java:1159)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:185)
   at $Proxy15.getAdditionalDatanode(Unknown Source)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
   at $Proxy15.getAdditionalDatanode(Unknown Source)
   at
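
For readers unfamiliar with the "Lease mismatch" message in the trace above, here is a minimal sketch of the kind of check that could produce it. This is not the FSNamesystem code: the class LeaseCheckSketch, its methods, and the exception type are hypothetical, and the sketch assumes (as the message suggests) that the lease had been handed to the HDFS_NameNode holder while the original client was still writing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a lease check; not the FSNamesystem code.
class LeaseCheckSketch {
    // path -> name of the client currently recorded as lease holder
    private final Map<String, String> holders = new HashMap<>();

    // Record (or overwrite) the lease holder for a path.
    void grant(String path, String holder) {
        holders.put(path, holder);
    }

    // Throws if 'client' is not the recorded lease holder for 'path'.
    void checkLease(String path, String client) {
        String holder = holders.get(path);
        if (holder == null) {
            throw new IllegalStateException("No lease on " + path);
        }
        if (!holder.equals(client)) {
            throw new IllegalStateException("Lease mismatch on " + path
                + " owned by " + holder + " but is accessed by " + client);
        }
    }

    public static void main(String[] args) {
        LeaseCheckSketch leases = new LeaseCheckSketch();
        leases.grant("/hardLeaseRecovery", "DFSClient_1");
        // Assumption: recovery reassigns the lease to the NameNode's own holder name.
        leases.grant("/hardLeaseRecovery", "HDFS_NameNode");
        try {
            // The original client keeps writing and now fails the check.
            leases.checkLease("/hardLeaseRecovery", "DFSClient_1");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If this model is roughly right, the intermittent failure would depend on whether the client issues another call after the lease has changed hands, which is a timing question.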

[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-30 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: HDFS-3044-b1.004.patch

* remember to run fsck -delete before checking to see if the file is really 
deleted (d'oh!)

* add test that running fsck -move a few times in a row has no harmful effects

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Fix For: 1.1.0, 2.0.0

 Attachments: HDFS-3044-b1.002.patch, HDFS-3044-b1.004.patch, 
 HDFS-3044.002.patch, HDFS-3044.003.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up and check in on cluster startup (e.g. due to 
 HW issues), so files with blocks whose replicas are all on this set of 
 datanodes are marked corrupt
 - Admin does fsck move, which deletes the corrupt files, saves whatever 
 blocks were available
 - The HW issues with datanodes are resolved, they are started and join the 
 cluster. The NN tells them to delete their blocks for the corrupt files since 
 the file was deleted. 
 I think we should:
 - Make fsck move non-destructive by default (e.g. it just does a move into 
 lost+found)
 - Make the destructive behavior optional (e.g. --destructive, so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning to that effect, which an admin should have to override if 
 they want to do something destructive
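
The proposal above can be sketched as a small decision function. This is only an illustration of the policy, not the fsck implementation: the class FsckPolicySketch, the plan method, its parameters, and the step strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed policy; names are illustrative only.
class FsckPolicySketch {
    // Returns the ordered steps fsck would take for one corrupt file.
    static List<String> plan(boolean move, boolean destructive,
                             int expectedDatanodes, int liveDatanodes,
                             boolean overrideWarning) {
        List<String> steps = new ArrayList<>();
        if (move) {
            // Non-destructive by default: salvage what is reachable, keep the original.
            steps.add("copy reachable blocks to lost+found");
        }
        if (destructive) {
            if (liveDatanodes < expectedDatanodes && !overrideWarning) {
                steps.add("warn: only " + liveDatanodes + " of "
                    + expectedDatanodes + " datanodes have checked in");
                return steps; // refuse to delete without an explicit override
            }
            steps.add("delete original file");
        }
        return steps;
    }

    public static void main(String[] args) {
        // Default move: nothing is deleted.
        System.out.println(plan(true, false, 4, 2, false));
        // Destructive request while datanodes are missing: warning instead of delete.
        System.out.println(plan(true, true, 4, 2, false));
        // Admin explicitly overrides the warning.
        System.out.println(plan(true, true, 4, 2, true));
    }
}
```

The point of the ordering is that the delete step is only reachable when it was asked for explicitly and the sanity check either passed or was overridden.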


[jira] [Updated] (HDFS-3050) rework OEV to share more code with the NameNode

2012-03-30 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3050:
---

Description:
Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing
logic found in the NameNode. This duplicated code creates a maintenance burden
for us.

OEV should be refactored to simply use the normal EditLog parsing code, rather
than rolling its own. By using the existing FSEditLogLoader code to load edits
in OEV, we can avoid having to update two places when the format changes.

We should not put opcode checksums into the XML, because they are a
serialization detail, unrelated to the data we're storing. This
will also make it possible to modify the XML file and translate this modified
file back to a binary edits log file.

Finally, this change introduces --fix-txids. When OEV is passed this flag, it
will close gaps in the transaction log by modifying the sequence numbers. This
is useful if you want to modify the edit log XML (say, by removing a
transaction), and transform the modified XML back into a valid binary edit log
file.

was:
Currently, OEV (the offline edits viewer) re-implements all of the opcode parsing
logic found in the NameNode. This duplicated code creates a maintenance burden
for us.

OEV should be refactored to simply use the normal EditLog parsing code, rather
than rolling its own.

Summary: rework OEV to share more code with the NameNode (was:
refactor OEV to share more code with the NameNode)

rework OEV to share more code with the NameNode
---

Key: HDFS-3050
URL: https://issues.apache.org/jira/browse/HDFS-3050
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch,
HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch,
HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch,
HDFS-3050.015.patch

Currently, OEV (the offline edits viewer) re-implements all of the opcode
parsing logic found in the NameNode. This duplicated code creates a
maintenance burden for us.
OEV should be refactored to simply use the normal EditLog parsing code,
rather than rolling its own. By using the existing FSEditLogLoader code to
load edits in OEV, we can avoid having to update two places when the format
changes.
We should not put opcode checksums into the XML, because they are a
serialization detail, unrelated to the data we're storing.
This will also make it possible to modify the XML file and translate this
modified file back to a binary edits log file.
Finally, this change introduces --fix-txids. When OEV is passed this flag,
it will close gaps in the transaction log by modifying the sequence numbers.
This is useful if you want to modify the edit log XML (say, by removing a
transaction), and transform the modified XML back into a valid binary edit
log file.
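
As a rough illustration of what closing gaps in the transaction ID series means, here is a minimal sketch. It is not the OEV code: the class FixTxidsSketch and the closeGaps method are hypothetical, and the sketch assumes that renumbering keeps the first transaction ID and the original order, which the description does not spell out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of gap-closing; not taken from the patch.
class FixTxidsSketch {
    // Renumber IDs so they are consecutive, starting from the first ID seen.
    static List<Long> closeGaps(List<Long> txids) {
        List<Long> fixed = new ArrayList<>(txids.size());
        if (txids.isEmpty()) {
            return fixed;
        }
        long next = txids.get(0);
        for (int i = 0; i < txids.size(); i++) {
            fixed.add(next++); // each record gets the next consecutive ID
        }
        return fixed;
    }

    public static void main(String[] args) {
        // Transaction 3 was removed by hand from the XML, leaving a hole.
        List<Long> edited = Arrays.asList(1L, 2L, 4L, 5L);
        System.out.println(closeGaps(edited)); // prints [1, 2, 3, 4]
    }
}
```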

[jira] [Updated] (HDFS-3050) rework OEV to share more code with the NameNode

2012-03-30 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.016.patch

* rebase against current trunk

* posted improved patch description and name

 rework OEV to share more code with the NameNode
 ---

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch, 
 HDFS-3050.015.patch, HDFS-3050.016.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.  By using the existing FSEditLogLoader code to 
 load edits in OEV, we can avoid having to update two places when the format 
 changes.
 We should not put opcode checksums into the XML, because they are a 
 serialization detail, unrelated to the data we're storing.  
 This will also make it possible to modify the XML file and translate this 
 modified file back to a binary edits log file.
 Finally, this change introduces --fix-txids.  When OEV is passed this flag, 
 it will close gaps in the transaction log by modifying the sequence numbers.  
 This is useful if you want to modify the edit log XML (say, by removing a 
 transaction), and transform the modified XML back into a valid binary edit 
 log file.


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-30 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.036.patch

* rebase on trunk

* slight cleanup of EditLogBackupInputStream::nextValidOp()

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-29 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.015.patch

* set 2-space indentation in output XML to match old XML

* add Javadoc comments to XMLUtils

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch, 
 HDFS-3050.015.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-29 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: HDFS-3050-b1.001.patch

* port to branch-1

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Fix For: 2.0.0

 Attachments: HDFS-3044.002.patch, HDFS-3044.003.patch, 
 HDFS-3050-b1.001.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up and check in on cluster startup (e.g. due to 
 HW issues), so files with blocks whose replicas are all on this set of 
 datanodes are marked corrupt
 - Admin does fsck move, which deletes the corrupt files, saves whatever 
 blocks were available
 - The HW issues with datanodes are resolved, they are started and join the 
 cluster. The NN tells them to delete their blocks for the corrupt files since 
 the file was deleted. 
 I think we should:
 - Make fsck move non-destructive by default (e.g. it just does a move into 
 lost+found)
 - Make the destructive behavior optional (e.g. --destructive, so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning to that effect, which an admin should have to override if 
 they want to do something destructive


[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-29 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: HDFS-3044-b1.002.patch

* fix patch name

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Fix For: 2.0.0

 Attachments: HDFS-3044-b1.002.patch, HDFS-3044.002.patch, 
 HDFS-3044.003.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up and check in on cluster startup (e.g. due to 
 HW issues), so files with blocks whose replicas are all on this set of 
 datanodes are marked corrupt
 - Admin does fsck move, which deletes the corrupt files, saves whatever 
 blocks were available
 - The HW issues with datanodes are resolved, they are started and join the 
 cluster. The NN tells them to delete their blocks for the corrupt files since 
 the file was deleted. 
 I think we should:
 - Make fsck move non-destructive by default (e.g. it just does a move into 
 lost+found)
 - Make the destructive behavior optional (e.g. --destructive, so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning to that effect, which an admin should have to override if 
 they want to do something destructive


[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-29 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: (was: HDFS-3050-b1.001.patch)

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Fix For: 2.0.0

 Attachments: HDFS-3044-b1.002.patch, HDFS-3044.002.patch, 
 HDFS-3044.003.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up and check in on cluster startup (e.g. due to 
 HW issues), so files with blocks whose replicas are all on this set of 
 datanodes are marked corrupt
 - Admin does fsck move, which deletes the corrupt files, saves whatever 
 blocks were available
 - The HW issues with datanodes are resolved, they are started and join the 
 cluster. The NN tells them to delete their blocks for the corrupt files since 
 the file was deleted. 
 I think we should:
 - Make fsck move non-destructive by default (e.g. it just does a move into 
 lost+found)
 - Make the destructive behavior optional (e.g. --destructive, so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning to that effect, which an admin should have to override if 
 they want to do something destructive


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-29 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.035.patch

* TestNameNodeRecovery: don't need data nodes for this test

* TestNameNodeRecovery: use set rather than array

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-28 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.034.patch

* address Todd's suggestions

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-28 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.014.patch

* fix unit test

* enable reading edit logs from XML 

* add -f / -fix-txids option, which makes oev close any holes in the 
transaction ID series.

* Editing the XML no longer allows you to manually recalculate checksums for 
the edited opcode.
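
One way to picture why hand-edited XML need not carry checksums: the checksum can be derived from the serialized bytes at the moment the binary log is written. The sketch below is an assumption-laden illustration, not the edit log code. The class OpChecksumSketch is hypothetical, CRC32 from java.util.zip stands in for whatever checksum the edit log actually uses, and the byte strings stand in for serialized opcodes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: checksum computed at write time, never stored in XML.
class OpChecksumSketch {
    // Compute a CRC32 over the serialized form of one operation.
    static long checksum(byte[] serializedOp) {
        CRC32 crc = new CRC32();
        crc.update(serializedOp, 0, serializedOp.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Stand-ins for an operation before and after a manual edit.
        byte[] original = "OP_MKDIR /a".getBytes(StandardCharsets.US_ASCII);
        byte[] edited = "OP_MKDIR /b".getBytes(StandardCharsets.US_ASCII);
        // Each value is derived from the bytes being written, so the editor
        // never has to recalculate anything by hand.
        System.out.println(Long.toHexString(checksum(original)));
        System.out.println(Long.toHexString(checksum(edited)));
    }
}
```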

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch, HDFS-3050.014.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-26 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.012.patch

* fix findbugs warnings

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch, HDFS-3050.012.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-26 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.033.patch

* rebase on trunk

* fix some resource leaks in TestNameNodeRecovery

* some whitespace and logging cleanups

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-23 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.032.patch

* hdfs_user_guide.xml: fix paragraph breaks

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3055) Implement recovery mode for branch-1

2012-03-23 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3055:
---

Attachment: HDFS-3055-b1.001.patch

* initial version

 Implement recovery mode for branch-1
 

 Key: HDFS-3055
 URL: https://issues.apache.org/jira/browse/HDFS-3055
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Fix For: 1.0.0

 Attachments: HDFS-3055-b1.001.patch


 Implement recovery mode for branch-1


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.027.patch

* fix bug uncovered by jenkins

* add a little more debug in TestNameNodeRecovery

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.009.patch

rebase

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.009.patch

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: (was: HDFS-3050.009.patch)

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.010.patch

remove all changes to common

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3129) NetworkTopology: add test that getLeaf should check for invalid topologies

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3129:
---

Attachment: HDFS-3129.001.patch

 NetworkTopology: add test that getLeaf should check for invalid topologies
 --

 Key: HDFS-3129
 URL: https://issues.apache.org/jira/browse/HDFS-3129
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3129.001.patch





[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.011.patch

* fix broken import lines

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch, HDFS-3050.009.patch, HDFS-3050.010.patch, 
 HDFS-3050.011.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.029.patch

* rebase on trunk

* fix findbugs suppression

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3129) NetworkTopology: add test that getLeaf should check for invalid topologies

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3129:
---

Attachment: HDFS-3129-b1.001.patch

* branch-1 version

 NetworkTopology: add test that getLeaf should check for invalid topologies
 --

 Key: HDFS-3129
 URL: https://issues.apache.org/jira/browse/HDFS-3129
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3129-b1.001.patch, HDFS-3129.001.patch





[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.030.patch

fix bug in prompting

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-22 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.031.patch

testing done: manually corrupted an edit log, recovered it with recovery mode.

patch update: whitespace tweak for recovery prompt

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-21 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.024.patch

* HdfsServerConstants: use equals for string equality

* fix bugs with the upgrade process
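
For readers unfamiliar with the first item: a minimal sketch, not taken from the patch, of why string comparison in Java should go through equals() rather than ==. The class name, method names, and the sample string are hypothetical.

```java
public class StringEqualityDemo {
    // Compares by content: true whenever both strings hold the same characters.
    public static boolean matchesByEquals(String expected, String actual) {
        return expected.equals(actual);
    }

    // Compares by reference: true only if both variables point at the same object.
    public static boolean matchesByReference(String expected, String actual) {
        return expected == actual;
    }

    public static void main(String[] args) {
        String expected = "some-option";
        // A string built at runtime (e.g. parsed from input) is a distinct object.
        String actual = new String("some-option");
        System.out.println(matchesByEquals(expected, actual));    // prints "true"
        System.out.println(matchesByReference(expected, actual)); // prints "false"
    }
}
```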

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-21 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.026.patch

rebase on latest trunk

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.020.patch

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Status: Open (was: Patch Available)

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Status: Patch Available (was: Open)

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.007.patch

rebase on latest trunk

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: (was: HDFS-3050.008.patch)

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3050) refactor OEV to share more code with the NameNode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3050:
---

Attachment: HDFS-3050.008.patch
HDFS-3050.008.patch

 refactor OEV to share more code with the NameNode
 -

 Key: HDFS-3050
 URL: https://issues.apache.org/jira/browse/HDFS-3050
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor
 Attachments: HDFS-3050.006.patch, HDFS-3050.007.patch, 
 HDFS-3050.008.patch


 Currently, OEV (the offline edits viewer) re-implements all of the opcode 
 parsing logic found in the NameNode.  This duplicated code creates a 
 maintenance burden for us.
 OEV should be refactored to simply use the normal EditLog parsing code, 
 rather than rolling its own.


[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: HDFS-3044.003.patch

address eli's comments

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3044.002.patch, HDFS-3044.003.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up (e.g. due to HW issues) and check in on cluster 
 startup; files with blocks where all replicas are on this set of datanodes 
 are marked corrupt
 - Admin does fsck move, which deletes the corrupt files and saves whatever 
 blocks were available
 - The HW issues with the datanodes are resolved, and they are started and join 
 the cluster. The NN tells them to delete their blocks for the corrupt files 
 since the files were deleted.
 I think we should:
 - Make fsck move non-destructive by default (e.g. it just does a move into 
 lost+found)
 - Make the destructive behavior optional (e.g. --destructive, so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning to that effect, which an admin should have to override if 
 they want to do something destructive
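
The three proposals above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the patch: planMove, its parameters, and the action strings are hypothetical, and "non-destructive" is read here as salvaging the accessible blocks into lost+found while leaving the original file in place, which is one possible reading of the proposal.

```java
import java.util.ArrayList;
import java.util.List;

public class FsckMoveSketch {
    // Hypothetical planner: returns the actions an fsck move would take for one
    // corrupt file. Nothing here touches a real file system.
    public static List<String> planMove(String path, boolean destructive,
                                        int expectedDatanodes, int liveDatanodes,
                                        boolean overrideWarning) {
        List<String> actions = new ArrayList<>();
        boolean allCheckedIn = liveDatanodes >= expectedDatanodes;

        // Sanity check: a destructive run while datanodes are missing gets a
        // warning that the admin has to override explicitly.
        if (destructive && !allCheckedIn) {
            actions.add("WARN only " + liveDatanodes + " of " + expectedDatanodes
                    + " datanodes have checked in");
            if (!overrideWarning) {
                actions.add("ABORT");
                return actions;
            }
        }

        // Default behavior (assumed): salvage what is readable, keep the original.
        actions.add("SALVAGE " + path + " -> /lost+found" + path);

        // Only an explicit destructive request removes the original file.
        if (destructive) {
            actions.add("DELETE " + path);
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(planMove("/user/a/file", false, 10, 7, false));
        System.out.println(planMove("/user/a/file", true, 10, 7, false));
        System.out.println(planMove("/user/a/file", true, 10, 7, true));
    }
}
```

With these rules, the data-loss scenario described above cannot happen by accident: the original file is deleted only when the destructive option is given, and only after the missing-datanode warning has been overridden.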


[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.022.patch

* remove some unnecessary whitespace changes

* re-introduce EditLogInputException

* edit log input stream: change API as we discussed.

* FSEditLogLoader: re-organize this file. Fix some corner cases relating to
out-of-order transaction IDs

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-20 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.023.patch

* OpInstanceCache needs to be thread-local to work correctly

* update exception text regex in TestFSEditLogLoader

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-17 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.019.patch

* remove obsolete references to JournalStream in comments

* rename resync to skipBrokenEdits

* rename -f to -chooseFirst

* fix EditLogInputStream comments

* ELIS::rewind -> ELIS::putOp

* FSEditLogLoader: fix a case where the numEdits return could be incorrect

* FSEditLogLoader: improve handling of missing transactions

* fix some cases in which we had been assuming that there are no gaps in the
transaction log stream. Try to avoid doing fancy arithmetic by counting the
number of edits we've decoded, etc. Instead, just rely on looking at the
transaction ID of the last edit we decoded.
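
A minimal sketch of the point in the last item, assuming a stream whose transaction IDs may have gaps. The class and method names are hypothetical and are not the FSEditLogLoader code; the sketch only contrasts the two ways of finding the last transaction ID.

```java
public class LastTxIdSketch {
    // Arithmetic approach: correct only if the stream has no gaps.
    public static long lastTxIdByCounting(long firstTxId, long[] decodedTxIds) {
        return firstTxId + decodedTxIds.length - 1;
    }

    // Approach described above: read the ID off the last edit that was decoded.
    // The fallback is returned when nothing was decoded at all.
    public static long lastTxIdByInspection(long[] decodedTxIds, long fallback) {
        if (decodedTxIds.length == 0) {
            return fallback;
        }
        return decodedTxIds[decodedTxIds.length - 1];
    }

    public static void main(String[] args) {
        // Transactions 4 and 5 are missing, e.g. skipped during recovery.
        long[] decoded = {1L, 2L, 3L, 6L, 7L};
        System.out.println(lastTxIdByCounting(1L, decoded));   // prints "5"
        System.out.println(lastTxIdByInspection(decoded, 0L)); // prints "7"
    }
}
```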

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-16 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.016.patch

* rebase on latest trunk

* make nextOp protected in all subclasses of ELIS

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-16 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.017.patch

add a section about recovery to hdfs_user_guide.xml

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-16 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.018.patch

* move to using a per-Reader or per-FSEditLog operation cache, rather than a
purely per-thread operation cache. Now that we are caching a single opcode
inside the FSInputStream, we definitely don't want these instances shared
between threads.
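
A rough sketch of what a per-reader operation cache might look like, as opposed to a per-thread one. All names here (OpCode, Op, OpCache, Reader) are hypothetical stand-ins, not the classes in the patch; the point is only that each reader owns its cache, so a reused op instance is never visible to another reader.

```java
import java.util.EnumMap;
import java.util.Map;

public class PerReaderCacheSketch {
    public enum OpCode { ADD, DELETE }

    // A reusable, mutable operation object (hypothetical).
    public static final class Op {
        public final OpCode code;
        public long txId;
        Op(OpCode code) { this.code = code; }
    }

    // One cache holds at most one reusable Op instance per opcode.
    public static final class OpCache {
        private final Map<OpCode, Op> instances = new EnumMap<>(OpCode.class);

        public Op get(OpCode code) {
            return instances.computeIfAbsent(code, Op::new);
        }
    }

    // Each reader owns its own cache, so instances are not shared across readers,
    // regardless of which thread each reader runs on.
    public static final class Reader {
        private final OpCache cache = new OpCache();

        public Op decode(OpCode code, long txId) {
            Op op = cache.get(code); // reused across calls on this reader only
            op.txId = txId;
            return op;
        }
    }

    public static void main(String[] args) {
        Reader first = new Reader();
        Reader second = new Reader();
        Op a = first.decode(OpCode.ADD, 1L);
        Op b = second.decode(OpCode.ADD, 2L);
        System.out.println(a != b);   // prints "true": separate instances
        System.out.println(a.txId);   // prints "1": not overwritten by the other reader
    }
}
```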

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-15 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.012.patch

* make more exceptions skippable

* rename StartupOption.ALWAYS_CHOOSE_YES to StartupOption.ALWAYS_CHOOSE_FIRST,
to better reflect what it does.

* refactor EditLogInputStream a bit

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-15 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.013.patch

* remove SkippableEditLogException, as it turned out not to be necessary

* test skipping in EditLogInputStream

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3004) Implement Recovery Mode

2012-03-15 Thread Colin Patrick McCabe (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Colin Patrick McCabe updated HDFS-3004:
---

Attachment: HDFS-3004.015.patch

fix small bug in opcode skipping

Implement Recovery Mode
---

[jira] [Updated] (HDFS-3044) fsck move should be non-destructive by default

2012-03-12 Thread Colin Patrick McCabe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3044:
---

Attachment: (was: HDFS-3044.001.patch)

 fsck move should be non-destructive by default
 --

 Key: HDFS-3044
 URL: https://issues.apache.org/jira/browse/HDFS-3044
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Reporter: Eli Collins
Assignee: Colin Patrick McCabe
 Attachments: HDFS-3044.002.patch


 The fsck move behavior in the code and originally articulated in HADOOP-101 
 is:
 {quote}Current failure modes for DFS involve blocks that are completely 
 missing. The only way to fix them would be to recover chains of blocks and 
 put them into lost+found{quote}
 A directory is created with the file name, the blocks that are accessible are 
 created as individual files in this directory, then the original file is 
 removed. 
 I suspect the rationale for this behavior was that you can't use files that 
 are missing locations, and copying the blocks as files at least makes part of 
 the files accessible. However, this behavior can also result in permanent 
 data loss. E.g.:
 - Some datanodes don't come up (e.g. due to HW issues) and check in on cluster 
 startup; files with blocks where all replicas are on this set of datanodes 
 are marked corrupt
 - Admin does fsck move, which deletes the corrupt files and saves whatever 
 blocks were available
 - The HW issues with the datanodes are resolved, and they are started and join 
 the cluster. The NN tells them to delete their blocks for the corrupt files 
 since the files were deleted.
 I think we should:
 - Make fsck move non-destructive by default (e.g. it just does a move into 
 lost+found)
 - Make the destructive behavior optional (e.g. --destructive, so admins think 
 about what they're doing)
 - Provide better sanity checks and warnings, e.g. if you're running fsck and 
 not all the slaves have checked in (if using dfs.hosts), then fsck should 
 print a warning to that effect, which an admin should have to override if 
 they want to do something destructive

