[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739293#comment-13739293
 ] 

Colin Patrick McCabe commented on HDFS-4504:


I don't think adding a new RPC would be too bad.  It would be very similar to 
recoverLease.

bq. But may be difficult to handle "suppose you have two threads, T1 and T2. 
They both have a client name of C." case since client is same.

I think we should do this in HDFS-4688 rather than trying to solve it here.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.
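
A minimal sketch of a client-side mitigation for the leak described above (not the fix being discussed in this issue): if close() throws, explicitly release the lease with DistributedFileSystem#recoverLease so the LeaseRenewer does not keep the "undead" file alive. The path below is hypothetical.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class CloseWithLeaseRelease {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/flume/events.log");      // hypothetical path
    FSDataOutputStream out = fs.create(path);
    out.writeBytes("some data\n");
    try {
      out.close();
    } catch (IOException e) {
      // close() failed (e.g. pipeline recovery failed); ask the NameNode to
      // recover the lease so the file is not stuck open forever.
      if (fs instanceof DistributedFileSystem) {
        ((DistributedFileSystem) fs).recoverLease(path);
      }
      throw e;
    }
  }
}
{code}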

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739288#comment-13739288
 ] 

Vinay commented on HDFS-4504:
-

bq. It seems to me like it would be better to call completeFile() or perhaps 
some new abortFile() RPC, which would first verify that the client name trying 
to abort the lease is the same as the current lease holder.
This looks good. It seems this would take a lot of code changes and there are also a lot of 
cases to handle. But it may be difficult to handle the "suppose you have two threads, 
T1 and T2. They both have a client name of C." case, since the client name is the same.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-570) When opening a file for read, make the file length available to client.

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-570:


Release Note: In order to support read consistency, the client now gets the last block 
length from a DataNode when opening a file that is being written to. 

> When opening a file for read, make the file length available to client.
> ---
>
> Key: HDFS-570
> URL: https://issues.apache.org/jira/browse/HDFS-570
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: Append Branch
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: Append Branch
>
> Attachments: h570_20090828.patch, h570_20090922.patch, 
> h570_20090924.patch, h570_20090925b.patch, h570_20090925c.patch, 
> h570_20090925.patch, h570_20090926.patch, h570_20090926.patch
>
>
> In order to support read consistency, DFSClient needs the file length at the 
> file opening time.  In the current implementation, DFSClient obtains the file 
> length at the file opening time, but the length is inaccurate if the file is 
> being written to.
> For more details, see Section 4 in the [append design 
> doc|https://issues.apache.org/jira/secure/attachment/12415768/appendDesign2.pdf].

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-636) SafeMode should count only complete blocks.

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-636:


Release Note: SafeMode counts only complete blocks instead of all blocks.
Hadoop Flags: Incompatible change

> SafeMode should count only complete blocks.
> ---
>
> Key: HDFS-636
> URL: https://issues.apache.org/jira/browse/HDFS-636
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: Append Branch
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Fix For: Append Branch
>
> Attachments: completeBlockTotal.patch
>
>
> During start up the name-node is in safe mode and is counting blocks reported 
> by data-nodes. When the number of minimally replicated blocks reaches the 
> configured threshold the name-node leaves safe mode. Currently all blocks are 
> counted towards the threshold, including the ones that are under construction. 
> The under-construction blocks should be excluded from the count, because they 
> need to be recovered, which may take a long time (the lease expires in 1 hour by 
> default). Also, the recovery may result in deleting those blocks, so counting 
> them in the block total is incorrect.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739215#comment-13739215
 ] 

Colin Patrick McCabe commented on HDFS-4504:


OK, I thought about this a little more.  Since we handle symlinks by throwing 
{{UnresolvedLinkException}}, maybe the scenario I outlined can't happen.  The 
client would get {{UnresolvedLinkException}} when trying to create 
{{/baz/bar}}, and then resolve it to {{/foo/bar}}.  At that point, we could 
reasonably detect that it was the same file as the one in the first thread.

We might be able to just do something very similar to recoverLease, 
but check that the client name is the same as the one on the lease.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739204#comment-13739204
 ] 

Colin Patrick McCabe commented on HDFS-4504:


The problem with calling completeFile is that it may never succeed.  If the 
last block can't be replicated adequately, completeFile will return false 
forever.  I previously had a change which at first called completeFile, but 
then switched to recoverLease after a few tries.  But it seemed like such a 
corner case of a corner case that it wasn't worth doing.

I agree that there are some thorny issues surrounding leases and multiple 
clients.  I looked at this for a long time and concluded that it's impossible 
to solve these problems without switching the lease mechanism to use (our 
globally unique) inode numbers.

One example is: suppose you have two threads, T1 and T2.  They both have a 
client name of C.

T1 creates a file /foo/bar, writes some stuff, and tries to close.  But it 
fails and becomes a zombie.

At some point later, T2 creates /baz/bar.  Now, /baz is a symlink to /foo.  So 
now the NameNode recovers the lease.  But will the zombie recovery thread stomp 
on T2?  It definitely might.

The problem is that a close attempt needs to be associated with a particular 
file creation attempt.  Right now, all we have is a path and a client name, and 
these aren't enough to uniquely identify the file creation.  Your point is that 
we should be stricter in matching the client name in create with the client 
name in completeFile/recoverLease.  But even being stricter there won't close 
all the holes.

Maybe a good compromise in the meantime is to basically expose 
recoverLeaseInternal(force=false) by adding an optional boolean parameter to 
the recoverLease protobuf.  In the long term, we eventually need a more 
extensive rework of the leases to be inode-based, which would fix a lot of 
other sore spots as well (like the rename-of-open-files issue).

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3755) Creating an already-open-for-write file with overwrite=true fails

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739199#comment-13739199
 ] 

Suresh Srinivas commented on HDFS-3755:
---

Given that a regression from branch-1 was fixed in this JIRA, why is it marked incompatible?





> Creating an already-open-for-write file with overwrite=true fails
> -
>
> Key: HDFS-3755
> URL: https://issues.apache.org/jira/browse/HDFS-3755
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 3.0.0, 2.0.2-alpha
>
> Attachments: hdfs-3755.txt, hdfs-3755.txt
>
>
> If a file is already open for write by one client, and another client calls 
> {{fs.create()}} with {{overwrite=true}}, the file should be deleted and the 
> new file successfully created. Instead, it is currently throwing 
> AlreadyBeingCreatedException.
> This is a regression since branch-1.
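
A minimal illustration of the behavior described above, with a hypothetical path; in the real scenario the two create() calls come from two different client processes rather than a single snippet.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OverwriteOpenFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/tmp/already-open-for-write");   // hypothetical path
    FSDataOutputStream first = fs.create(p, false);     // writer 1: opens for write, never closes
    first.writeBytes("partial data\n");
    // Writer 2: create with overwrite=true.  Expected behavior: the open file
    // is deleted and a new one is created.  Before this fix, this threw
    // AlreadyBeingCreatedException.
    FSDataOutputStream second = fs.create(p, true);
    second.close();
  }
}
{code}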

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2887) Define a FSVolume interface

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2887:
-

Release Note: FSVolume, a part of the FSDatasetInterface implementation, 
should not be referenced outside FSDataset.  A new FSVolumeInterface is defined.  
The BlockVolumeChoosingPolicy.chooseVolume(..) method signature is also updated.

> Define a FSVolume interface
> ---
>
> Key: HDFS-2887
> URL: https://issues.apache.org/jira/browse/HDFS-2887
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Fix For: 0.23.2
>
> Attachments: h2887_20120203.patch, h2887_20120207.patch
>
>
> FSVolume is an inner class in FSDataset.  It is actually a part of the 
> implementation of FSDatasetInterface.  It is better to define a new 
> interface, namely FSVolumeInterface, to capture the abstraction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4080) Add a separate logger for block state change logs to enable turning off those logs

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-4080:
-

Target Version/s: 0.23.5, 2.0.3-alpha, 3.0.0  (was: 3.0.0, 2.0.3-alpha, 
0.23.5)
Release Note: Add a separate logger "BlockStateChange" for block state 
change logs.
Hadoop Flags: Incompatible change,Reviewed  (was: Reviewed)

> Add a separate logger for block state change logs to enable turning off those 
> logs
> --
>
> Key: HDFS-4080
> URL: https://issues.apache.org/jira/browse/HDFS-4080
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Kihwal Lee
>Assignee: Kihwal Lee
> Fix For: 3.0.0, 2.0.3-alpha, 0.23.5
>
> Attachments: hdfs-4080.1.patch, hdfs-4080-branch-0.23.patch, 
> hdfs-4080.patch, hdfs-4080.patch, hdfs-4080-trunk.patch
>
>
> Although the block-level logging in the namenode is useful for debugging, it can 
> add significant overhead to busy HDFS clusters, since it is done while 
> the namespace write lock is held. One example is shown in HDFS-4075. In this 
> example, the write lock was held for 5 minutes while logging 11 million log 
> messages for 5.5 million block invalidation events. 
> It would be useful to have an option to disable these block-level log 
> messages while keeping other state change messages going.  If others feel that 
> they can be turned into DEBUG (with the addition of isDebugEnabled() checks), that 
> may also work, but there might be people depending on the messages.
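
For reference, a minimal sketch of how the new logger can be silenced. The logger name "BlockStateChange" comes from the release note above; in practice this is usually configured in log4j.properties (log4j.logger.BlockStateChange=WARN) rather than programmatically.

{code}
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietBlockStateChange {
  public static void main(String[] args) {
    // Raise the threshold of the dedicated block state change logger so the
    // per-block INFO messages are dropped while other namenode logs continue.
    Logger.getLogger("BlockStateChange").setLevel(Level.WARN);
  }
}
{code}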

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3755) Creating an already-open-for-write file with overwrite=true fails

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3755:
-

Target Version/s: 2.0.2-alpha, 3.0.0  (was: 3.0.0, 2.0.2-alpha)
Release Note: This is an incompatible change: Before this change, if a 
file is already open for write by one client, and another client calls 
fs.create() with overwrite=true, an AlreadyBeingCreatedException is thrown.  
After this change, the file will be deleted and the new file will be created 
successfully.
Hadoop Flags: Reviewed  (was: Incompatible change,Reviewed)

Added release note.

> Creating an already-open-for-write file with overwrite=true fails
> -
>
> Key: HDFS-3755
> URL: https://issues.apache.org/jira/browse/HDFS-3755
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 3.0.0, 2.0.2-alpha
>
> Attachments: hdfs-3755.txt, hdfs-3755.txt
>
>
> If a file is already open for write by one client, and another client calls 
> {{fs.create()}} with {{overwrite=true}}, the file should be deleted and the 
> new file successfully created. Instead, it is currently throwing 
> AlreadyBeingCreatedException.
> This is a regression since branch-1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739170#comment-13739170
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-2832:
--

I have just created a new branch: 
http://svn.apache.org/viewvc/hadoop/common/branches/HDFS-2832/

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, in which a Datanode *is a* storage, so that a Datanode 
> *is a collection of* storages. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739168#comment-13739168
 ] 

Todd Lipcon commented on HDFS-4504:
---

I don't think {{recoverLease}} is the right API here; here's an example where 
it could cause problems:

- Process A is writing /file, and loses its network connection right before 
calling close(). Thus it gets registered as a zombie.
- Process B calls append() on the file after the soft lease has expired. This 
allows B to keep appending where A left off.
- Process A recovers its network. The recoverLease() call will then kick 
process B out from writing.

Given that these RPCs are also pathname-based, it could even kick a writer off 
of a new file that just happened to share the file path.

It seems to me like it would be better to call completeFile() or perhaps some 
new abortFile() RPC, which would first verify that the client name trying to 
abort the lease is the same as the current lease holder.
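
A rough, hypothetical sketch of the lease-holder check such an abortFile() RPC could perform on the NameNode side; the class, map, and method names below are illustrative only, not existing HDFS code.

{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AbortFileSketch {
  // Stand-in for the NameNode's lease table: path -> lease holder (client name).
  private final Map<String, String> leaseHolders = new ConcurrentHashMap<String, String>();

  public void abortFile(String src, String clientName) throws IOException {
    String holder = leaseHolders.get(src);
    if (holder == null) {
      return;                                  // no lease: nothing to abort
    }
    if (!holder.equals(clientName)) {
      // Another client (e.g. an appender that took over after the soft lease
      // expired) now holds the lease; do not kick it out.
      throw new IOException("Lease on " + src + " is held by " + holder
          + ", not by " + clientName);
    }
    leaseHolders.remove(src);                  // release the lease for this file
  }
}
{code}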

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739162#comment-13739162
 ] 

Colin Patrick McCabe commented on HDFS-4504:


It looks like there are a few more unit tests that need to be fixed.  I have 
some fixes and will post them later today.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739152#comment-13739152
 ] 

Hadoop QA commented on HDFS-5076:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597857/HDFS-5076.003.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4819//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4819//console

This message is automatically generated.

> Create http servlets to enable querying NN's last applied transaction ID and 
> most recent checkpoint's transaction ID
> 
>
> Key: HDFS-5076
> URL: https://issues.apache.org/jira/browse/HDFS-5076
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, 
> HDFS-5076.003.patch
>
>
> Currently the NameNode already provides RPC calls to get its last applied 
> transaction ID and most recent checkpoint's transaction ID. It can be helpful 
> to provide servlets that enable querying this information over HTTP, so 
> that administrators and applications like Ambari can easily decide whether a 
> forced checkpoint by calling saveNamespace is necessary.
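
As an illustration of the intended use, a sketch of how an admin tool might compare the two IDs over HTTP; the NameNode address, servlet paths, plain-text response format, and threshold below are assumptions for illustration, not the ones defined by the patch.

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CheckpointLagCheck {
  static long fetchTxId(String urlStr) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
    BufferedReader r = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), "UTF-8"));
    try {
      return Long.parseLong(r.readLine().trim());
    } finally {
      r.close();
      conn.disconnect();
    }
  }

  public static void main(String[] args) throws Exception {
    String nn = "http://namenode.example.com:50070";             // hypothetical NN web address
    long lastApplied = fetchTxId(nn + "/lastAppliedTxId");       // hypothetical servlet path
    long lastCheckpoint = fetchTxId(nn + "/lastCheckpointTxId"); // hypothetical servlet path
    if (lastApplied - lastCheckpoint > 1000000) {                // arbitrary threshold
      System.out.println("Consider forcing a checkpoint via saveNamespace.");
    }
  }
}
{code}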

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap

2013-08-13 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739145#comment-13739145
 ] 

Andrew Wang commented on HDFS-4953:
---

Overall looks really solid, nice work. I have mostly nitty stuff, only a few 
potential bugs.

I haven't deduped this with Brandon's comments, apologies.

General
- LOG.debug statements should be wrapped in {{LOG.isDebugEnabled}} checks 
because of the limitations of log4j (the debug message string is otherwise 
built even when debug logging is disabled); for example:
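
(A generic illustration of the guard pattern, not code from the patch:)

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class DebugGuardExample {
  private static final Log LOG = LogFactory.getLog(DebugGuardExample.class);

  static void readBlock(long blockId) {
    // Without the guard, the message string is concatenated even when debug
    // logging is disabled; the check skips that work entirely.
    if (LOG.isDebugEnabled()) {
      LOG.debug("reading block " + blockId + " via mmap");
    }
  }
}
{code}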

hdfs-default.xml
- Has some extra lines of java pasted in.
- For cache size, mention fds, virtual address space, and application working 
set, as hints on how to size the cache properly.
- The timeout javadoc mentions that the timeout is a minimum, but unreferenced 
mmaps can be evicted before the timeout when under cache pressure.

ZeroCopyCursor
- Let's beef up the class javadoc. We need some place that documents the big 
picture of ZCR and how to use it, e.g. how to get and use the cursor properly, 
the purpose of the fallback buffer and when it's used, the implications of the 
skip checksums and short read options. Right now it's sort of there in the 
method javadocs, but it's hard to get a sense of how to use it and how it all 
fits together. An example API usage snippet would be great.
- read() javadoc: EOF here refers to an EOF when reading a block, not EOF of 
the HDFS file. Would prefer to see "end of block".

HdfsZeroCopyCursor
- Would like to see explicit setting of {{allowShortReads}} to false in the 
constructor for clarity.

ZeroCopyUnavailableException
- serialVersionUID should be private

DFSClient
- Comment on the lifecycle of MMAP_MANAGER_FACTORY, it's shared among multiple 
DFSClients which is why the refcount is important.
- Maybe rename {{put}} to {{unref}} or {{close}}? It's not actually "putting" 
in the data structure sense, which is confusing.

ClientMmap
- let's not call people "bad programmers", just say "accidentally leaked 
references".
- {{unmap}}: add to javadoc that it should only be called if the manager has 
been closed, or by the manager with the lock held.
{code}
MappedByteBuffer map= 
{code}
Need a space before the "=".

ClientMmapManager
- In an offline discussion, I asked Colin about using Guava's cache instead of 
building up this reference counting cache infrastructure. Colin explained that 
having a background CacheCleaner thread is preferable for more control and 
being able to do explicit cleanup, which is nice since mmaps use up a file 
descriptor.
- Let's add some javadoc on the lifecycle of cached mmaps, and mention why it's 
important to cache them (performance). IIUC, it only evicts unreferenced 
ClientMmaps, and does this either on cache pressure (when we do a fetch) or on 
a relatively long timescale in the background via the CacheCleaner (15 minutes).
- I think {{fromConf}}-style factory methods are more normally called {{get}}, 
e.g. {{FileSystem.get}}.
- Why is the CacheCleaner executor using half the timeout for the delay and 
period? I'd think the delay can be the timeout (or timeout+period), since 
nothing will expire naturally before that. For the period, I guess we need some 
arbitrary staleness bound (and timeout/2 seems reasonable), but this might be 
worth mentioning in hdfs-default.xml.
- The {{evictable}} javadoc mentions jittering by a nanosecond, but it's being 
keyed off of {{Time.monotonicNow}} which is milliseconds. We might in fact want 
to key off of {{System.nanoTime}} for fewer collisions.
- I think {{evictOne}} would be clearer if you used {{TreeSet#pollFirst}} 
rather than an iterator.
{code}
Iterator> iter =
  evictable.entrySet().iterator(); 
{code}
This has 10 spaces, where elsewhere in the file you use a double-indent of 4.

BlockReaderLocal
- Remaining TODO for blocks bigger than 2GB, want to file a follow-on JIRA for 
this?
- {{readZeroCopy}} catches and re-sets the interrupted status, does something 
else check this later?
- Is it worth re-trying the mmap after a {{CacheCleaner}} period in case some 
space has been freed up in the cache?
- The clientMmap from the manager can be null if the cache is full. I don't see 
a check for this case.

Tests
- Would like to see some tests for cache eviction behavior
- How about a Java test without a backing buffer?
- JNI test has some commented out fprintfs

> enable HDFS local reads via mmap
> 
>
> Key: HDFS-4953
> URL: https://issues.apache.org/jira/browse/HDFS-4953
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums are disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that the checksum has already been 
> verified.

[jira] [Commented] (HDFS-4816) transitionToActive blocks if the SBN is doing checkpoint image transfer

2013-08-13 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739135#comment-13739135
 ] 

Aaron T. Myers commented on HDFS-4816:
--

+1, the latest patch looks good to me.

I also checked with Todd offline and confirmed that he's good with the latest 
patch.

Thanks a lot, Andrew.

> transitionToActive blocks if the SBN is doing checkpoint image transfer
> ---
>
> Key: HDFS-4816
> URL: https://issues.apache.org/jira/browse/HDFS-4816
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.4-alpha
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-4816-1.patch, hdfs-4816-2.patch, hdfs-4816-3.patch, 
> hdfs-4816-4.patch, hdfs-4816-slow-shutdown.txt, stacks.out
>
>
> The NN and SBN do this dance during checkpoint image transfer with nested 
> HTTP GETs via {{HttpURLConnection}}. When an admin does a 
> {{-transitionToActive}} during this transfer, part of that is interrupting an 
> ongoing checkpoint so we can transition immediately.
> However, the {{thread.interrupt()}} in {{StandbyCheckpointer#stop}} gets 
> swallowed by {{connection.getResponseCode()}} in 
> {{TransferFsImage#doGetUrl}}. None of the methods in HttpURLConnection throw 
> InterruptedException, so we need to do something else (perhaps HttpClient 
> [1]):
> [1]: http://hc.apache.org/httpclient-3.x/
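
A generic sketch of one possible mitigation (an assumption for illustration, not this patch): since Thread.interrupt() is not observed by getResponseCode(), bound the wait with connect/read timeouts so a stalled transfer eventually fails and the stop request can be acted on.

{code}
import java.net.HttpURLConnection;
import java.net.URL;

public class BoundedImageTransfer {
  public static int headImage(String urlStr, int timeoutMs) throws Exception {
    HttpURLConnection conn = (HttpURLConnection) new URL(urlStr).openConnection();
    conn.setConnectTimeout(timeoutMs);   // fail fast if the other NN is unreachable
    conn.setReadTimeout(timeoutMs);      // fail if the transfer stalls mid-stream
    try {
      return conn.getResponseCode();     // throws SocketTimeoutException on timeout
    } finally {
      conn.disconnect();
    }
  }
}
{code}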

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap

2013-08-13 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739131#comment-13739131
 ] 

Brandon Li commented on HDFS-4953:
--

Some comments and questions:

1. DFSClient: it looks like all the DFSClient instances share the same 
ClientMmapManager instance. If this is the case, why not have one static 
ClientMmapManager with a refcount to it, and remove the ClientMmapManagerFactory 
class and the mmapManager variable?
2. HdfsZeroCopyCursor: might want to also initialize allowShortReads in the 
constructor. 
3. HdfsZeroCopyCursor: readViaSlowPath() throws when shortReads is disallowed 
but no fallbackBuffer is provided. Since shortReads is false by default, the 
user has to remember to call setFallbackBuffer before doing a zero-copy read. Not sure 
which case users expect more: shortReads allowed or disallowed. 
4. DFSInputStream: remove the unused import and add a debug-level check for 
DFSClient.LOG.debug().
5. TestBlockReader: Assume.assumeTrue(SystemUtils.IS_OS_UNIX), guess you meant 
IS_OS_LINUX
6. test_libhdfs_zerocopy.c: remove repeated
  hdfsBuilderConfSetStr(bld, "dfs.block.size", 
TO_STR(TEST_ZEROCOPY_FULL_BLOCK_SIZE));
7. TestBlockReaderLocal.java: remove unused import
8. please add javadoc to some classes, e.g., ClientMmap, ClientMmapManager



> enable HDFS local reads via mmap
> 
>
> Key: HDFS-4953
> URL: https://issues.apache.org/jira/browse/HDFS-4953
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 2.3.0
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, 
> HDFS-4953.003.patch, HDFS-4953.004.patch, HDFS-4953.005.patch, 
> HDFS-4953.006.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access 
> files directly without going through the DataNode.  However, all of these 
> reads involve a copy at the operating system level, since they rely on the 
> read() / pread() / etc family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable 
> truly zero-copy reads.
> In the initial implementation, zero-copy reads will only be performed when 
> checksums are disabled.  Later, we can use the DataNode's cache awareness to 
> only perform zero-copy reads when we know that the checksum has already been 
> verified.
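
For background, a small sketch of what an mmap-based local read looks like at the JDK level; this is the generic mechanism, not the API added by this patch, and the block file path is hypothetical.

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapReadSketch {
  public static void main(String[] args) throws Exception {
    String blockFile = "/data/dfs/data/current/blk_123";   // hypothetical local block file
    RandomAccessFile raf = new RandomAccessFile(blockFile, "r");
    try {
      FileChannel ch = raf.getChannel();
      // Map the whole file into the client's address space and read from it
      // directly, with no copy through the read()/pread() system calls.
      MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
      System.out.println("first byte: " + buf.get());
    } finally {
      raf.close();
    }
  }
}
{code}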

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739095#comment-13739095
 ] 

Hadoop QA commented on HDFS-5076:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597840/HDFS-5076.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4818//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4818//console

This message is automatically generated.

> Create http servlets to enable querying NN's last applied transaction ID and 
> most recent checkpoint's transaction ID
> 
>
> Key: HDFS-5076
> URL: https://issues.apache.org/jira/browse/HDFS-5076
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, 
> HDFS-5076.003.patch
>
>
> Currently the NameNode already provides RPC calls to get its last applied 
> transaction ID and most recent checkpoint's transaction ID. It can be helpful 
> to provide servlets that enable querying this information over HTTP, so 
> that administrators and applications like Ambari can easily decide whether a 
> forced checkpoint by calling saveNamespace is necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5093) TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows

2013-08-13 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated HDFS-5093:


Attachment: HDFS-5093.1.patch

Attaching a new patch that puts the cleanup step in a finally clause to make it 
more robust.
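
A rough sketch of the two ideas (reuse the shared cluster, clean up in a finally clause); the test structure, field name, and scratch path below are assumptions, not the actual patch.

{code}
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class GlobPathsTestSketch {
  private static MiniDFSCluster cluster;   // assumed to be started once in setUp()

  @Test
  public void testGlobOnSharedCluster() throws Exception {
    FileSystem fs = cluster.getFileSystem();
    Path scratch = new Path("/test/glob-scratch");   // hypothetical scratch dir
    try {
      fs.mkdirs(scratch);
      // ... exercise FileSystem#globStatus against the shared MiniDFSCluster ...
    } finally {
      fs.delete(scratch, true);                      // always clean up, even if the test fails
    }
  }
}
{code}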

> TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows
> --
>
> Key: HDFS-5093
> URL: https://issues.apache.org/jira/browse/HDFS-5093
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: HDFS-5093.1.patch, HDFS-5093.patch
>
>
> Some test cases in TestGlobPaths fail on Windows because they try to create a 
> new MiniDFSCluster even though one is already created in {{setUp()}}. This 
> leads to failure on Windows because the new cluster will try to clean up the old 
> name node file that was opened by the existing cluster -- on Windows, a 
> process or thread cannot delete a file that another process or thread has 
> opened through the normal Java APIs.
> An example failure run looks like the following.
> {noformat}
> testGlobWithSymlinksOnFS(org.apache.hadoop.fs.TestGlobPaths)  Time elapsed: 
> 47 sec  <<< ERROR!
> java.io.IOException: Could not fully delete 
> E:\tr\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:334)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316)
>   at 
> org.apache.hadoop.fs.TestGlobPaths.testOnFileSystem(TestGlobPaths.java:805)
>   at 
> org.apache.hadoop.fs.TestGlobPaths.testGlobWithSymlinksOnFS(TestGlobPaths.java:889)
> ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4632) globStatus using backslash for escaping does not work on Windows

2013-08-13 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated HDFS-4632:


Attachment: HDFS-4632-trunk.patch

Chris, I attached my patch in HDFS-5090 to simply disable the unit test case.

> globStatus using backslash for escaping does not work on Windows
> 
>
> Key: HDFS-4632
> URL: https://issues.apache.org/jira/browse/HDFS-4632
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
> Attachments: HDFS-4632-trunk.patch
>
>
> {{Path}} normalizes backslashes to forward slashes on Windows.  Later, when 
> passed to {{FileSystem#globStatus}}, the path is no longer treated as an 
> escape sequence.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5093) TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows

2013-08-13 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739083#comment-13739083
 ] 

Chuan Liu commented on HDFS-5093:
-

Note pTestEscape still fails on Windows due to HDFS-4632.

> TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows
> --
>
> Key: HDFS-5093
> URL: https://issues.apache.org/jira/browse/HDFS-5093
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: HDFS-5093.patch
>
>
> Some test cases in TestGlobPaths fail on Windows because they try to create a 
> new MiniDFSCluster even though one is already created in {{setUp()}}. This 
> leads to failure on Windows because the new cluster will try to clean up the old 
> name node file that was opened by the existing cluster -- on Windows, a 
> process or thread cannot delete a file that another process or thread has 
> opened through the normal Java APIs.
> An example failure run looks like the following.
> {noformat}
> testGlobWithSymlinksOnFS(org.apache.hadoop.fs.TestGlobPaths)  Time elapsed: 
> 47 sec  <<< ERROR!
> java.io.IOException: Could not fully delete 
> E:\tr\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:334)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316)
>   at 
> org.apache.hadoop.fs.TestGlobPaths.testOnFileSystem(TestGlobPaths.java:805)
>   at 
> org.apache.hadoop.fs.TestGlobPaths.testGlobWithSymlinksOnFS(TestGlobPaths.java:889)
> ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5093) TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows

2013-08-13 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated HDFS-5093:


Attachment: HDFS-5093.patch

Attaching a patch that re-uses the MiniDFSCluster to avoid the file deletion issue.

> TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows
> --
>
> Key: HDFS-5093
> URL: https://issues.apache.org/jira/browse/HDFS-5093
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: HDFS-5093.patch
>
>
> Some test cases in TestGlobPaths fail on Windows because they try to create a 
> new MiniDFSCluster even though one is already created in {{setUp()}}. This 
> leads to failure on Windows because the new cluster will try to clean up the old 
> name node file that was opened by the existing cluster -- on Windows, a 
> process or thread cannot delete a file that another process or thread has 
> opened through the normal Java APIs.
> An example failure run looks like the following.
> {noformat}
> testGlobWithSymlinksOnFS(org.apache.hadoop.fs.TestGlobPaths)  Time elapsed: 
> 47 sec  <<< ERROR!
> java.io.IOException: Could not fully delete 
> E:\tr\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644)
>   at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:334)
>   at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316)
>   at 
> org.apache.hadoop.fs.TestGlobPaths.testOnFileSystem(TestGlobPaths.java:805)
>   at 
> org.apache.hadoop.fs.TestGlobPaths.testGlobWithSymlinksOnFS(TestGlobPaths.java:889)
> ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5091:


   Resolution: Fixed
Fix Version/s: 2.1.1-beta
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Thanks for the review, Suresh! I've committed this to trunk, branch-2, and 
branch-2.1-beta.

> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Fix For: 2.1.1-beta
>
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5093) TestGlobPaths should re-use the MiniDFSCluster to avoid failure on Windows

2013-08-13 Thread Chuan Liu (JIRA)
Chuan Liu created HDFS-5093:
---

 Summary: TestGlobPaths should re-use the MiniDFSCluster to avoid 
failure on Windows
 Key: HDFS-5093
 URL: https://issues.apache.org/jira/browse/HDFS-5093
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.3.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor


Some test cases in TestGlobPaths fail on Windows because they try to create a 
new MiniDFSCluster even though one is already created in {{setUp()}}. This 
leads to failure on Windows because the new cluster will try to clean up the old 
name node file that was opened by the existing cluster -- on Windows, a process 
or thread cannot delete a file that another process or thread has opened through 
the normal Java APIs.

An example failure run looks like the following.

{noformat}
testGlobWithSymlinksOnFS(org.apache.hadoop.fs.TestGlobPaths)  Time elapsed: 47 
sec  <<< ERROR!
java.io.IOException: Could not fully delete 
E:\tr\hadoop-hdfs-project\hadoop-hdfs\target\test\data\dfs\name1
at 
org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:759)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:644)
at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:334)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:316)
at 
org.apache.hadoop.fs.TestGlobPaths.testOnFileSystem(TestGlobPaths.java:805)
at 
org.apache.hadoop.fs.TestGlobPaths.testGlobWithSymlinksOnFS(TestGlobPaths.java:889)
...
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739075#comment-13739075
 ] 

Hudson commented on HDFS-5091:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #4255 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/4255/])
HDFS-5091. Support for spnego keytab separate from the JournalNode keytab for 
secure HA. Contributed by Jing Zhao. (jing9: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513700)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNodeHttpServer.java


> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-08-13 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: 20130813-HeterogeneousStorage.pdf

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, in which a Datanode *is a* storage, so that a Datanode 
> *is a collection of* storages. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-08-13 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: (was: 20130813-HeterogeneousStorage.pdf)

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, in which a Datanode *is a* storage, so that a Datanode 
> *is a collection of* storages. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID

2013-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5076:


Attachment: HDFS-5076.003.patch

Updated the patch to address Suresh's comments, and added some unit tests.

> Create http servlets to enable querying NN's last applied transaction ID and 
> most recent checkpoint's transaction ID
> 
>
> Key: HDFS-5076
> URL: https://issues.apache.org/jira/browse/HDFS-5076
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch, 
> HDFS-5076.003.patch
>
>
> Currently the NameNode already provides RPC calls to get its last applied 
> transaction ID and most recent checkpoint's transaction ID. It can be helpful 
> to provide servlets that enable querying this information over HTTP, so 
> that administrators and applications like Ambari can easily decide whether a 
> forced checkpoint by calling saveNamespace is necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739045#comment-13739045
 ] 

Suresh Srinivas commented on HDFS-5051:
---

bq. In cache report, do we need to have block length and generation stamp or 
just the block ID suffices?
I do not think this is captured in HDFS-5053. I also think that is not the 
right place to make this change. Let's discuss in this JIRA whether the generation 
stamp and block length are required in the cache report. Based on the outcome, we 
can create another JIRA to address it.

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739042#comment-13739042
 ] 

Suresh Srinivas commented on HDFS-5051:
---

Can you comment on this:
bq. I see the code related to the initial cache report after random jitter. When a 
datanode starts, do we expect anything to be in the cache at all?

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2832) Enable support for heterogeneous storages in HDFS

2013-08-13 Thread Arpit Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arpit Agarwal updated HDFS-2832:


Attachment: 20130813-HeterogeneousStorage.pdf

Here is a design doc for this feature.

We will be creating a new HDFS-2832 branch for this work. We plan to split the 
feature into more sub-tasks and work on it in two phases:
# Heterogeneous storage support in HDFS
# APIs to expose support to applications

Any feedback on the document is welcome. We can schedule a meeting to discuss 
the feature some time in late August or early September, depending on the 
interest.

> Enable support for heterogeneous storages in HDFS
> -
>
> Key: HDFS-2832
> URL: https://issues.apache.org/jira/browse/HDFS-2832
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.24.0
>Reporter: Suresh Srinivas
>Assignee: Suresh Srinivas
> Attachments: 20130813-HeterogeneousStorage.pdf
>
>
> HDFS currently supports a configuration where storages are a list of 
> directories. Typically each of these directories corresponds to a volume with 
> its own file system. All these directories are homogeneous and therefore 
> identified as a single storage at the namenode. I propose changing the 
> current model, where a Datanode *is a* storage, to one where a Datanode 
> *is a collection of* storages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738991#comment-13738991
 ] 

Suresh Srinivas commented on HDFS-5076:
---

This looks a lot simpler than the previous patch. Thanks for making this change. 
Minor comment: getNameTxnIds could be named getJournalTransactionInfo. Please do 
add tests.

> Create http servlets to enable querying NN's last applied transaction ID and 
> most recent checkpoint's transaction ID
> 
>
> Key: HDFS-5076
> URL: https://issues.apache.org/jira/browse/HDFS-5076
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch
>
>
> Currently the NameNode already provides RPC calls to get its last applied 
> transaction ID and the most recent checkpoint's transaction ID. It can be helpful 
> to provide servlets to enable querying this information through http, so 
> that administrators and applications like Ambari can easily decide whether a 
> forced checkpoint (by calling saveNamespace) is necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5076) Create http servlets to enable querying NN's last applied transaction ID and most recent checkpoint's transaction ID

2013-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5076:


Attachment: HDFS-5076.002.patch

Thanks for the comments, Suresh! Updated the patch to use an MXBean instead of new 
http servlets.

I've verified the new JMX properties in a local cluster. I will add some unit 
tests later.
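
For reference, here is a rough sketch of how an admin tool might consume this once 
it lands: poll the NameNode's standard /jmx servlet and compare the two transaction 
IDs. The host/port and the attribute names (LastAppliedTxId, LastCheckpointTxId) are 
placeholders, not the actual names exposed by the patch.

{code}
import java.io.InputStream;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckpointLagCheck {
  // Pull a numeric attribute out of the /jmx JSON with a simple regex (sketch only).
  static long extract(String json, String attr) {
    Matcher m = Pattern.compile("\"" + attr + "\"\\s*:\\s*(\\d+)").matcher(json);
    if (!m.find()) {
      throw new IllegalStateException("attribute not found: " + attr);
    }
    return Long.parseLong(m.group(1));
  }

  public static void main(String[] args) throws Exception {
    // Placeholder URL and bean/attribute names -- adjust to whatever the patch exposes.
    String jmxUrl =
        "http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo";
    try (InputStream in = new URL(jmxUrl).openStream();
         Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
      String json = s.next();
      long lastApplied = extract(json, "LastAppliedTxId");        // placeholder attribute name
      long lastCheckpoint = extract(json, "LastCheckpointTxId");  // placeholder attribute name
      long lag = lastApplied - lastCheckpoint;
      System.out.println("transactions since last checkpoint: " + lag);
      // A tool like Ambari could call saveNamespace only when this lag exceeds a threshold.
    }
  }
}
{code}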

> Create http servlets to enable querying NN's last applied transaction ID and 
> most recent checkpoint's transaction ID
> 
>
> Key: HDFS-5076
> URL: https://issues.apache.org/jira/browse/HDFS-5076
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5076.001.patch, HDFS-5076.002.patch
>
>
> Currently the NameNode already provides RPC calls to get its last applied 
> transaction ID and the most recent checkpoint's transaction ID. It can be helpful 
> to provide servlets to enable querying this information through http, so 
> that administrators and applications like Ambari can easily decide whether a 
> forced checkpoint (by calling saveNamespace) is necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738949#comment-13738949
 ] 

Colin Patrick McCabe commented on HDFS-5051:


bq. Nit: Document the \@return DatanodeProtocol#cacheReport.  Nit: "used to 
communicated blocks" -> "used to communicate blocks"

This JIRA is already committed (it's marked as "fixed").  Let's do this in 
HDFS-5053.  I have pasted the comments there so we won't forget.

bq. In the cache report, do we need to have the block length and generation stamp, 
or does just the block ID suffice?

Let's address this later in HDFS-5053.

bq. Should we send a cache report only if something in the cache changes, 
instead of blindly every 10 seconds? This could happen in a later patch/jira.

We filed HDFS-5092 for this.

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5053) NameNode should invoke DataNode APIs to coordinate caching

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738953#comment-13738953
 ] 

Colin Patrick McCabe commented on HDFS-5053:


Let's remember to make these style fixes:
bq. Nit: Document the @return DatanodeProtocol#cacheReport
bq. Nit: "used to communicated blocks" -> "used to communicate blocks"


> NameNode should invoke DataNode APIs to coordinate caching
> --
>
> Key: HDFS-5053
> URL: https://issues.apache.org/jira/browse/HDFS-5053
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>
> The NameNode should invoke the DataNode APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738945#comment-13738945
 ] 

Jing Zhao commented on HDFS-5091:
-

The failed test is mainly caused by "java.net.BindException: Address already in 
use", so I think it is unrelated to the patch.

> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5087) Allowing specific JAVA heap max setting for HDFS related services

2013-08-13 Thread Kai Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738942#comment-13738942
 ] 

Kai Zheng commented on HDFS-5087:
-

The patch only changes a script and was tested manually.

> Allowing specific JAVA heap max setting for HDFS related services
> -
>
> Key: HDFS-5087
> URL: https://issues.apache.org/jira/browse/HDFS-5087
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: scripts
>Reporter: Kai Zheng
>Priority: Minor
> Attachments: HDFS-5087.patch
>
>
> This allows a specific JAVA heap max setting for HDFS-related services, as is 
> already possible for YARN services, for consistency. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738920#comment-13738920
 ] 

Hadoop QA commented on HDFS-5091:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597803/HDFS-5091.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.web.TestWebHdfsTimeouts

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4816//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4816//console

This message is automatically generated.

> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail

2013-08-13 Thread Tao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Luo updated HDFS-2994:
--

Attachment: HDFS-2994_3.patch

> If lease is recovered successfully inline with create, create can fail
> --
>
> Key: HDFS-2994
> URL: https://issues.apache.org/jira/browse/HDFS-2994
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.24.0
>Reporter: Todd Lipcon
>Assignee: amith
> Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, 
> HDFS-2994_3.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail

2013-08-13 Thread Tao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Luo updated HDFS-2994:
--

Attachment: (was: HDFS-2994_3.patch)

> If lease is recovered successfully inline with create, create can fail
> --
>
> Key: HDFS-2994
> URL: https://issues.apache.org/jira/browse/HDFS-2994
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.24.0
>Reporter: Todd Lipcon
>Assignee: amith
> Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, 
> HDFS-2994_3.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3245) Add metrics and web UI for cluster version summary

2013-08-13 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738901#comment-13738901
 ] 

Ravi Prakash commented on HDFS-3245:


Hi Todd and folks!
I'm going to take this up. Please let me know your opinions on the following 
questions:

bq. Summary at the top: Indicates either "All versions in the cluster are 
consistent" or "Rolling upgrade in progress - some nodes have mismatched 
versions."
I presume we expect this on the dfshealth.jsp page. Please correct me if I'm 
wrong.

{quote}
Per-version summary. For each version, list:
Number of DNs with this version
If the number is fewer than 10, list their addresses.
{quote}
Were you thinking of a page different from dfsnodelist.jsp? Or just another 
column in that table (and making the table display more spiffy)?

bq. NNs with this version (list addresses)
Should this be on the dfsclusterhealth.jsp page? Or, again, on a different 
page?



> Add metrics and web UI for cluster version summary
> --
>
> Key: HDFS-3245
> URL: https://issues.apache.org/jira/browse/HDFS-3245
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Ravi Prakash
>
> With the introduction of protocol compatibility, once HDFS-2983 is committed, 
> we have the possibility that different nodes in a cluster are running 
> different software versions. To aid operators, we should add the ability to 
> summarize the status of versions in the cluster, so they can easily determine 
> whether a rolling upgrade is in progress or if some nodes "missed" an upgrade 
> (e.g. maybe they were out of service when the software was updated).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2994) If lease is recovered successfully inline with create, create can fail

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738902#comment-13738902
 ] 

Hadoop QA commented on HDFS-2994:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597802/HDFS-2994_3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4815//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4815//console

This message is automatically generated.

> If lease is recovered successfully inline with create, create can fail
> --
>
> Key: HDFS-2994
> URL: https://issues.apache.org/jira/browse/HDFS-2994
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.24.0
>Reporter: Todd Lipcon
>Assignee: amith
> Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, 
> HDFS-2994_3.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1373#comment-1373
 ] 

Hadoop QA commented on HDFS-4504:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597800/HDFS-4504.011.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The following test timeouts occurred in 
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestFileCreationDelete
org.apache.hadoop.hdfs.TestHFlush
org.apache.hadoop.hdfs.TestFileCreationClient

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4814//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4814//console

This message is automatically generated.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738841#comment-13738841
 ] 

Suresh Srinivas commented on HDFS-5051:
---

Some comments:
# Nit: Document the @return DatanodeProtocol#cacheReport
# Nit: "used to communicated blocks" -> "used to communicate blocks"
# In the cache report, do we need to have the block length and generation stamp, or 
does just the block ID suffice?
# I see the code related to the initial cache report after random jitter. When a 
datanode starts, do we expect anything to be in the cache at all? Also, a cache 
report every 10 seconds means quite a lot of messages from datanode to namenode. 
Should we send a cache report only if something in the cache changes, instead of 
blindly every 10 seconds? This could happen in a later patch/jira.


> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-2268) Remove unused paranamer processing

2013-08-13 Thread Luke Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Lu resolved HDFS-2268.
---

Resolution: Fixed

> Remove unused paranamer processing
> --
>
> Key: HDFS-2268
> URL: https://issues.apache.org/jira/browse/HDFS-2268
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.23.0
>Reporter: Luke Lu
>Assignee: Luke Lu
> Fix For: 0.24.0
>
>
> The paranamer processing is required by older avro. HDFS currently doesn't 
> use avro at all. Removing the paranamer processing is required for 
> HADOOP-7264, which upgrades avro to the current version. The change will be 
> part of the atomic HADOOP-7264 commit. This jira is created for the HDFS change 
> log.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738835#comment-13738835
 ] 

Colin Patrick McCabe commented on HDFS-5051:


bq. Andrew, quick comment - do you expect the datanode to send a cache report every 
10 seconds, or only when the datanode has caching enabled?

Once we piggyback cache additions and removals on the existing DN heartbeats 
(see HDFS-5092), we only need to send full cache reports rarely-- maybe every 
few minutes.  At that point, it should not be a huge issue to send an empty 
full cache report every 10 minutes if caching is disabled.  If we want to 
optimize it further for the no-cache case, we can follow up with that later.
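
Just to make the shape of that concrete, here is an illustrative sketch (assumed 
structure, not actual DataNode code) of the policy: deltas ride along with every 
heartbeat, and a full report -- possibly empty -- only goes out when a much longer 
interval has elapsed.

{code}
import java.util.ArrayList;
import java.util.List;

class CacheReportPolicy {
  private final long fullReportIntervalMs;    // on the order of minutes, not seconds
  private long lastFullReportMs;
  // Block ids cached or uncached since the last heartbeat.
  private final List<Long> pendingDeltas = new ArrayList<Long>();

  CacheReportPolicy(long fullReportIntervalMs) {
    this.fullReportIntervalMs = fullReportIntervalMs;
  }

  // Called whenever a block is cached or uncached on this DN.
  synchronized void recordChange(long blockId) {
    pendingDeltas.add(blockId);
  }

  // Called on every heartbeat: returns the incremental report and clears it.
  synchronized List<Long> takeIncrementalReport() {
    List<Long> deltas = new ArrayList<Long>(pendingDeltas);
    pendingDeltas.clear();
    return deltas;
  }

  // Full reports are sent rarely, even if the cache is empty or caching is disabled.
  synchronized boolean shouldSendFullReport(long nowMs) {
    if (nowMs - lastFullReportMs >= fullReportIntervalMs) {
      lastFullReportMs = nowMs;
      return true;
    }
    return false;
  }
}
{code}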

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5092) piggyback incremental cache reports on DN heartbeats

2013-08-13 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-5092:
--

 Summary: piggyback incremental cache reports on DN heartbeats
 Key: HDFS-5092
 URL: https://issues.apache.org/jira/browse/HDFS-5092
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Colin Patrick McCabe
Priority: Minor


We should send incremental cache reports as part of DN heartbeats, similar to 
how we do incremental block reports.  Then we would only need to send full 
cache reports rarely (again similar to full block reports).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe resolved HDFS-5051.


Resolution: Fixed

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738821#comment-13738821
 ] 

Suresh Srinivas commented on HDFS-5051:
---

Andrew, quick comment - do you expect the datanode to send a cache report every 10 
seconds, or only when the datanode has caching enabled?

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5068) Convert NNThroughputBenchmark to a Tool to allow generic options.

2013-08-13 Thread Ravi Prakash (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738819#comment-13738819
 ] 

Ravi Prakash commented on HDFS-5068:


Thanks Konstantin! My +1 stands!

> Convert NNThroughputBenchmark to a Tool to allow generic options.
> -
>
> Key: HDFS-5068
> URL: https://issues.apache.org/jira/browse/HDFS-5068
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: benchmarks
>Reporter: Konstantin Shvachko
>Assignee: Konstantin Shvachko
> Attachments: NNThBenchTool.patch, NNThBenchTool.patch
>
>
> Currently NNThroughputBenchmark does not recognize generic options like 
> -conf, etc. A simple way to enable such functionality is to make it implement 
> the Tool interface.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738816#comment-13738816
 ] 

Suresh Srinivas commented on HDFS-5091:
---

+1 for the patch.

> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5051) Propagate cache status information from the DataNode to the NameNode

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738808#comment-13738808
 ] 

Colin Patrick McCabe commented on HDFS-5051:


+1

> Propagate cache status information from the DataNode to the NameNode
> 
>
> Key: HDFS-5051
> URL: https://issues.apache.org/jira/browse/HDFS-5051
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>Assignee: Andrew Wang
> Attachments: hdfs-5051-1.patch, hdfs-5051-2.patch
>
>
> The DataNode needs to inform the NameNode of its current cache state. Let's 
> wire up the RPCs and stub out the relevant methods on the DN and NN side.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5091:


Attachment: HDFS-5091.001.patch

Uploaded the patch.

> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Jing Zhao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-5091:


Status: Patch Available  (was: Open)

> Support for spnego keytab separate from the JournalNode keytab for secure HA
> 
>
> Key: HDFS-5091
> URL: https://issues.apache.org/jira/browse/HDFS-5091
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Minor
> Attachments: HDFS-5091.001.patch
>
>
> This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also 
> use the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5091) Support for spnego keytab separate from the JournalNode keytab for secure HA

2013-08-13 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5091:
---

 Summary: Support for spnego keytab separate from the JournalNode 
keytab for secure HA
 Key: HDFS-5091
 URL: https://issues.apache.org/jira/browse/HDFS-5091
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao
Priority: Minor


This is similar to HDFS-3466 and HDFS-4105: for JournalNode we should also use 
the web keytab file for SPNEGO filter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active

2013-08-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738751#comment-13738751
 ] 

Jing Zhao commented on HDFS-5080:
-

[~tlipcon], could you take a look at the new patch? Thanks!

> BootstrapStandby not working with QJM when the existing NN is active
> 
>
> Key: HDFS-5080
> URL: https://issues.apache.org/jira/browse/HDFS-5080
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5080.000.patch, HDFS-5080.001.patch, 
> HDFS-5080.002.patch
>
>
> Currently when QJM is used, running BootstrapStandby while the existing NN is 
> active can get the following exception:
> {code}
> FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 
> from the configured shared edits storage. Please copy these logs into the 
> shared edits storage or call saveNamespace on the active node.
> Error: Gap in transactions. Expected to be able to read up until at least 
> txid 6175405 but unable to find any edit logs containing txid 6175405
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 6175405 but unable to find any edit logs containing txid 
> 6175405
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229)
> {code}
> Looks like the cause of the exception is that, when the active NN is queried 
> by BootstrapStandby about the last written transaction ID, the in-progress 
> edit log segment is included. However, when the journal nodes are asked about the 
> last written transaction ID, the in-progress edit log is excluded. This causes 
> BootstrapStandby#checkLogsAvailableForRead to complain about gaps. 
> To fix this, we can either let the journal nodes take into account the 
> in-progress edit log, or let the active NN exclude the in-progress edit log 
> segment.
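
To spell out the mismatch numerically, here is a toy illustration using the txids 
from the log snippet above (not HDFS code, and the "last finalized" value is only 
inferred from the error message):

{code}
public class GapCheckDemo {
  public static void main(String[] args) {
    // Journal nodes only advertise finalized segments, so their answer stops at
    // 6175396 (the log above complains about txids 6175397-6175405).
    long lastTxIdAdvertisedByJournals = 6175396L;
    // The active NN's answer includes its in-progress segment, hence 6175405.
    long lastWrittenTxIdOnActiveNN = 6175405L;

    // A naive gap check up to the NN-reported txid must therefore fail.
    if (lastTxIdAdvertisedByJournals < lastWrittenTxIdOnActiveNN) {
      System.out.println("Gap in transactions. Expected to be able to read up until at least txid "
          + lastWrittenTxIdOnActiveNN + " but unable to find any edit logs containing txid "
          + lastWrittenTxIdOnActiveNN);
    }
  }
}
{code}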

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2994) If lease is recovered successfully inline with create, create can fail

2013-08-13 Thread Tao Luo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Luo updated HDFS-2994:
--

Attachment: HDFS-2994_3.patch

> If lease is recovered successfully inline with create, create can fail
> --
>
> Key: HDFS-2994
> URL: https://issues.apache.org/jira/browse/HDFS-2994
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.24.0
>Reporter: Todd Lipcon
>Assignee: amith
> Attachments: HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, 
> HDFS-2994_3.patch
>
>
> I saw the following logs on my test cluster:
> {code}
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease 
> [Lease.  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client 
> DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1
> 2012-02-22 14:35:22,887 INFO 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. 
>  Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, 
> pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
> internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
> closed.
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* 
> NameSystem.startFile: FSDirectory.replaceNode: failed to remove 
> /benchmarks/TestDFSIO/io_data/test_io_6
> {code}
> It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, 
> then the INode will be replaced with a new one, meaning the later 
> {{replaceNode}} call can fail.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-4504:
---

Attachment: HDFS-4504.011.patch

* rename DFSClient#endFileLease to DFSClient#stopRenewingFileLease

* add DFSOutputStream#closeCalled as a separate boolean from 
DFSOutputStream#close.  This ensures that the first time a user calls 
DFSOutputStream#close, we actually call completeFile.

* Call completeFile before stopping renewing the file lease, not after.
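
For clarity, a simplified sketch of the intended ordering -- illustrative pseudocode 
only, not the patch itself:

{code}
import java.io.IOException;

abstract class SketchOutputStream {
  private boolean closeCalled = false;  // has the *user* called close()? (internal aborts don't set this)
  private boolean closed = false;       // did close actually complete?

  abstract void flushInternal() throws IOException;   // push out remaining packets
  abstract void completeFile() throws IOException;    // tell the NN the file is finished
  abstract void stopRenewingFileLease();              // drop the file from the lease renewer

  public synchronized void close() throws IOException {
    if (closed) {
      return;                  // a previous close() already succeeded
    }
    // Even if the stream aborted itself earlier, the first user-initiated close()
    // still attempts completeFile() rather than just rethrowing the old exception.
    closeCalled = true;
    flushInternal();           // may throw; the lease keeps being renewed in that case
    completeFile();            // complete the file first...
    stopRenewingFileLease();   // ...and only then stop renewing the lease
    closed = true;
  }
}
{code}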

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch, HDFS-4504.011.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5053) NameNode should invoke DataNode APIs to coordinate caching

2013-08-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5053:
---

Description: The NameNode should invoke the DataNode APIs to coordinate 
caching.  (was: The NameNode should invoke the DataNode mlock APIs to 
coordinate caching.)

> NameNode should invoke DataNode APIs to coordinate caching
> --
>
> Key: HDFS-5053
> URL: https://issues.apache.org/jira/browse/HDFS-5053
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>
> The NameNode should invoke the DataNode APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5053) NameNode should invoke DataNode APIs to coordinate caching

2013-08-13 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5053:
---

Summary: NameNode should invoke DataNode APIs to coordinate caching  (was: 
NameNode should invoke DataNode mlock APIs to coordinate caching)

> NameNode should invoke DataNode APIs to coordinate caching
> --
>
> Key: HDFS-5053
> URL: https://issues.apache.org/jira/browse/HDFS-5053
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>
> The NameNode should invoke the DataNode mlock APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5053) NameNode should invoke DataNode APIs to coordinate caching

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738748#comment-13738748
 ] 

Colin Patrick McCabe commented on HDFS-5053:


You are correct-- mlock is an implementation detail, not relevant to the 
NameNode.  I removed the word "mlock" from the summary and description.

> NameNode should invoke DataNode APIs to coordinate caching
> --
>
> Key: HDFS-5053
> URL: https://issues.apache.org/jira/browse/HDFS-5053
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>
> The NameNode should invoke the DataNode APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4329) DFSShell issues with directories with spaces in name

2013-08-13 Thread Cristina L. Abad (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738700#comment-13738700
 ] 

Cristina L. Abad commented on HDFS-4329:


The Findbugs warnings related to the class 
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem were not introduced by this 
patch. 

> DFSShell issues with directories with spaces in name
> 
>
> Key: HDFS-4329
> URL: https://issues.apache.org/jira/browse/HDFS-4329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0, 0.23.10, 2.1.1-beta
>Reporter: Andy Isaacson
>Assignee: Cristina L. Abad
> Attachments: 4329.branch-0.23.patch, 4329.branch-0.23.v3.patch, 
> 4329.branch-2.patch, 4329.trunk.patch, 4329.trunk.v2.patch, 
> 4329.trunk.v3.patch
>
>
> This bug was discovered by Casey Ching.
> The command {{dfs -put /foo/hello.txt dir}} is supposed to create 
> {{dir/hello.txt}} on HDFS.  It doesn't work right if "dir" has a space in it:
> {code}
> [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat'
> [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat'
> [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd'
> cat: `space cat/motd': No such file or directory
> [adi@haus01 ~]$ hdfs dfs -ls space\*
> Found 1 items
> -rw-r--r--   2 adi supergroup251 2012-12-20 11:16 space%2520cat/motd
> [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd'
> Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64)
> ...
> {code}
> Note that the {{dfs -ls}} output re-encodes the already wrongly encoded 
> directory name, turning {{%20}} into {{%2520}}.  It does the same thing with a 
> plain space:
> {code}
> [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo'
> [adi@haus01 ~]$ hdfs dfs -ls 'space cat'
> Found 1 items
> -rw-r--r--   2 adi supergroup  0 2012-12-20 11:36 space%20cat/foo
> {code}
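
For reference, the {{%20}} -> {{%2520}} progression above is the classic 
double-encoding pattern: once the space has been percent-encoded, a second encoding 
pass escapes the '%' itself. A standalone sketch (not the shell code) that 
reproduces it:

{code}
import java.net.URI;

public class DoubleEncodeDemo {
  public static void main(String[] args) throws Exception {
    String raw = "space cat/motd";
    // First encoding: the space becomes %20.
    String once = new URI(null, null, raw, null).toString();
    // Second, erroneous encoding: '%' becomes %25, so %20 turns into %2520.
    String twice = new URI(null, null, once, null).toString();
    System.out.println(once);   // space%20cat/motd
    System.out.println(twice);  // space%2520cat/motd
  }
}
{code}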

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4329) DFSShell issues with directories with spaces in name

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738661#comment-13738661
 ] 

Hadoop QA commented on HDFS-4329:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597760/4329.trunk.v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4813//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4813//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4813//console

This message is automatically generated.

> DFSShell issues with directories with spaces in name
> 
>
> Key: HDFS-4329
> URL: https://issues.apache.org/jira/browse/HDFS-4329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0, 0.23.10, 2.1.1-beta
>Reporter: Andy Isaacson
>Assignee: Cristina L. Abad
> Attachments: 4329.branch-0.23.patch, 4329.branch-0.23.v3.patch, 
> 4329.branch-2.patch, 4329.trunk.patch, 4329.trunk.v2.patch, 
> 4329.trunk.v3.patch
>
>
> This bug was discovered by Casey Ching.
> The command {{dfs -put /foo/hello.txt dir}} is supposed to create 
> {{dir/hello.txt}} on HDFS.  It doesn't work right if "dir" has a space in it:
> {code}
> [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat'
> [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat'
> [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd'
> cat: `space cat/motd': No such file or directory
> [adi@haus01 ~]$ hdfs dfs -ls space\*
> Found 1 items
> -rw-r--r--   2 adi supergroup251 2012-12-20 11:16 space%2520cat/motd
> [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd'
> Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64)
> ...
> {code}
> Note that the {{dfs -ls}} output re-encodes the already wrongly encoded 
> directory name, turning {{%20}} into {{%2520}}.  It does the same thing with a 
> plain space:
> {code}
> [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo'
> [adi@haus01 ~]$ hdfs dfs -ls 'space cat'
> Found 1 items
> -rw-r--r--   2 adi supergroup  0 2012-12-20 11:36 space%20cat/foo
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738646#comment-13738646
 ] 

Colin Patrick McCabe commented on HDFS-4504:


You're right-- we're going to need a solution for ensuring that completeFile is 
called when the stream closes itself due to an IOException.

> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active

2013-08-13 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738628#comment-13738628
 ] 

Jing Zhao commented on HDFS-5080:
-

The javadoc warnings appear to have been introduced by HADOOP-9848.

> BootstrapStandby not working with QJM when the existing NN is active
> 
>
> Key: HDFS-5080
> URL: https://issues.apache.org/jira/browse/HDFS-5080
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5080.000.patch, HDFS-5080.001.patch, 
> HDFS-5080.002.patch
>
>
> Currently when QJM is used, running BootstrapStandby while the existing NN is 
> active can get the following exception:
> {code}
> FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 
> from the configured shared edits storage. Please copy these logs into the 
> shared edits storage or call saveNamespace on the active node.
> Error: Gap in transactions. Expected to be able to read up until at least 
> txid 6175405 but unable to find any edit logs containing txid 6175405
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 6175405 but unable to find any edit logs containing txid 
> 6175405
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229)
> {code}
> It looks like the cause of the exception is that, when the active NN is queried 
> by BootstrapStandby about the last written transaction ID, the in-progress 
> edit log segment is included. However, when the journal nodes are asked about 
> the last written transaction ID, the in-progress edit log is excluded. This 
> causes BootstrapStandby#checkLogsAvailableForRead to complain about gaps. 
> To fix this, we can either let the journal nodes take the in-progress edit 
> log into account, or let the active NN exclude the in-progress edit log 
> segment.
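
To make the mismatch concrete, here is a small illustrative sketch (plain Java, not actual Hadoop code) of the two views of the last written transaction ID, using the txids from the error above; the difference is exactly the gap that {{checkLogsAvailableForRead}} then complains about.

{code}
// Illustrative sketch only: two views of "last written txid", one that counts
// the in-progress segment (what the active NN reports) and one that does not
// (what the journal nodes report).  The gap check then fails for the tail txids.
final class TxidViewSketch {
  static long lastTxid(long lastFinalizedTxid, long inProgressLastTxid,
                       boolean includeInProgress) {
    return includeInProgress ? Math.max(lastFinalizedTxid, inProgressLastTxid)
                             : lastFinalizedTxid;
  }

  public static void main(String[] args) {
    long lastFinalized = 6175396L;      // last txid in a finalized segment
    long inProgressTail = 6175405L;     // last txid written to the open segment

    long nnView = lastTxid(lastFinalized, inProgressTail, true);   // 6175405
    long jnView = lastTxid(lastFinalized, inProgressTail, false);  // 6175396

    // BootstrapStandby asks for edits up to nnView but the JNs only offer
    // segments up to jnView, so it reports a gap in transactions.
    System.out.println("NN view=" + nnView + ", JN view=" + jnView
        + ", apparent gap=" + (nnView - jnView));
  }
}
{code}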

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5089) When a LayoutVersion support SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION.

2013-08-13 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738616#comment-13738616
 ] 

Todd Lipcon commented on HDFS-5089:
---

Why does RESERVED_REL_1_3_0 support SNAPSHOT? I haven't seen any discussion 
about backporting snapshots to branch-1, and I'd be pretty strongly against it. 
Maybe this should be a discussion on the dev list?

> When a LayoutVersion support SNAPSHOT, it must support 
> FSIMAGE_NAME_OPTIMIZATION.
> -
>
> Key: HDFS-5089
> URL: https://issues.apache.org/jira/browse/HDFS-5089
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5089_20130813.patch
>
>
> The SNAPSHOT layout requires FSIMAGE_NAME_OPTIMIZATION as a prerequisite.  
> However, RESERVED_REL1_3_0 supports SNAPSHOT but not 
> FSIMAGE_NAME_OPTIMIZATION.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-3245) Add metrics and web UI for cluster version summary

2013-08-13 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash reassigned HDFS-3245:
--

Assignee: Ravi Prakash

> Add metrics and web UI for cluster version summary
> --
>
> Key: HDFS-3245
> URL: https://issues.apache.org/jira/browse/HDFS-3245
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>Assignee: Ravi Prakash
>
> With the introduction of protocol compatibility, once HDFS-2983 is committed, 
> we have the possibility that different nodes in a cluster are running 
> different software versions. To aid operators, we should add the ability to 
> summarize the status of versions in the cluster, so they can easily determine 
> whether a rolling upgrade is in progress or if some nodes "missed" an upgrade 
> (e.g., maybe they were out of service when the software was updated).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5089) When a LayoutVersion support SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION.

2013-08-13 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738592#comment-13738592
 ] 

Arpit Agarwal commented on HDFS-5089:
-

+1 for the patch.

Verified the new test passes with the fix to {{LayoutVersion}} and fails 
without.
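
A test along these lines would catch the missing prerequisite. This is only a sketch, assuming the trunk-era {{LayoutVersion.supports(Feature, int)}} API and a {{getLayoutVersion()}} accessor on {{Feature}}:

{code}
// Rough sketch of such a check, assuming the trunk-era LayoutVersion API
// (LayoutVersion.supports(Feature, int) and Feature#getLayoutVersion()).
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.hdfs.protocol.LayoutVersion;
import org.apache.hadoop.hdfs.protocol.LayoutVersion.Feature;
import org.junit.Test;

public class TestSnapshotLayoutPrerequisite {
  @Test
  public void snapshotImpliesFsimageNameOptimization() {
    for (Feature f : Feature.values()) {
      int lv = f.getLayoutVersion();
      if (LayoutVersion.supports(Feature.SNAPSHOT, lv)) {
        assertTrue("Layout version " + lv + " supports SNAPSHOT but not"
            + " FSIMAGE_NAME_OPTIMIZATION",
            LayoutVersion.supports(Feature.FSIMAGE_NAME_OPTIMIZATION, lv));
      }
    }
  }
}
{code}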

> When a LayoutVersion support SNAPSHOT, it must support 
> FSIMAGE_NAME_OPTIMIZATION.
> -
>
> Key: HDFS-5089
> URL: https://issues.apache.org/jira/browse/HDFS-5089
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5089_20130813.patch
>
>
> The SNAPSHOT layout requires FSIMAGE_NAME_OPTIMIZATION as a prerequisite.  
> However, RESERVED_REL1_3_0 supports SNAPSHOT but not 
> FSIMAGE_NAME_OPTIMIZATION.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2882) DN continues to start up, even if block pool fails to initialize

2013-08-13 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738588#comment-13738588
 ] 

Colin Patrick McCabe commented on HDFS-2882:


Did you reproduce the problem?  If so, what were the steps to reproduce?

Also, your patch seems to make the DataNode loop endlessly trying to initialize 
any block pools that don't come up.  I don't think that's what we want to do 
here.

> DN continues to start up, even if block pool fails to initialize
> 
>
> Key: HDFS-2882
> URL: https://issues.apache.org/jira/browse/HDFS-2882
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.2-alpha
>Reporter: Todd Lipcon
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-2882.patch, hdfs-2882.txt
>
>
> I started a DN on a machine that was completely out of space on one of its 
> drives. I saw the following:
> 2012-02-02 09:56:50,499 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-448349972-172.29.5.192-1323816762969 (storage id 
> DS-507718931-172.29.5.194-11072-12978
> 42002148) service to styx01.sf.cloudera.com/172.29.5.192:8021
> java.io.IOException: Mkdirs failed to create 
> /data/1/scratch/todd/styx-datadir/current/BP-448349972-172.29.5.192-1323816762969/tmp
> at 
> org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockPoolSlice.(FSDataset.java:335)
> but the DN continued to run, spewing NPEs when it tried to do block reports, 
> etc. This was on the HDFS-1623 branch but may affect trunk as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4632) globStatus using backslash for escaping does not work on Windows

2013-08-13 Thread Chuan Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738507#comment-13738507
 ] 

Chuan Liu commented on HDFS-4632:
-

I tend to agree that 1) is the only feasible solution in the short term.

Is it possible to only use the "file:///path/foo/a/b/c" format for local paths? 
That way we could effectively get rid of forward slash on Windows. I understand 
this breaks backward compatibility, but we would have a unified format for paths 
on different OS's, and users might be less confused about which format to use. 
Any thoughts on this?

> globStatus using backslash for escaping does not work on Windows
> 
>
> Key: HDFS-4632
> URL: https://issues.apache.org/jira/browse/HDFS-4632
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Chris Nauroth
>Assignee: Chris Nauroth
>
> {{Path}} normalizes backslashes to forward slashes on Windows.  Later, when 
> passed to {{FileSystem#globStatus}}, the path is no longer treated as an 
> escape sequence.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-5090) Disable TestGlobPaths#pTestEscape on Windows

2013-08-13 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu resolved HDFS-5090.
-

Resolution: Duplicate

Just realized this is a duplicate of HDFS-4632. Resolving this one. We can use 
the old JIRA for discussion.

> Disable TestGlobPaths#pTestEscape on Windows
> 
>
> Key: HDFS-5090
> URL: https://issues.apache.org/jira/browse/HDFS-5090
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: HDFS-5090-trunk.patch
>
>
> We should skip this test case on Windows because '\' is not treated as an 
> escaping character on Windows -- it is treated as an alternative path 
> separator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5090) Disable TestGlobPaths#pTestEscape on Windows

2013-08-13 Thread Chuan Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuan Liu updated HDFS-5090:


Attachment: HDFS-5090-trunk.patch

Attaching a patch to disable this test on Windows.

> Disable TestGlobPaths#pTestEscape on Windows
> 
>
> Key: HDFS-5090
> URL: https://issues.apache.org/jira/browse/HDFS-5090
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.3.0
>Reporter: Chuan Liu
>Assignee: Chuan Liu
>Priority: Minor
> Attachments: HDFS-5090-trunk.patch
>
>
> We should skip this test case on Windows because '\' is not treated as an 
> escaping character on Windows -- it is treated as an alternative path 
> separator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5090) Disable TestGlobPaths#pTestEscape on Windows

2013-08-13 Thread Chuan Liu (JIRA)
Chuan Liu created HDFS-5090:
---

 Summary: Disable TestGlobPaths#pTestEscape on Windows
 Key: HDFS-5090
 URL: https://issues.apache.org/jira/browse/HDFS-5090
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.3.0
Reporter: Chuan Liu
Assignee: Chuan Liu
Priority: Minor


We should skip this test case on Windows because '\' is not treated as an 
escaping character on Windows -- it is treated as an alternative path separator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5055) nn->2nn ignores dfs.namenode.secondary.http-address

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738487#comment-13738487
 ] 

Hadoop QA commented on HDFS-5055:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597731/HDFS-5055.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4812//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4812//console

This message is automatically generated.

> nn->2nn ignores dfs.namenode.secondary.http-address
> ---
>
> Key: HDFS-5055
> URL: https://issues.apache.org/jira/browse/HDFS-5055
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.1.0-beta
>Reporter: Allen Wittenauer
>Assignee: Vinay
>Priority: Blocker
>  Labels: regression
> Attachments: HDFS-5055.patch, HDFS-5055.patch
>
>
> The primary namenode attempts to connect back to (incoming hostname):port 
> regardless of how dfs.namenode.secondary.http-address is configured.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name

2013-08-13 Thread Cristina L. Abad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cristina L. Abad updated HDFS-4329:
---

Affects Version/s: 2.1.1-beta
   0.23.10
   Status: Patch Available  (was: Open)

> DFSShell issues with directories with spaces in name
> 
>
> Key: HDFS-4329
> URL: https://issues.apache.org/jira/browse/HDFS-4329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0, 0.23.10, 2.1.1-beta
>Reporter: Andy Isaacson
>Assignee: Cristina L. Abad
> Attachments: 4329.branch-0.23.patch, 4329.branch-0.23.v3.patch, 
> 4329.branch-2.patch, 4329.trunk.patch, 4329.trunk.v2.patch, 
> 4329.trunk.v3.patch
>
>
> This bug was discovered by Casey Ching.
> The command {{dfs -put /foo/hello.txt dir}} is supposed to create 
> {{dir/hello.txt}} on HDFS.  It doesn't work right if "dir" has a space in it:
> {code}
> [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat'
> [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat'
> [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd'
> cat: `space cat/motd': No such file or directory
> [adi@haus01 ~]$ hdfs dfs -ls space\*
> Found 1 items
> -rw-r--r--   2 adi supergroup251 2012-12-20 11:16 space%2520cat/motd
> [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd'
> Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64)
> ...
> {code}
> Note that the {{dfs -ls}} output wrongly encodes the wrongly encoded 
> directory name, turning {{%20}} into {{%2520}}.  It does the same thing with 
> space:
> {code}
> [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo'
> [adi@haus01 ~]$ hdfs dfs -ls 'space cat'
> Found 1 items
> -rw-r--r--   2 adi supergroup  0 2012-12-20 11:36 space%20cat/foo
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name

2013-08-13 Thread Cristina L. Abad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cristina L. Abad updated HDFS-4329:
---

Attachment: 4329.trunk.v3.patch
4329.branch-0.23.v3.patch

Daryn: thanks for suggesting adding those tests! It turns out that 
scheme-qualified paths were broken in the branch-2 and trunk patches. 
Attaching new patches for 0.23 and trunk (the trunk one also works for branch-2) with 
the following changes: (1) added 5 more unit tests (relative path, 
scheme-qualified, and absolute/relative/scheme-qualified with globbing); and 
(2) the patch for trunk/branch-2 fixes the problem with decoding 
scheme-qualified paths.

> DFSShell issues with directories with spaces in name
> 
>
> Key: HDFS-4329
> URL: https://issues.apache.org/jira/browse/HDFS-4329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Andy Isaacson
>Assignee: Cristina L. Abad
> Attachments: 4329.branch-0.23.patch, 4329.branch-0.23.v3.patch, 
> 4329.branch-2.patch, 4329.trunk.patch, 4329.trunk.v2.patch, 
> 4329.trunk.v3.patch
>
>
> This bug was discovered by Casey Ching.
> The command {{dfs -put /foo/hello.txt dir}} is supposed to create 
> {{dir/hello.txt}} on HDFS.  It doesn't work right if "dir" has a space in it:
> {code}
> [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat'
> [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat'
> [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd'
> cat: `space cat/motd': No such file or directory
> [adi@haus01 ~]$ hdfs dfs -ls space\*
> Found 1 items
> -rw-r--r--   2 adi supergroup251 2012-12-20 11:16 space%2520cat/motd
> [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd'
> Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64)
> ...
> {code}
> Note that the {{dfs -ls}} output wrongly encodes the wrongly encoded 
> directory name, turning {{%20}} into {{%2520}}.  It does the same thing with 
> space:
> {code}
> [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo'
> [adi@haus01 ~]$ hdfs dfs -ls 'space cat'
> Found 1 items
> -rw-r--r--   2 adi supergroup  0 2012-12-20 11:36 space%20cat/foo
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4329) DFSShell issues with directories with spaces in name

2013-08-13 Thread Cristina L. Abad (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cristina L. Abad updated HDFS-4329:
---

Status: Open  (was: Patch Available)

Per Daryn's suggestion, added more unit tests.

> DFSShell issues with directories with spaces in name
> 
>
> Key: HDFS-4329
> URL: https://issues.apache.org/jira/browse/HDFS-4329
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Andy Isaacson
>Assignee: Cristina L. Abad
> Attachments: 4329.branch-0.23.patch, 4329.branch-2.patch, 
> 4329.trunk.patch, 4329.trunk.v2.patch
>
>
> This bug was discovered by Casey Ching.
> The command {{dfs -put /foo/hello.txt dir}} is supposed to create 
> {{dir/hello.txt}} on HDFS.  It doesn't work right if "dir" has a space in it:
> {code}
> [adi@haus01 ~]$ hdfs dfs -mkdir 'space cat'
> [adi@haus01 ~]$ hdfs dfs -put /etc/motd 'space cat'
> [adi@haus01 ~]$ hdfs dfs -cat 'space cat/motd'
> cat: `space cat/motd': No such file or directory
> [adi@haus01 ~]$ hdfs dfs -ls space\*
> Found 1 items
> -rw-r--r--   2 adi supergroup251 2012-12-20 11:16 space%2520cat/motd
> [adi@haus01 ~]$ hdfs dfs -cat 'space%20cat/motd'
> Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-30-generic x86_64)
> ...
> {code}
> Note that the {{dfs -ls}} output wrongly encodes the wrongly encoded 
> directory name, turning {{%20}} into {{%2520}}.  It does the same thing with 
> space:
> {code}
> [adi@haus01 ~]$ hdfs dfs -touchz 'space cat/foo'
> [adi@haus01 ~]$ hdfs dfs -ls 'space cat'
> Found 1 items
> -rw-r--r--   2 adi supergroup  0 2012-12-20 11:36 space%20cat/foo
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738423#comment-13738423
 ] 

Hadoop QA commented on HDFS-3618:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597724/HDFS-3618.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 2 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ha.TestSshFenceByTcpPort
  
org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4811//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4811//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4811//console

This message is automatically generated.

> SSH fencing option may incorrectly succeed if nc (netcat) command not present
> -
>
> Key: HDFS-3618
> URL: https://issues.apache.org/jira/browse/HDFS-3618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.0.0-alpha
>Reporter: Brahma Reddy Battula
>Assignee: Vinay
> Attachments: HDFS-3618.patch, HDFS-3618.patch, zkfc_threaddump.out, 
> zkfc.txt
>
>
> Started NNs and ZKFCs on SUSE 11.
> SUSE 11 ships with a netcat installation where {{netcat -z}} works, but 
> {{nc -z}} does not.
> While executing the following command we got "command not found", so rc was 
> non-zero and the server was assumed to be down. We end up returning success 
> without actually checking whether the service is down or not:
> {code}
> LOG.info(
> "Indeterminate response from trying to kill service. " +
> "Verifying whether it is running using nc...");
> rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
> " " + serviceAddr.getPort());
> if (rc == 0) {
>   // the service is still listening - we are unable to fence
>   LOG.warn("Unable to fence - it is running but we cannot kill it");
>   return false;
> } else {
>   LOG.info("Verified that the service is down.");
>   return true;  
> }
> {code}
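
For illustration, here is a minimal sketch (not the attached patch) of how the verification step could distinguish a closed port from a missing nc binary; exit code 127 means "command not found" in most shells and should not be taken as proof that the service is down:

{code}
// Illustrative sketch only, not the attached HDFS-3618 patch: treat
// "command not found" (exit code 127 in most shells) as an indeterminate
// result instead of as proof that the service is down.
final class NcVerificationSketch {
  static boolean verifiedDown(int rc) {
    if (rc == 0) {
      return false;   // still listening - fencing must fail
    }
    if (rc == 127) {
      return false;   // nc missing - we could not verify, so fail fencing
    }
    return true;      // nc ran and the port is closed - verified down
  }
}
{code}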

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HDFS-5055) nn->2nn ignores dfs.namenode.secondary.http-address

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738310#comment-13738310
 ] 

Suresh Srinivas edited comment on HDFS-5055 at 8/13/13 2:54 PM:


[~vinayrpet] Thanks for jumping on this. The patch looks good. I updated the 
patch with a small change.
Instead of:
{code}
   String machine = imageListenAddress.getHostName();
   if (machine == null || machine.isEmpty() || machine.equals("0.0.0.0")) {
 machine = null;
   }
{code}

the updated patch has:
{code}
   String machine = imageListenAddress.getAddress().isAnyLocalAddress() ? 
 null : imageListenAddress.getHostName();
{code}


  was (Author: sureshms):
[~vinayrpet] Thanks for jumping on this. The patch looks good. I updated 
the patch with a small change.
Instead of:
{code}
   String machine = imageListenAddress.getHostName();
   if (machine == null || machine.isEmpty() || machine.equals("0.0.0.0")) {
 machine = null;
   }
{code}

the update patch has:
{code}
   String machine = imageListenAddress.getAddress().isAnyLocalAddress() ? 
 null : imageListenAddress.getHostName();
{code}

  
> nn->2nn ignores dfs.namenode.secondary.http-address
> ---
>
> Key: HDFS-5055
> URL: https://issues.apache.org/jira/browse/HDFS-5055
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.1.0-beta
>Reporter: Allen Wittenauer
>Assignee: Vinay
>Priority: Blocker
>  Labels: regression
> Attachments: HDFS-5055.patch, HDFS-5055.patch
>
>
> The primary namenode attempts to connect back to (incoming hostname):port 
> regardless of how dfs.namenode.secondary.http-address is configured.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-5055) nn->2nn ignores dfs.namenode.secondary.http-address

2013-08-13 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas updated HDFS-5055:
--

Attachment: HDFS-5055.patch

[~vinayrpet] Thanks for jumping on this. The patch looks good. I updated the 
patch with a small change.
Instead of:
{code}
   String machine = imageListenAddress.getHostName();
   if (machine == null || machine.isEmpty() || machine.equals("0.0.0.0")) {
 machine = null;
   }
{code}

the updated patch has:
{code}
   String machine = imageListenAddress.getAddress().isAnyLocalAddress() ? 
 null : imageListenAddress.getHostName();
{code}


> nn->2nn ignores dfs.namenode.secondary.http-address
> ---
>
> Key: HDFS-5055
> URL: https://issues.apache.org/jira/browse/HDFS-5055
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.1.0-beta
>Reporter: Allen Wittenauer
>Assignee: Vinay
>Priority: Blocker
>  Labels: regression
> Attachments: HDFS-5055.patch, HDFS-5055.patch
>
>
> The primary namenode attempts to connect back to (incoming hostname):port 
> regardless of how dfs.namenode.secondary.http-address is configured.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5053) NameNode should invoke DataNode mlock APIs to coordinate caching

2013-08-13 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738257#comment-13738257
 ] 

Suresh Srinivas commented on HDFS-5053:
---

Colin, is the summary correct? Do you really mean the NameNode should invoke 
the DataNode mlock API? mlock is a detail of how the cache is implemented in 
the datanode. So the right summary would be "Namenode interaction with datanode 
to cache replicas"? This is essentially done by sending the list of block IDs 
that need to be cached at the datanode, right?
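
A minimal sketch of the interaction described above, with purely hypothetical names ({{CacheCommand}}, {{BlockPinner}} and friends are not real HDFS classes): the NameNode only ships block IDs in its reply, and the DataNode keeps mlock (or whatever pinning mechanism it uses) as a local detail.

{code}
// Minimal sketch with hypothetical names (CacheCommand, BlockPinner are not
// real HDFS classes): the NN only ships block IDs, the DN owns the mlock detail.
import java.util.List;

final class CacheCommand {
  final List<Long> blockIdsToCache;
  final List<Long> blockIdsToUncache;
  CacheCommand(List<Long> cache, List<Long> uncache) {
    this.blockIdsToCache = cache;
    this.blockIdsToUncache = uncache;
  }
}

interface BlockPinner {            // DN-side detail, e.g. mmap + mlock
  void pin(long blockId);
  void unpin(long blockId);
}

final class DataNodeCacheHandler {
  private final BlockPinner pinner;
  DataNodeCacheHandler(BlockPinner pinner) { this.pinner = pinner; }

  void handle(CacheCommand cmd) {  // invoked when a reply from the NN arrives
    for (long id : cmd.blockIdsToCache) { pinner.pin(id); }
    for (long id : cmd.blockIdsToUncache) { pinner.unpin(id); }
  }
}
{code}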

> NameNode should invoke DataNode mlock APIs to coordinate caching
> 
>
> Key: HDFS-5053
> URL: https://issues.apache.org/jira/browse/HDFS-5053
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Colin Patrick McCabe
>
> The NameNode should invoke the DataNode mlock APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738249#comment-13738249
 ] 

Hadoop QA commented on HDFS-4516:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597457/HDFS-4516.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4810//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4810//console

This message is automatically generated.

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.0.0, 2.0.3-alpha
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516.patch, HDFS-4516.patch, HDFS-4516-Test.patch, 
> HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created 
> on the DNs) and the NN also switches over after this, then the new NameNode 
> will not know about the block locations.
> Further details are in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present

2013-08-13 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-3618:


Attachment: HDFS-3618.patch

Posting updated patch

> SSH fencing option may incorrectly succeed if nc (netcat) command not present
> -
>
> Key: HDFS-3618
> URL: https://issues.apache.org/jira/browse/HDFS-3618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.0.0-alpha
>Reporter: Brahma Reddy Battula
>Assignee: Vinay
> Attachments: HDFS-3618.patch, HDFS-3618.patch, zkfc_threaddump.out, 
> zkfc.txt
>
>
> Started NNs and ZKFCs on SUSE 11.
> SUSE 11 ships with a netcat installation where {{netcat -z}} works, but 
> {{nc -z}} does not.
> While executing the following command we got "command not found", so rc was 
> non-zero and the server was assumed to be down. We end up returning success 
> without actually checking whether the service is down or not:
> {code}
> LOG.info(
> "Indeterminate response from trying to kill service. " +
> "Verifying whether it is running using nc...");
> rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
> " " + serviceAddr.getPort());
> if (rc == 0) {
>   // the service is still listening - we are unable to fence
>   LOG.warn("Unable to fence - it is running but we cannot kill it");
>   return false;
> } else {
>   LOG.info("Verified that the service is down.");
>   return true;  
> }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5089) When a LayoutVersion support SNAPSHOT, it must support FSIMAGE_NAME_OPTIMIZATION.

2013-08-13 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738210#comment-13738210
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-5089:
--

The javadoc warnings was introduced by HADOOP-9848 but not related to this.

> When a LayoutVersion support SNAPSHOT, it must support 
> FSIMAGE_NAME_OPTIMIZATION.
> -
>
> Key: HDFS-5089
> URL: https://issues.apache.org/jira/browse/HDFS-5089
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5089_20130813.patch
>
>
> The SNAPSHOT layout requires FSIMAGE_NAME_OPTIMIZATION as a prerequisite.  
> However, RESERVED_REL1_3_0 supports SNAPSHOT but not 
> FSIMAGE_NAME_OPTIMIZATION.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3656) ZKFC may write a null "breadcrumb" znode

2013-08-13 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738208#comment-13738208
 ] 

Vinay commented on HDFS-3656:
-

I think HADOOP-9459 is the same as this, and it is already fixed.

But I cannot confirm, as we don't have any particular logs or trace.

Hi [~tlipcon], please check whether this can be closed as a duplicate.

> ZKFC may write a null "breadcrumb" znode
> 
>
> Key: HDFS-3656
> URL: https://issues.apache.org/jira/browse/HDFS-3656
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.0.0-alpha
>Reporter: Todd Lipcon
>
> A user [reported|https://issues.cloudera.org/browse/DISTRO-412] an NPE trying 
> to read the "breadcrumb" znode in the failover controller. This happened 
> repeatedly, implying that an earlier process set the znode to null - probably 
> some race, though I don't see anything obvious in the code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4450) Duplicate data node on the name node after formatting data node

2013-08-13 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738173#comment-13738173
 ] 

Vinay commented on HDFS-4450:
-

I have verified as mentioned: 
1. HA cluster with 3 Datanodes.
2. Removed the data directories of one datanode and restarted the datanode.
3. After the restart, the NameNode was displaying only one datanode for that host.

I didn't find any duplicate datanodes.


> Duplicate data node on the name node after formatting data node
> ---
>
> Key: HDFS-4450
> URL: https://issues.apache.org/jira/browse/HDFS-4450
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.2-alpha
>Reporter: WenJin Ma
> Attachments: exception.bmp, normal.bmp
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Duplicate data node on the name node after formatting the data node.
> When we register a data node, nodeReg.getXferPort() is used to find the 
> DatanodeDescriptor:
> {code}
>  DatanodeDescriptor nodeN = host2DatanodeMap.getDatanodeByXferAddr(
> nodeReg.getIpAddr(), nodeReg.getXferPort());
> {code}
> but adding the data node uses node.getIpAddr():
> {code}
> /** add node to the map 
>* return true if the node is added; false otherwise.
>*/
>   boolean add(DatanodeDescriptor node) {
> hostmapLock.writeLock().lock();
> try {
>   if (node==null || contains(node)) {
> return false;
>   }
>   
>   String ipAddr = node.getIpAddr();
>   DatanodeDescriptor[] nodes = map.get(ipAddr);
>   DatanodeDescriptor[] newNodes;
>   if (nodes==null) {
> newNodes = new DatanodeDescriptor[1];
> newNodes[0]=node;
>   } else { // rare case: more than one datanode on the host
> newNodes = new DatanodeDescriptor[nodes.length+1];
> System.arraycopy(nodes, 0, newNodes, 0, nodes.length);
> newNodes[nodes.length] = node;
>   }
>   map.put(ipAddr, newNodes);
>   return true;
> } finally {
>   hostmapLock.writeLock().unlock();
> }
>   }
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4504) DFSOutputStream#close doesn't always release resources (such as leases)

2013-08-13 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738134#comment-13738134
 ] 

Vinay commented on HDFS-4504:
-

{quote}In some cases, DFSOutputStream#close and DFSOutputStream#lastException 
will be set by the DataStreamer, prior to DFSOutputStream#close being called. 
In those cases, we need to throw an exception from close prior to clearing the 
exception.{quote}
I assume these cases were never handled. Without handling pipeline failure 
cases, this patch will be incomplete.
Pipeline failures while writing data are also quite likely to happen.

In case of pipeline failures, {{closed}} will be marked {{true}} by the DataStreamer 
thread itself (as already mentioned in [~cmccabe]'s comment). The first call to 
close() will throw the pipeline failure exception, but subsequent calls to close() 
just return. *So the stream will never be marked as a zombie, and its resources will 
never be released.*

You can verify by changing your test {{testCloseWithDatanodeDown}} as follows
{code}+  out.write(100);
+  cluster.stopDataNode(0);{code}

to 
{code}+  out.write(100);
+  out.hflush();
+  out.write(100);
+  cluster.stopDataNode(0);{code}

Please check. 
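
For reference, the full reproduction might look roughly like this; a sketch against the trunk-era MiniDFSCluster test API, not the exact {{testCloseWithDatanodeDown}} code, and the settings may need tuning so that pipeline recovery really fails:

{code}
// Rough sketch of the suggested reproduction, assuming the trunk-era
// MiniDFSCluster test API; not the exact testCloseWithDatanodeDown code.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class TestClosePipelineFailureSketch {
  @Test
  public void testCloseAfterPipelineFailure() throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1).build();
    try {
      FSDataOutputStream out = cluster.getFileSystem().create(new Path("/test"));
      out.write(100);
      out.hflush();                // force the pipeline to be built and used
      out.write(100);
      cluster.stopDataNode(0);     // break the pipeline before close()
      try {
        out.close();               // expected to throw the pipeline failure
      } catch (IOException e) {
        // A second close() currently just returns; the question is whether
        // the lease and other resources have actually been released by now.
        out.close();
      }
    } finally {
      cluster.shutdown();
    }
  }
}
{code}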


> DFSOutputStream#close doesn't always release resources (such as leases)
> ---
>
> Key: HDFS-4504
> URL: https://issues.apache.org/jira/browse/HDFS-4504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-4504.001.patch, HDFS-4504.002.patch, 
> HDFS-4504.007.patch, HDFS-4504.008.patch, HDFS-4504.009.patch, 
> HDFS-4504.010.patch
>
>
> {{DFSOutputStream#close}} can throw an {{IOException}} in some cases.  One 
> example is if there is a pipeline error and then pipeline recovery fails.  
> Unfortunately, in this case, some of the resources used by the 
> {{DFSOutputStream}} are leaked.  One particularly important resource is file 
> leases.
> So it's possible for a long-lived HDFS client, such as Flume, to write many 
> blocks to a file, but then fail to close it.  Unfortunately, the 
> {{LeaseRenewerThread}} inside the client will continue to renew the lease for 
> the "undead" file.  Future attempts to close the file will just rethrow the 
> previous exception, and no progress can be made by the client.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-4516) Client crash after block allocation and NN switch before lease recovery for the same file can cause readers to fail forever

2013-08-13 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay updated HDFS-4516:


Assignee: Vinay
  Status: Patch Available  (was: Open)

> Client crash after block allocation and NN switch before lease recovery for 
> the same file can cause readers to fail forever
> ---
>
> Key: HDFS-4516
> URL: https://issues.apache.org/jira/browse/HDFS-4516
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.0.3-alpha, 3.0.0
>Reporter: Uma Maheswara Rao G
>Assignee: Vinay
>Priority: Critical
> Attachments: HDFS-4516.patch, HDFS-4516.patch, HDFS-4516-Test.patch, 
> HDFS-4516.txt
>
>
> If the client crashes just after allocating a block (blocks not yet created 
> on the DNs) and the NN also switches over after this, then the new NameNode 
> will not know about the block locations.
> Further details are in the comments.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-4211) failed volume causes DataNode#getVolumeInfo NPEs on multi-BP DN

2013-08-13 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738112#comment-13738112
 ] 

Vinay commented on HDFS-4211:
-

Latest patch submitted to HDFS-2882 solves this too.

> failed volume causes DataNode#getVolumeInfo NPEs on multi-BP DN
> ---
>
> Key: HDFS-4211
> URL: https://issues.apache.org/jira/browse/HDFS-4211
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.2-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>
> On a DN with {{failed.volumes.tolerated=0}} a disk went bad. After restarting 
> the DN, the following backtrace was observed when accessing {{/jmx}}:
> {code}
> 2012-06-12 16:21:43,248 ERROR org.apache.hadoop.jmx.JMXJsonServlet:
> getting attribute VolumeInfo of
> Hadoop:service=DataNode,name=DataNodeInfo threw an exception
> javax.management.RuntimeMBeanException: java.lang.NullPointerException
>at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:856)
>at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:869)
>at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:670)
>at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:638)
>at 
> org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:315)
>at 
> org.apache.hadoop.jmx.JMXJsonServlet.listBeans(JMXJsonServlet.java:293)
>at org.apache.hadoop.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:193)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:947)
>at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>at org.mortbay.jetty.Server.handle(Server.java:326)
>at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.lang.NullPointerException
>at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.getVolumeInfo(DataNode.java:2130)
>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>at java.lang.reflect.Method.invoke(Method.java:597)
>at 
> com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167)
>at 
> com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96)
>at 
> com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33)
>at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208)
>at 
> com.sun.jmx.mbeanserver.PerInterface.getAttribute(PerInterface.java:65)
> {code}
> Since tolerated=0 the DN should have errored out rather than starting up, but 
> due to having multiple BPs configured t

[jira] [Commented] (HDFS-4223) Browsing filesystem from specific datanode in live nodes page also should include delegation token in the url

2013-08-13 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738110#comment-13738110
 ] 

Vinay commented on HDFS-4223:
-

Hi, could someone please review the patch? Thanks.

> Browsing filesystem from specific datanode in live nodes page also should 
> include delegation token in the url
> -
>
> Key: HDFS-4223
> URL: https://issues.apache.org/jira/browse/HDFS-4223
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.0.2-alpha
>Reporter: Vinay
>Assignee: Vinay
> Attachments: HDFS-4223.patch
>
>
> Browsing the file system from the 'Browse the filesystem' link includes 
> 'tokenString' as a parameter in the URL.
> In the same way, browsing via a specific datanode from the live nodes page 
> should also include 'tokenString' as a parameter, to avoid the following 
> exception:
> {noformat}javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-5080) BootstrapStandby not working with QJM when the existing NN is active

2013-08-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737946#comment-13737946
 ] 

Hadoop QA commented on HDFS-5080:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12597659/HDFS-5080.002.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 4 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/4809//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/4809//console

This message is automatically generated.

> BootstrapStandby not working with QJM when the existing NN is active
> 
>
> Key: HDFS-5080
> URL: https://issues.apache.org/jira/browse/HDFS-5080
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jing Zhao
>Assignee: Jing Zhao
> Attachments: HDFS-5080.000.patch, HDFS-5080.001.patch, 
> HDFS-5080.002.patch
>
>
> Currently when QJM is used, running BootstrapStandby while the existing NN is 
> active can get the following exception:
> {code}
> FATAL ha.BootstrapStandby: Unable to read transaction ids 6175397-6175405 
> from the configured shared edits storage. Please copy these logs into the 
> shared edits storage or call saveNamespace on the active node.
> Error: Gap in transactions. Expected to be able to read up until at least 
> txid 6175405 but unable to find any edit logs containing txid 6175405
> java.io.IOException: Gap in transactions. Expected to be able to read up 
> until at least txid 6175405 but unable to find any edit logs containing txid 
> 6175405
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1300)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1258)
>   at 
> org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.checkLogsAvailableForRead(BootstrapStandby.java:229)
> {code}
> It looks like the cause of the exception is that, when the active NN is queried 
> by BootstrapStandby about the last written transaction ID, the in-progress 
> edit log segment is included. However, when the journal nodes are asked about 
> the last written transaction ID, the in-progress edit log is excluded. This 
> causes BootstrapStandby#checkLogsAvailableForRead to complain about gaps. 
> To fix this, we can either let the journal nodes take the in-progress edit 
> log into account, or let the active NN exclude the in-progress edit log 
> segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira