[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410091#comment-13410091
 ] 

Vinay commented on HDFS-3605:
-

@Uma
{quote}Here I have one question: why are we keeping all the blocks which have 
the same block ID but different genstamps due to append recovery etc.? I think 
we should maintain only the latest block, the one reported most recently. 
Mostly this block will have the higher genstamp.{quote}

I agree with your point. We can keep only the latest reported state of the 
block from each datanode, regardless of genstamp, instead of keeping all 
previous states in the queue, which may be outdated by the time they are 
processed.
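
For illustration, a minimal sketch of that idea (names here are hypothetical, 
not the actual PendingDataNodeMessages implementation): key the pending state 
by (blockId, datanode) so a newer report simply overwrites the older one.

{code}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: keep only the most recently reported state of a
// block per datanode, instead of queueing every intermediate state.
class LatestBlockStateQueue {
  // blockId -> (datanode id -> latest reported state)
  private final Map<Long, Map<String, String>> pending =
      new HashMap<Long, Map<String, String>>();

  void enqueue(long blockId, String datanodeId, String reportedState) {
    Map<String, String> perDn = pending.get(blockId);
    if (perDn == null) {
      perDn = new HashMap<String, String>();
      pending.put(blockId, perDn);
    }
    // put() overwrites any previously queued (older) state for this DN,
    // so stale genstamps never accumulate while edits are tailed.
    perDn.put(datanodeId, reportedState);
  }
}
{code}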

> Missing Block in following scenario
> ---
>
> Key: HDFS-3605
> URL: https://issues.apache.org/jira/browse/HDFS-3605
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>Reporter: Brahma Reddy Battula
>Assignee: Todd Lipcon
> Attachments: TestAppendBlockMiss.java
>
>
> Open a file for append.
> Write data and sync.
> After the next log roll and edit log tailing in the standby NN, close the append stream.
> Call append multiple times on the same file, before the next edit log roll.
> Now abruptly kill the current active namenode.
> Here the block is missed.
> This may be because all the latest blocks were queued in the standby namenode. 
> During failover, the first OP_CLOSE processed the pending queue and added 
> the block to the corrupt blocks. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Vinay (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410090#comment-13410090
 ] 

Vinay commented on HDFS-3605:
-

{quote}The design of the code should be such that it will re-process those 
"future" events, but they'll get re-postponed at that point. Maybe the issue is 
specifically in the case where these opcodes get read during the "catchup" 
during transition to active?{quote}
You are correct, Todd. This problem occurs in the case of catch-up during 
failover.

{code}
if (namesystem.isInStandbyState() &&
    namesystem.isGenStampInFuture(block.getGenerationStamp())) {
  queueReportedBlock(dn, block, reportedState,
      QUEUE_REASON_FUTURE_GENSTAMP);
  return null;
}
{code}
Since the state has already changed to ACTIVE during failover, the block will 
not be queued again even though its genstamp is still in the future.
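
For illustration, a hedged sketch of the direction this suggests (not the 
committed fix): queue a block whose genstamp is still in the future regardless 
of whether the namesystem is still marked standby, so that catch-up during the 
transition to active does not mark the block corrupt.

{code}
// Sketch only: drop the isInStandbyState() condition so the future-genstamp
// check also applies while queued edits are replayed during failover.
if (namesystem.isGenStampInFuture(block.getGenerationStamp())) {
  queueReportedBlock(dn, block, reportedState,
      QUEUE_REASON_FUTURE_GENSTAMP);
  return null;
}
{code}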






[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410083#comment-13410083
 ] 

Uma Maheswara Rao G commented on HDFS-3605:
---

{quote}
The design of the code should be such that it will re-process those "future" 
events, but they'll get re-postponed at that point
{quote}
This is what I meant, if I understand your intent correctly: leave the future 
genstamps queued for processing. Once all the opcodes are read and processed, 
it processes all queued messages again anyway if anything is left, as I 
remember. So this should help us in this case.

{quote}
Maybe the issue is specifically in the case where these opcodes get read during 
the "catchup" during transition to active?
{quote}
Which issue are you pointing at here? The edits are read in the correct order, 
right?





[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410080#comment-13410080
 ] 

Todd Lipcon commented on HDFS-3605:
---

The design of the code should be such that it will re-process those "future" 
events, but they'll get re-postponed at that point. Maybe the issue is 
specifically in the case where these opcodes get read during the "catchup" 
during transition to active?





[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410077#comment-13410077
 ] 

Uma Maheswara Rao G commented on HDFS-3605:
---

{code}
public void processQueuedMessagesForBlock(Block b) throws IOException {
  Queue<ReportedBlockInfo> queue = pendingDNMessages.takeBlockQueue(b);
  if (queue == null) {
    // Nothing to re-process
    return;
  }
  processQueuedMessages(queue);
}
{code}

I think here, on the first OP_CLOSE edit, it tries to process the queued 
messages for the block. But the queued messages may contain more "future" 
blocks as well, because due to the many append calls the SNN queued those 
messages. Instead of processing all the queued messages for that block, does 
it make sense to process only the current block (the block with the current 
OP_CLOSE genstamp)?

pendingDNMessages.takeBlockQueue(b) will give the set of blocks matching the 
block ID, because the queue is keyed by block ID and does not consider the 
genstamp. But, considering the current case, do we need to consider the 
genstamp as well while taking blocks from the queued messages and processing 
them? There might be some more OPs which will come with that block as part of 
further appends; at that time the respective genstamp blocks will get 
processed anyway, right?
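
For illustration, a sketch of that alternative (method and type names are 
assumptions, not the real BlockManager API): take the per-block queue but only 
apply messages whose genstamp is not ahead of the block in the current 
OP_CLOSE, re-queueing the rest for later edits.

{code}
// Hypothetical sketch: process only queued messages at or below the
// genstamp of the block carried by the current OP_CLOSE edit.
Queue<ReportedBlockInfo> queue = pendingDNMessages.takeBlockQueue(b);
if (queue != null) {
  for (ReportedBlockInfo rbi : queue) {
    if (rbi.getBlock().getGenerationStamp() <= b.getGenerationStamp()) {
      processQueuedMessage(rbi);   // safe to apply against this edit
    } else {
      // Still "future" relative to this edit; the OP_CLOSE of a later
      // append will release it.
      requeueReportedBlock(rbi);
    }
  }
}
{code}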
 







[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc (netcat) command not present

2012-07-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3618:
-

Summary: SSH fencing option may incorrectly succeed if nc (netcat) command 
not present  (was: SSH fencing option may incorrectly succeed if nc(netcat) 
command not present?)

> SSH fencing option may incorrectly succeed if nc (netcat) command not present
> -
>
> Key: HDFS-3618
> URL: https://issues.apache.org/jira/browse/HDFS-3618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Reporter: Brahma Reddy Battula
> Attachments: zkfc.txt, zkfc_threaddump.out
>
>
> Started NNs and ZKFCs on SUSE 11.
> SUSE 11 ships a netcat installation where "netcat -z" works, but "nc -z" 
> won't.
> While executing the following command, we got "command not found", hence rc 
> is nonzero and the server is assumed to be down. Here we end up claiming the 
> fence succeeded without actually checking whether the service is down or not.
> {code}
> LOG.info(
>     "Indeterminate response from trying to kill service. " +
>     "Verifying whether it is running using nc...");
> rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
>     " " + serviceAddr.getPort());
> if (rc == 0) {
>   // the service is still listening - we are unable to fence
>   LOG.warn("Unable to fence - it is running but we cannot kill it");
>   return false;
> } else {
>   LOG.info("Verified that the service is down.");
>   return true;
> }
> {code}
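
For illustration, a hedged sketch of the kind of check being proposed (not the 
committed fix): distinguish the shell's "command not found" status from a 
genuine connection failure, instead of treating every nonzero rc as proof that 
the service is down.

{code}
rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
    " " + serviceAddr.getPort());
if (rc == 0) {
  // the service is still listening - we are unable to fence
  LOG.warn("Unable to fence - it is running but we cannot kill it");
  return false;
} else if (rc == 127) {
  // POSIX shells return 127 for "command not found": nc is missing,
  // so we learned nothing about the service - do not claim success.
  LOG.warn("nc not found on the target host; unable to verify that " +
      "the service is down");
  return false;
} else {
  LOG.info("Verified that the service is down.");
  return true;
}
{code}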





[jira] [Commented] (HDFS-3618) SSH fencing option may incorrectly succeed if nc(netcat) command not present?

2012-07-09 Thread Brahma Reddy Battula (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410073#comment-13410073
 ] 

Brahma Reddy Battula commented on HDFS-3618:


Hi Aaron T. Myers,

Thanks for looking at this issue. I have changed the summary.





[jira] [Created] (HDFS-3624) fuse_dfs: improve user and group translation

2012-07-09 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HDFS-3624:
--

 Summary: fuse_dfs: improve user and group translation
 Key: HDFS-3624
 URL: https://issues.apache.org/jira/browse/HDFS-3624
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: contrib/fuse-dfs
Affects Versions: 2.0.1-alpha
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor


In fuse_dfs, we should translate unknown HDFS user names to the UNIX UID or GID 
for 'nobody' or 'nogroup' by default.  This should also be configurable for 
systems that want to use a different UID for this purpose.  (Currently we 
hard-code this as UID 99.)

Similarly, 'superuser' should be translated to 'root', and this translation 
should also be made configurable.

fuse_dfs should not do its own permission checks, but instead rely on the Java 
code to do this.  Trying to use the translated UIDs and GIDs for permission 
checking (which is what FUSE does when you enable default_permissions) leads to 
problems.

Finally, the HDFS user name to UID mapping should be cached for a short amount 
of time, rather than queried multiple times during every operation.  It changes 
extremely infrequently.
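
For illustration, a minimal sketch of the proposed short-lived cache (fuse_dfs 
itself is C; everything below is an illustrative Java analogue, not fuse_dfs 
API):

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: cache HDFS user name -> UID lookups briefly,
// falling back to a configurable 'nobody' UID for unknown users.
class UidCache {
  private static final long TTL_MS = 60000L;  // assumed refresh interval
  private final long nobodyUid;               // configurable, e.g. 99 today

  private static class Entry {
    final long uid; final long fetchedAt;
    Entry(long uid, long fetchedAt) { this.uid = uid; this.fetchedAt = fetchedAt; }
  }
  private final Map<String, Entry> cache = new ConcurrentHashMap<String, Entry>();

  UidCache(long nobodyUid) { this.nobodyUid = nobodyUid; }

  long uidForUser(String hdfsUser) {
    long now = System.currentTimeMillis();
    Entry e = cache.get(hdfsUser);
    if (e == null || now - e.fetchedAt > TTL_MS) {
      e = new Entry(lookupSystemUid(hdfsUser), now);  // rare, slow path
      cache.put(hdfsUser, e);
    }
    return e.uid;
  }

  private long lookupSystemUid(String user) {
    // Stand-in for the real OS lookup (getpwnam() in the C code);
    // unknown users translate to 'nobody'.
    return nobodyUid;
  }
}
{code}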





[jira] [Updated] (HDFS-3618) SSH fencing option may incorrectly succeed if nc(netcat) command not present?

2012-07-09 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-3618:
---

Summary: SSH fencing option may incorrectly succeed if nc(netcat) command 
not present?  (was: If RC is other than zero, we are assuming that Service is 
down (What if NC command itself not found..?))





[jira] [Assigned] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HDFS-3605:
-

Assignee: Todd Lipcon





[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410066#comment-13410066
 ] 

Todd Lipcon commented on HDFS-3605:
---

Thanks for the test. Very helpful. I'll take a look at this.





[jira] [Assigned] (HDFS-3611) NameNode prints unnecessary WARNs about edit log normally skipping a few bytes

2012-07-09 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe reassigned HDFS-3611:
--

Assignee: Colin Patrick McCabe

> NameNode prints unnecessary WARNs about edit log normally skipping a few bytes
> --
>
> Key: HDFS-3611
> URL: https://issues.apache.org/jira/browse/HDFS-3611
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Assignee: Colin Patrick McCabe
>Priority: Trivial
>  Labels: newbie
>
> The NameNode currently logs lines of this form at every startup, even if 
> there's really no trouble. For instance, the one below is from an NN startup 
> on a freshly formatted namesystem.
> {code}
> 12/07/08 20:00:22 WARN namenode.EditLogInputStream: skipping 1048563 bytes at 
> the end of edit log  
> '/Users/harshchouraria/Work/installs/temp-space/tmp-default/dfs-cdh4/data/current/edits_003-003':
>  reached txid 3 out of 3
> {code}
> If this skipping is not really a cause for warning, we should log it at an 
> INFO or even DEBUG level rather than WARN. That avoids users getting 
> unnecessarily concerned.





[jira] [Commented] (HDFS-3611) NameNode prints unnecessary WARNs about edit log normally skipping a few bytes

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410061#comment-13410061
 ] 

Colin Patrick McCabe commented on HDFS-3611:


I guess changing it to an INFO might be appropriate.  It's definitely not worth 
a WARN.
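
For illustration, the change amounts to a one-line severity demotion (the 
variable and method names here are illustrative, not the exact code):

{code}
// Sketch: log the normal end-of-log skip at INFO (or DEBUG) instead of WARN.
LOG.info("skipping " + skipAmt + " bytes at the end of edit log '"
    + getName() + "': reached txid " + txId + " out of " + lastTxId);
{code}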





[jira] [Updated] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-3605:
---

Attachment: TestAppendBlockMiss.java

Hi Todd,
Thanks for taking a look. Attaching a unit test to reproduce this issue.





[jira] [Created] (HDFS-3623) BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of ZKSessionTimeout instead.

2012-07-09 Thread Uma Maheswara Rao G (JIRA)
Uma Maheswara Rao G created HDFS-3623:
-

 Summary: BKJM: zkLatchWaitTimeout hard coded to 6000. Make use of 
ZKSessionTimeout instead.
 Key: HDFS-3623
 URL: https://issues.apache.org/jira/browse/HDFS-3623
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G


{code}
if (!zkConnectLatch.await(6000, TimeUnit.MILLISECONDS)) {
{code}

We can make use of the session timeout instead of hard-coding this value.
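
For illustration, a sketch of the proposed change (the configuration key and 
constant names are assumptions):

{code}
// Sketch: wait for the configured ZK session timeout rather than 6000 ms.
int zkSessionTimeout = conf.getInt(BKJM_ZK_SESSION_TIMEOUT,
    BKJM_ZK_SESSION_TIMEOUT_DEFAULT);
if (!zkConnectLatch.await(zkSessionTimeout, TimeUnit.MILLISECONDS)) {
  throw new IOException("Error connecting to zookeeper");
}
{code}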





[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410036#comment-13410036
 ] 

Harsh J commented on HDFS-3617:
---

Noting that I got the same value in my MAPREDUCE-4415 test-patch run too. Some 
extra warnings are due to new Findbugs features we may not be interested in at 
this point (internationalization, etc.? Sounds useful to have, though; should 
we fix these via another JIRA?)

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch, hadoop-findbugs-report.html
>
>
> Please see HDFS-96.





[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410026#comment-13410026
 ] 

Hadoop QA commented on HDFS-3568:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535772/HDFS-3568.005.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 1 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.TestDatanodeBlockScanner

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2765//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2765//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2765//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2765//console

This message is automatically generated.

> fuse_dfs: add support for security
> --
>
> Key: HDFS-3568
> URL: https://issues.apache.org/jira/browse/HDFS-3568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 1.1.0, 2.0.1-alpha
>
> Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, 
> HDFS-3568.003.patch, HDFS-3568.004.patch, HDFS-3568.005.patch
>
>
> fuse_dfs should have support for Kerberos authentication.  This would allow 
> FUSE to be used in a secure cluster.





[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3617:
--

Attachment: hadoop-findbugs-report.html

Findbugs (version 2.0.1-rc3) is what I used, so it may be that (I thought 1.3 
went extinct long ago? I've always had 2.0.0 on my Mac at least, and for this 
build, run on a remote Linux machine, I had to download whatever was latest). 
I've attached the report.





[jira] [Commented] (HDFS-3582) Hook daemon process exit for testing

2012-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410010#comment-13410010
 ] 

Hadoop QA commented on HDFS-3582:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535764/hdfs-3582.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 10 new or modified test 
files.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse.  The patch built with eclipse:eclipse.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs 
hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal:

  org.apache.hadoop.hdfs.server.namenode.TestBackupNode

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2764//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/2764//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2764//console

This message is automatically generated.

> Hook daemon process exit for testing 
> -
>
> Key: HDFS-3582
> URL: https://issues.apache.org/jira/browse/HDFS-3582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, 
> hdfs-3582.txt, hdfs-3582.txt
>
>
> Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> Error occurred in starting fork, check output in log" because the NN is 
> exiting (via System#exit or Runtime#exit). Unfortunately Surefire doesn't 
> retain the log output (see SUREFIRE-871), so the test log is empty and we 
> don't know which part of the test triggered which exit in HDFS. To make this 
> easier to debug, let's hook all daemon process exits when running the tests.
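
For illustration, a hedged sketch of one way to hook exits in tests, using 
org.apache.hadoop.util.ExitUtil (whether this patch takes exactly this shape 
is an assumption; runTestThatMayExit() is hypothetical):

{code}
// Sketch: with System.exit disabled, daemon code that calls
// ExitUtil.terminate() throws instead of killing the test JVM.
ExitUtil.disableSystemExit();
try {
  runTestThatMayExit();   // hypothetical test body
} catch (ExitUtil.ExitException ee) {
  LOG.info("Intercepted daemon exit with status " + ee.status, ee);
}
{code}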





[jira] [Commented] (HDFS-3077) Quorum-based protocol for reading and writing edit logs

2012-07-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409998#comment-13409998
 ] 

Todd Lipcon commented on HDFS-3077:
---

Thanks, Suresh and Aaron, for your comments. I'm working on updating the patch 
and doing a bit more cleanup as well. I'll also see what I can do to make the 
server side a little more generic, if possible. I think it's impossible to 
share an IPC protocol with the BackupNode, but maybe it's possible to support 
both client-side policies for the standalone journal use case, like Suresh 
suggests above. I should have something in a couple of days - I've been moving 
apartments the last couple of weeks, so I'm a little less productive than usual.

> Quorum-based protocol for reading and writing edit logs
> ---
>
> Key: HDFS-3077
> URL: https://issues.apache.org/jira/browse/HDFS-3077
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ha, name-node
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: hdfs-3077-partial.txt, hdfs-3077.txt, hdfs-3077.txt, 
> qjournal-design.pdf, qjournal-design.pdf
>
>
> Currently, one of the weak points of the HA design is that it relies on 
> shared storage such as an NFS filer for the shared edit log. One alternative 
> that has been proposed is to depend on BookKeeper, a ZooKeeper subproject 
> which provides a highly available replicated edit log on commodity hardware. 
> This JIRA is to implement another alternative, based on a quorum commit 
> protocol, integrated more tightly in HDFS and with the requirements driven 
> only by HDFS's needs rather than more generic use cases. More details to 
> follow.





[jira] [Commented] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409992#comment-13409992
 ] 

Aaron T. Myers commented on HDFS-3608:
--

That seems like a pretty decent idea to me, i.e. use stat(2) but rate limit the 
check and occasionally reap old FS instances.

> fuse_dfs: detect changes in UID ticket cache
> 
>
> Key: HDFS-3608
> URL: https://issues.apache.org/jira/browse/HDFS-3608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
>
> Currently in fuse_dfs, if one kinits as some principal "foo" and then does 
> some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", 
> subsequent operations done via fuse_dfs will still use cached credentials for 
> "foo". The reason for this is that fuse_dfs caches Filesystem instances using 
> the UID of the user running the command as the key into the cache.  This is a 
> very uncommon scenario, since it's pretty uncommon for a single user to want 
> to use credentials for several different principals on the same box.
> However, we can use inotify to detect changes in the Kerberos ticket cache 
> file and force the next operation to create a new FileSystem instance in that 
> case.  This will also require a reference counting mechanism in fuse_dfs so 
> that we can free the FileSystem classes when they refer to previous Kerberos 
> ticket caches.
> Another mechanism is to run a stat periodically on the ticket cache file.  
> This is a good fallback mechanism if inotify does not work on the file (for 
> example, because it's on an NFS mount.)





[jira] [Commented] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409988#comment-13409988
 ] 

Colin Patrick McCabe commented on HDFS-3608:


I guess another way to do this would be to have a timer go off every minute or 
so, causing the ticket cache files to be marked as "must stat next time."  Of 
course, the timer should only be armed when there is actually something in the 
cache in the first place.  That would actually be a reasonable way to do it.  
As a bonus, we could finally dispose of the memory in FileSystem objects after 
a while (something we do not currently do -- even after being used once, 
they'll exist forever right now).
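
For illustration, a sketch of that rate-limited check (a Java analogue with 
hypothetical names; fuse_dfs itself is C): stat the ticket cache at most once 
per interval and treat a changed mtime as invalidating the cached FileSystem.

{code}
// Illustrative sketch: rate-limited staleness check on the ticket cache.
class TicketCacheWatch {
  private static final long CHECK_INTERVAL_MS = 60000L; // assumed interval
  private long lastCheckedAt;  // when stat() last ran
  private long lastMtime;      // mtime observed at that stat()

  // Returns true if the cached FileSystem for this UID should be replaced.
  synchronized boolean isStale(java.io.File ticketCache) {
    long now = System.currentTimeMillis();
    if (now - lastCheckedAt < CHECK_INTERVAL_MS) {
      return false;  // rate limit: skip the stat, trust the cache
    }
    lastCheckedAt = now;
    long mtime = ticketCache.lastModified();  // the stat() call
    boolean changed = (lastMtime != 0) && (mtime != lastMtime);
    lastMtime = mtime;
    return changed;
  }
}
{code}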





[jira] [Commented] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409982#comment-13409982
 ] 

Colin Patrick McCabe commented on HDFS-3608:


Fair enough.  I updated the summary and description.





[jira] [Updated] (HDFS-3608) fuse_dfs: detect changes in UID ticket cache

2012-07-09 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3608:
---

Description: 
Currently in fuse_dfs, if one kinits as some principal "foo" and then does some 
operation on fuse_dfs, then kdestroy and kinit as some principal "bar", 
subsequent operations done via fuse_dfs will still use cached credentials for 
"foo". The reason for this is that fuse_dfs caches Filesystem instances using 
the UID of the user running the command as the key into the cache.  This is a 
very uncommon scenario, since it's pretty uncommon for a single user to want to 
use credentials for several different principals on the same box.

However, we can use inotify to detect changes in the Kerberos ticket cache file 
and force the next operation to create a new FileSystem instance in that case.  
This will also require a reference counting mechanism in fuse_dfs so that we 
can free the FileSystem classes when they refer to previous Kerberos ticket 
caches.

Another mechanism is to run a stat periodically on the ticket cache file.  This 
is a good fallback mechanism if inotify does not work on the file (for example, 
because it's on an NFS mount.)

  was:
Currently in fuse_dfs, if one kinits as some principal "foo" and then does some 
operation on fuse_dfs, then kdestroy and kinit as some principal "bar", 
subsequent operations done via fuse_dfs will still use cached credentials for 
"foo". The reason for this is that fuse_dfs caches Filesystem instances using 
the UID of the user running the command as the key into the cache.  This is a 
very uncommon scenario, since it's pretty uncommon for a single user to want to 
use credentials for several different principals on the same box.

However, we can use inotify to detect changes in the Kerberos ticket cache file 
and force the next operation to create a new FileSystem instance in that case.  
This will also require a reference counting mechanism in fuse_dfs so that we 
can free the FileSystem classes when they refer to previous Kerberos ticket 
caches.

Summary: fuse_dfs: detect changes in UID ticket cache  (was: fuse_dfs: 
use inotify to detect changes in UID ticket cache)





[jira] [Updated] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3568:
---

Attachment: HDFS-3568.005.patch

I'm not sure what's going on with Jenkins.  It seems to be aborting before it 
actually runs any tests.  On the off chance that this is because of the lack of 
test changes in this patch, here's a patch which does change a test.

The reason why we have no tests for this patch is that there are no tests for 
FUSE, and no tests that use a KDC (Kerberos key distribution center).  Since 
this patch uses both of those, meaningful unit testing is impossible at this 
point.

I am working on a FUSE unit test, so that should improve in the near future.





[jira] [Commented] (HDFS-3608) fuse_dfs: use inotify to detect changes in UID ticket cache

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409980#comment-13409980
 ] 

Aaron T. Myers commented on HDFS-3608:
--

I wasn't saying that we definitely shouldn't use inotify, just that using 
inotify is not the goal of the JIRA. The goal of the JIRA is to make fuse_dfs 
not cache Filesystem instances longer than it should. One potential 
implementation is to use inotify. Thus, we should update the 
summary/description of the JIRA.





[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409977#comment-13409977
 ] 

Aaron T. Myers commented on HDFS-3583:
--

+1, this is how we deal with renaming directories, and I think it makes sense 
to do so in this case as well.

The other important thing to make sure of is that we don't accidentally cause 
some test cases to no longer be run, since JUnit 4 requires all tests be 
annotated with {{@Test}}, and we don't want to miss anything.
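
For illustration, the converted form every test needs to end up in (a generic 
example, not taken from the actual patch):

{code}
import org.junit.Test;
import static org.junit.Assert.assertTrue;

// JUnit 3 discovered tests by the "public void testXxx()" naming
// convention; JUnit 4 only runs methods annotated with @Test, so a
// missed annotation during conversion silently drops the test.
public class TestFoo {
  @Test(timeout = 60000)  // JUnit 4 also allows per-test timeouts
  public void testBar() {
    assertTrue(true);
  }
}
{code}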

> Convert remaining tests to Junit4
> -
>
> Key: HDFS-3583
> URL: https://issues.apache.org/jira/browse/HDFS-3583
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>  Labels: newbie
>
> JUnit4 style tests are easier to debug (eg can use @Timeout etc), let's 
> convert the remaining tests over to Junit4 style.





[jira] [Commented] (HDFS-3608) fuse_dfs: use inotify to detect changes in UID ticket cache

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409978#comment-13409978
 ] 

Colin Patrick McCabe commented on HDFS-3608:


I don't think running a stat() on the ticket cache file for every operation is 
a very good idea.  stat is a system call and rather slow; we're talking orders 
of magnitude slower here.

This isn't really that hard to implement (with inotify or not), so give me a 
chance here.






[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409975#comment-13409975
 ] 

Hadoop QA commented on HDFS-3568:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535712/HDFS-3568.004.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2763//console

This message is automatically generated.





[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4

2012-07-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409974#comment-13409974
 ] 

Todd Lipcon commented on HDFS-3583:
---

If some of them can be done automatically, maybe we should do this in two 
steps. First, develop whatever script does the conversion automatically, and 
review that. Then run it to generate a patch and commit it. Anything that was 
too hard for the script we can do by hand later. Maybe reasonable?





[jira] [Commented] (HDFS-3583) Convert remaining tests to Junit4

2012-07-09 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409970#comment-13409970
 ] 

Andrew Wang commented on HDFS-3583:
---

I'd like to take a hack at this. It's going to be a very large patch, and the 
trick here is making sure not to introduce any regressions.

> Convert remaining tests to Junit4
> -
>
> Key: HDFS-3583
> URL: https://issues.apache.org/jira/browse/HDFS-3583
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>  Labels: newbie
>
> JUnit4 style tests are easier to debug (eg can use @Timeout etc), let's 
> convert the remaining tests over to Junit4 style.





[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409967#comment-13409967
 ] 

Hadoop QA commented on HDFS-3568:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12535712/HDFS-3568.004.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/2762//console

This message is automatically generated.

> fuse_dfs: add support for security
> --
>
> Key: HDFS-3568
> URL: https://issues.apache.org/jira/browse/HDFS-3568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 1.1.0, 2.0.1-alpha
>
> Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, 
> HDFS-3568.003.patch, HDFS-3568.004.patch
>
>
> fuse_dfs should have support for Kerberos authentication.  This would allow 
> FUSE to be used in a secure cluster.





[jira] [Commented] (HDFS-2827) Cannot save namespace after renaming a directory above a file with an open lease

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409961#comment-13409961
 ] 

Eli Collins commented on HDFS-2827:
---

{noformat} 
 [exec] 
 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 6 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 7 new Findbugs 
(version 1.3.9) warnings.
 [exec]
{noformat} 

findbugs are HADOOP-7847.

> Cannot save namespace after renaming a directory above a file with an open 
> lease
> 
>
> Key: HDFS-2827
> URL: https://issues.apache.org/jira/browse/HDFS-2827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.24.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2827-test.patch, HDFS-2827.patch, hdfs-2827-b1.txt
>
>
> When I execute the following operations and wait for the checkpoint to 
> complete,
> fs.mkdirs(new Path("/test1"));
> FSDataOutputStream create = fs.create(new Path("/test/abc.txt")); // don't close
> fs.rename(new Path("/test/"), new Path("/test1/"));
> Check-pointing is failing with the following exception.
> 2012-01-23 15:03:14,204 ERROR namenode.FSImage (FSImage.java:run(795)) - 
> Unable to save image for 
> E:\HDFS-1623\hadoop-hdfs-project\hadoop-hdfs\build\test\data\dfs\name3
> java.io.IOException: saveLeases found path /test1/est/abc.txt but no matching 
> entry in namespace.[/test1/est/abc.txt]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4336)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:588)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:761)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:789)
>   at java.lang.Thread.run(Unknown Source)





[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409964#comment-13409964
 ] 

Eli Collins commented on HDFS-3617:
---

Forgot to mention, I'm using findbugs 1.3.9

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.





[jira] [Commented] (HDFS-3618) If RC is other than zero, we are assuming that Service is down (What if NC command itself not found..?)

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409963#comment-13409963
 ] 

Aaron T. Myers commented on HDFS-3618:
--

Good catch, Brahma.

How about changing the summary of this JIRA to something like "SSH fencing 
option may incorrectly succeed if netcat command not present" ?
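
As a rough illustration of a possible fix (a sketch against the snippet quoted 
in the description below, not a committed patch; the 127 convention for 
"command not found" is an assumption about the remote shell):

{code}
// Sketch only: treat "command not found" as indeterminate rather than as
// proof that the service is down.
int rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
    " " + serviceAddr.getPort());
if (rc == 0) {
  // The service is still listening - we are unable to fence.
  LOG.warn("Unable to fence - it is running but we cannot kill it");
  return false;
} else if (rc == 127) {
  // 127 is the conventional shell exit code for a missing command:
  // nc itself is absent, so we learned nothing about the service.
  LOG.warn("nc not found on target host; cannot verify service state");
  return false;  // fail fencing rather than assume success
} else {
  LOG.info("Verified that the service is down.");
  return true;
}
{code}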

> If RC is other than zero, we are assuming that Service is down (What if NC 
> command itself not found..?)
> ---
>
> Key: HDFS-3618
> URL: https://issues.apache.org/jira/browse/HDFS-3618
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Reporter: Brahma Reddy Battula
> Attachments: zkfc.txt, zkfc_threaddump.out
>
>
> Started NNs and ZKFCs on SUSE 11.
> SUSE 11 has netcat installed, and "netcat -z" works (but "nc -z" won't).
> While executing the following command we got "command not found", so rc was 
> non-zero and the service was assumed to be down. Here we end up returning 
> without actually checking whether the service is down or not:
> {code}
> LOG.info(
> "Indeterminate response from trying to kill service. " +
> "Verifying whether it is running using nc...");
> rc = execCommand(session, "nc -z " + serviceAddr.getHostName() +
> " " + serviceAddr.getPort());
> if (rc == 0) {
>   // the service is still listening - we are unable to fence
>   LOG.warn("Unable to fence - it is running but we cannot kill it");
>   return false;
> } else {
>   LOG.info("Verified that the service is down.");
>   return true;  
> }
> {code}





[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409962#comment-13409962
 ] 

Eli Collins commented on HDFS-3617:
---

Harsh,

What version of findbugs are you using, and what are most of the 218 findbugs 
warnings due to? I ran test-patch for HDFS-2827 and only got 7.

Thanks,
Eli

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.





[jira] [Updated] (HDFS-3582) Hook daemon process exit for testing

2012-07-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3582:
--

Description: 
Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
org.apache.maven.surefire.booter.SurefireBooterForkException:
Error occurred in starting fork, check output in log" because the NN is 
exit'ing (via System#exit or Runtime#exit). Unfortunately Surefire doesn't 
retain the log output (see SUREFIRE-871) so the test log is empty, we don't 
know which part of the test triggered which exit in HDFS. To make this easier 
to debug let's hook all daemon process exits when running the tests.

  was:
Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
org.apache.maven.surefire.booter.SurefireBooterForkException:
Error occurred in starting fork, check output in log" because the NN is 
exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't 
retain the log output (see SUREFIRE-871) so the test log is empty, we don't 
know which part of the test triggered which exit in HDFS. To make this 
debuggable, let's hook this in MiniDFSCluster  via installing a security 
manager that overrides checkExit (ala TestClusterId) or mock out System.exit in 
the code itself. I think the former is preferable though we'll need to keep the 
door open for tests that want to set their own security manager (should be fine 
to override this one some times).

Summary: Hook daemon process exit for testing   (was: Hook System.exit 
in MiniDFSCluster)

> Hook daemon process exit for testing 
> -
>
> Key: HDFS-3582
> URL: https://issues.apache.org/jira/browse/HDFS-3582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, 
> hdfs-3582.txt, hdfs-3582.txt
>
>
> Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> Error occurred in starting fork, check output in log" because the NN is 
> exit'ing (via System#exit or Runtime#exit). Unfortunately Surefire doesn't 
> retain the log output (see SUREFIRE-871) so the test log is empty, we don't 
> know which part of the test triggered which exit in HDFS. To make this easier 
> to debug let's hook all daemon process exits when running the tests.





[jira] [Updated] (HDFS-3582) Hook System.exit in MiniDFSCluster

2012-07-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3582:
--

Attachment: hdfs-3582.txt

Updated patch attached.
- Was missing the Runtime#exit calls; fixed these and updated all the relevant 
edit log tests to match
- Now passing a message to ExitUtil#terminate so the particular cause of the 
exit is captured in the exception and available to the tests (see the sketch 
after this list)
- NB: I'm just hooking daemon exits, not e.g. all the tool exits (balancer, 
*admin, recovery mode)
- Made logs "fatal" that were "error" but terminated
- Made the NN/DN/2NN exit failure codes consistent (use 1 in places we were 
using -1)
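
A minimal sketch of what the hook buys a test, assuming the ExitUtil API 
behaves as described in the comments (this is not a hunk from the attached 
patch):

{code}
import static org.junit.Assert.*;

import org.apache.hadoop.util.ExitUtil;
import org.junit.Test;

public class TestExitHook {
  @Test
  public void testExitIsCaptured() {
    // With System.exit disabled, terminate() throws instead of killing the JVM.
    ExitUtil.disableSystemExit();
    try {
      ExitUtil.terminate(1, "simulated NN fatal error");
      fail("should have thrown instead of exiting");
    } catch (ExitUtil.ExitException ee) {
      // The message passed to terminate() identifies which exit path fired.
      assertTrue(ee.getMessage().contains("simulated NN fatal error"));
    }
    assertTrue(ExitUtil.terminateCalled());
  }
}
{code}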

> Hook System.exit in MiniDFSCluster
> --
>
> Key: HDFS-3582
> URL: https://issues.apache.org/jira/browse/HDFS-3582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, 
> hdfs-3582.txt, hdfs-3582.txt
>
>
> Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> Error occurred in starting fork, check output in log" because the NN is 
> exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't 
> retain the log output (see SUREFIRE-871) so the test log is empty, we don't 
> know which part of the test triggered which exit in HDFS. To make this 
> debuggable, let's hook this in MiniDFSCluster  via installing a security 
> manager that overrides checkExit (ala TestClusterId) or mock out System.exit 
> in the code itself. I think the former is preferable though we'll need to 
> keep the door open for tests that want to set their own security manager 
> (should be fine to override this one some times).





[jira] [Commented] (HDFS-2827) Cannot save namespace after renaming a directory above a file with an open lease

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409942#comment-13409942
 ] 

Aaron T. Myers commented on HDFS-2827:
--

+1, the branch-1 patch looks good to me.

> Cannot save namespace after renaming a directory above a file with an open 
> lease
> 
>
> Key: HDFS-2827
> URL: https://issues.apache.org/jira/browse/HDFS-2827
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.24.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
> Fix For: 0.24.0, 0.23.1
>
> Attachments: HDFS-2827-test.patch, HDFS-2827.patch, hdfs-2827-b1.txt
>
>
> When I execute the following operations and wait for the checkpoint to 
> complete,
> fs.mkdirs(new Path("/test1"));
> FSDataOutputStream create = fs.create(new Path("/test/abc.txt")); // don't close
> fs.rename(new Path("/test/"), new Path("/test1/"));
> Check-pointing is failing with the following exception.
> 2012-01-23 15:03:14,204 ERROR namenode.FSImage (FSImage.java:run(795)) - 
> Unable to save image for 
> E:\HDFS-1623\hadoop-hdfs-project\hadoop-hdfs\build\test\data\dfs\name3
> java.io.IOException: saveLeases found path /test1/est/abc.txt but no matching 
> entry in namespace.[/test1/est/abc.txt]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4336)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Saver.save(FSImageFormat.java:588)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:761)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSImage$FSImageSaver.run(FSImage.java:789)
>   at java.lang.Thread.run(Unknown Source)





[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409941#comment-13409941
 ] 

Eli Collins commented on HDFS-3617:
---

Thanks Harsh, mind updating HADOOP-7847 with your report?  218 is kind of 
alarming.

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.





[jira] [Updated] (HDFS-3597) SNN can fail to start on upgrade

2012-07-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3597:
-

Target Version/s: 2.0.1-alpha
  Status: Patch Available  (was: Open)

> SNN can fail to start on upgrade
> 
>
> Key: HDFS-3597
> URL: https://issues.apache.org/jira/browse/HDFS-3597
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Andy Isaacson
>Assignee: Andy Isaacson
>Priority: Minor
> Attachments: hdfs-3597-2.txt, hdfs-3597-3.txt, hdfs-3597.txt
>
>
> When upgrading from 1.x to 2.0.0, the SecondaryNameNode can fail to start up:
> {code}
> 2012-06-16 09:52:33,812 ERROR 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in 
> doCheckpoint
> java.io.IOException: Inconsistent checkpoint fields.
> LV = -40 namespaceID = 64415959 cTime = 1339813974990 ; clusterId = 
> CID-07a82b97-8d04-4fdd-b3a1-f40650163245 ; blockpoolId = 
> BP-1792677198-172.29.121.67-1339813967723.
> Expecting respectively: -19; 64415959; 0; ; .
> at 
> org.apache.hadoop.hdfs.server.namenode.CheckpointSignature.validateStorageInfo(CheckpointSignature.java:120)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:454)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doWork(SecondaryNameNode.java:334)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$2.run(SecondaryNameNode.java:301)
> at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:438)
> at 
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:297)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> The error check we're hitting came from HDFS-1073, and it's intended to 
> verify that we're connecting to the correct NN.  But the check is too strict 
> and considers "different metadata version" to be the same as "different 
> clusterID".
> I believe the check in {{doCheckpoint}} simply needs to explicitly check for 
> and handle the update case.
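
A hedged sketch of that direction (sig, storage and doUpgrade are illustrative 
names, not the attached patch):

{code}
// Sketch only: tolerate a layout-version difference during upgrade, but
// still insist we are checkpointing the same namespace.
if (sig.getNamespaceID() != storage.getNamespaceID()) {
  throw new IOException("Inconsistent checkpoint fields: wrong namespace");
}
if (sig.getLayoutVersion() != storage.getLayoutVersion()) {
  // Different metadata version, same cluster: handle it as an upgrade
  // instead of rejecting the checkpoint outright.
  doUpgrade(sig);
}
{code}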





[jira] [Commented] (HDFS-3608) fuse_dfs: use inotify to detect changes in UID ticket cache

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409935#comment-13409935
 ] 

Aaron T. Myers commented on HDFS-3608:
--

I don't think that using inotify should be a hard requirement here. It might be 
acceptable, and quite a bit simpler, to just check the last modification time 
of the ticket cache file when fetching the cached Filesystem instance, and 
create a new FS if the mod time has changed since the last time it was accessed.

Given that, I think we should remove inotify from the summary of this JIRA.
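
For illustration, the mod-time idea looks roughly like this (a Java sketch of 
the concept only; the real fuse_dfs code is C, and every name below is 
invented):

{code}
import java.io.File;

// Hypothetical sketch: invalidate a cached filesystem handle when the
// user's Kerberos ticket cache file has changed since we cached it.
class CachedFs {
  final Object fs;         // stand-in for the cached FileSystem handle
  final long cachedMtime;  // ticket cache mtime when fs was created
  CachedFs(Object fs, long mtime) { this.fs = fs; this.cachedMtime = mtime; }

  static CachedFs get(CachedFs cached, String ticketCachePath) {
    long mtime = new File(ticketCachePath).lastModified();
    if (cached == null || mtime != cached.cachedMtime) {
      // Ticket cache changed (e.g. kdestroy + kinit as another principal):
      // drop the old instance and connect with fresh credentials.
      cached = new CachedFs(new Object() /* reconnect here */, mtime);
    }
    return cached;
  }
}
{code}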

> fuse_dfs: use inotify to detect changes in UID ticket cache
> ---
>
> Key: HDFS-3608
> URL: https://issues.apache.org/jira/browse/HDFS-3608
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
>
> Currently in fuse_dfs, if one kinits as some principal "foo" and then does 
> some operation on fuse_dfs, then kdestroy and kinit as some principal "bar", 
> subsequent operations done via fuse_dfs will still use cached credentials for 
> "foo". The reason for this is that fuse_dfs caches Filesystem instances using 
> the UID of the user running the command as the key into the cache.  This is a 
> very uncommon scenario, since it's pretty uncommon for a single user to want 
> to use credentials for several different principals on the same box.
> However, we can use inotify to detect changes in the Kerberos ticket cache 
> file and force the next operation to create a new FileSystem instance in that 
> case.  This will also require a reference counting mechanism in fuse_dfs so 
> that we can free the FileSystem classes when they refer to previous Kerberos 
> ticket caches.





[jira] [Commented] (HDFS-3607) log a message when fuse_dfs is not built

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409932#comment-13409932
 ] 

Colin Patrick McCabe commented on HDFS-3607:


And just to clarify the clarification, fuse_dfs is silently ignored when 
fuse-devel (or the equivalent development files) are not present, or when the 
operating system is not Linux.  We have a few optional build components, and 
this is one of them.

Another way around this problem might be to provide a maven profile or setting 
that makes a failure to build fuse_dfs a hard error.  I believe this would be 
useful to people working on packaging.

> log a message when fuse_dfs is not built
> 
>
> Key: HDFS-3607
> URL: https://issues.apache.org/jira/browse/HDFS-3607
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/fuse-dfs
>Affects Versions: 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
>
> We should log a message when fuse_dfs is not built explaining why





[jira] [Commented] (HDFS-3607) log a message when fuse_dfs is not built

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409930#comment-13409930
 ] 

Colin Patrick McCabe commented on HDFS-3607:


Just to clarify, this JIRA is about logging something to the maven output when 
fuse is not built.  Most developers would like to know when fuse was in fact 
not built.

> log a message when fuse_dfs is not built
> 
>
> Key: HDFS-3607
> URL: https://issues.apache.org/jira/browse/HDFS-3607
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/fuse-dfs
>Affects Versions: 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
>
> We should log a message when fuse_dfs is not built explaining why





[jira] [Commented] (HDFS-3561) ZKFC retries for 45 times to connect to other NN during fencing when network between NNs broken and standby Nn will not take over as active

2012-07-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409927#comment-13409927
 ] 

Aaron T. Myers commented on HDFS-3561:
--

Seems to me like these new configs should not be made specific to the ZKFC, but 
rather should apply to all failover controllers. Given that, I think we should 
change the config keys to be named similarly to the other FC graceful 
connection configs, e.g. 
"ha.failover-controller.graceful-fence.rpc-timeout.ms". Furthermore, we should 
push down the handling for this into the FailoverController, and not put it in 
ZKFailoverController.
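
For reference, a consumer of such a key would look something like this (the 
key name follows the existing FC graceful-fence convention; the default shown 
is an assumption):

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: any failover controller, not just the ZKFC, resolves the
// graceful-fence RPC timeout from a generic FC key.
Configuration conf = new Configuration();
int gracefulFenceTimeout = conf.getInt(
    "ha.failover-controller.graceful-fence.rpc-timeout.ms",
    5000 /* assumed fallback default */);
{code}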

> ZKFC retries for 45 times to connect to other NN during fencing when network 
> between NNs broken and standby Nn will not take over as active 
> 
>
> Key: HDFS-3561
> URL: https://issues.apache.org/jira/browse/HDFS-3561
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover
>Affects Versions: 2.0.1-alpha, 3.0.0
>Reporter: suja s
>Assignee: Vinay
> Attachments: HDFS-3561.patch
>
>
> Scenario:
> Active NN on machine1
> Standby NN on machine2
> Machine1 is isolated from the network (machine1 network cable unplugged)
> After zk session timeout ZKFC at machine2 side gets notification that NN1 is 
> not there.
> ZKFC tries to failover NN2 as active.
> As part of this during fencing it tries to connect to machine1 and kill NN1. 
> (sshfence technique configured)
> This connection is retried 45 times (as governed by 
> ipc.client.connect.max.socket.retries).
> Also, after that the standby NN is not able to take over as active (because 
> of the fencing failure).
> Suggestion: if the ZKFC is not able to reach the other NN within a specified 
> time/number of retries, it can consider that NN dead and instruct the other 
> NN to take over as active, since there is no chance of the other NN (NN1) 
> retaining its active state after the ZK session timeout when it is isolated 
> from the network.
> From ZKFC log:
> {noformat}
> 2012-06-21 17:46:14,378 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 22 time(s).
> 2012-06-21 17:46:35,378 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 23 time(s).
> 2012-06-21 17:46:56,378 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 24 time(s).
> 2012-06-21 17:47:17,378 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 25 time(s).
> 2012-06-21 17:47:38,382 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 26 time(s).
> 2012-06-21 17:47:59,382 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 27 time(s).
> 2012-06-21 17:48:20,386 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 28 time(s).
> 2012-06-21 17:48:41,386 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 29 time(s).
> 2012-06-21 17:49:02,386 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 30 time(s).
> 2012-06-21 17:49:23,386 INFO org.apache.hadoop.ipc.Client: Retrying connect 
> to server: HOST-xx-xx-xx-102/xx.xx.xx.102:65110. Already tried 31 time(s).
> {noformat}
>  





[jira] [Commented] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409923#comment-13409923
 ] 

Todd Lipcon commented on HDFS-3605:
---

I'm sorry, I'm not entirely understanding the description. Can you post a unit 
test which reproduces the issue?

> Missing Block in following scenario
> ---
>
> Key: HDFS-3605
> URL: https://issues.apache.org/jira/browse/HDFS-3605
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>Reporter: Brahma Reddy Battula
>
> Open file for append
> Write data and sync.
> After the next log roll and editlog tailing in the standby NN, close the 
> append stream.
> Call append multiple times on the same file, before the next editlog roll.
> Now abruptly kill the current active namenode.
> Here the block is missed.
> This may be because all the latest blocks were queued in the standby 
> NameNode. During failover, OP_CLOSE processing of the pending queue added 
> the block to the corrupt blocks.





[jira] [Updated] (HDFS-3605) Missing Block in following scenario

2012-07-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3605:
-

Summary: Missing Block in following scenario  (was: Missing Block in 
following sceanrio.)

> Missing Block in following scenario
> ---
>
> Key: HDFS-3605
> URL: https://issues.apache.org/jira/browse/HDFS-3605
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>Reporter: Brahma Reddy Battula
>
> Open file for append
> Write data and sync.
> After the next log roll and editlog tailing in the standby NN, close the 
> append stream.
> Call append multiple times on the same file, before the next editlog roll.
> Now abruptly kill the current active namenode.
> Here the block is missed.
> This may be because all the latest blocks were queued in the standby 
> NameNode. During failover, OP_CLOSE processing of the pending queue added 
> the block to the corrupt blocks.





[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409908#comment-13409908
 ] 

Daryn Sharp commented on HDFS-2617:
---

Ack, no, I didn't think to look in the Jetty socket factory.  That raises the 
question: can we change the hardcoded value?

My understanding is that Kerberos is designed to be used on an insecure 
network, so does SSL provide much benefit?  If it does, then why is SSL used 
to get a token, and the token then passed in cleartext without SSL?

> Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
> --
>
> Key: HDFS-2617
> URL: https://issues.apache.org/jira/browse/HDFS-2617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, 
> HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, 
> HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch
>
>
> The current approach to secure and authenticate nn web services is based on 
> Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
> that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
> throughout.  This will simplify setup and configuration.  Also, Kerberized 
> SSL is a non-standard approach with its own quirks and dark corners 
> (HDFS-2386).





[jira] [Commented] (HDFS-1508) Ability to do savenamespace without being in safemode

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409858#comment-13409858
 ] 

Harsh J commented on HDFS-1508:
---

I think this makes sense to go in, especially with the feature offered via 
HDFS-1509.

Dhruba - Would you have some spare cycles to rebase the patch onto current 
trunk? If not, I'll get it done within the week.
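
For context, the admin flow today requires wrapping the save in safemode, 
roughly like this (a sketch; exact import paths vary across branches):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

// Today saveNamespace() is rejected unless the NN is already in safemode,
// so writers see errors for the whole duration of the save.
DistributedFileSystem dfs =
    (DistributedFileSystem) FileSystem.get(new Configuration());
dfs.setSafeMode(SafeModeAction.SAFEMODE_ENTER);
dfs.saveNamespace();
dfs.setSafeMode(SafeModeAction.SAFEMODE_LEAVE);
{code}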

> Ability to do savenamespace without being in safemode
> -
>
> Key: HDFS-1508
> URL: https://issues.apache.org/jira/browse/HDFS-1508
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: savenamespaceWithoutSafemode.txt, 
> savenamespaceWithoutSafemode2.txt, savenamespaceWithoutSafemode3.txt, 
> savenamespaceWithoutSafemode4.txt, savenamespaceWithoutSafemode5.txt
>
>
> In the current code, the administrator can run savenamespace only after 
> putting the namenode in safemode. This means that applications that are 
> writing to HDFS encounters errors because the NN is in safemode. We would 
> like to allow saveNamespace even when the namenode is not in safemode.
> The savenamespace command already acquires the FSNamesystem writelock. There 
> is no need to require that the namenode is in safemode too.





[jira] [Assigned] (HDFS-3615) Two BlockTokenSecretManager findbugs warnings

2012-07-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers reassigned HDFS-3615:


Assignee: Aaron T. Myers

> Two BlockTokenSecretManager findbugs warnings
> -
>
> Key: HDFS-3615
> URL: https://issues.apache.org/jira/browse/HDFS-3615
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Aaron T. Myers
>
> Looks like two findbugs warnings were introduced recently (seen across a 
> couple of recent patches). It is unclear what change introduced them, as the 
> file hasn't been modified and recently committed changes pass the findbugs 
> check.
> IS: Inconsistent synchronization of 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.keyUpdateInterval;
>  locked 75% of time
> IS: Inconsistent synchronization of 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.serialNo; 
> locked 75% of time





[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-07-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409853#comment-13409853
 ] 

Allen Wittenauer commented on HDFS-2617:


I guess you haven't noticed that the Hadoop version is hard-coded to use 3DES...


> Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
> --
>
> Key: HDFS-2617
> URL: https://issues.apache.org/jira/browse/HDFS-2617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, 
> HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, 
> HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch
>
>
> The current approach to secure and authenticate nn web services is based on 
> Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
> that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
> throughout.  This will simplify setup and configuration.  Also, Kerberized 
> SSL is a non-standard approach with its own quirks and dark corners 
> (HDFS-2386).





[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder

2012-07-09 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409849#comment-13409849
 ] 

Robert Joseph Evans commented on HDFS-3541:
---

@Uma,

Sorry it took me so long to respond.  Yes, I would be happy to look into doing 
the porting, as the patch does not apply cleanly. I filed HDFS-3622 to track 
this work.

> Deadlock between recovery, xceiver and packet responder
> ---
>
> Key: HDFS-3541
> URL: https://issues.apache.org/jira/browse/HDFS-3541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: suja s
>Assignee: Vinay
> Fix For: 2.0.1-alpha, 3.0.0
>
> Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch
>
>
> Block recovery was initiated while a write was in progress on the datanode 
> side. Found a deadlock between recovery, xceiver and packet responder.





[jira] [Created] (HDFS-3622) Backport HDFS-3541 to branch-0.23

2012-07-09 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created HDFS-3622:
-

 Summary: Backport HDFS-3541 to branch-0.23
 Key: HDFS-3622
 URL: https://issues.apache.org/jira/browse/HDFS-3622
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


The patch for HDFS-3541 (Deadlock between recovery, xceiver and packet 
responder) does not apply directly to branch-0.23, but the bug exists there 
too.





[jira] [Commented] (HDFS-3615) Two BlockTokenSecretManager findbugs warnings

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409843#comment-13409843
 ] 

Eli Collins commented on HDFS-3615:
---

I must have been looking at a stale tree; "Fix issue with NN/DN 
re-registration" recently modified this file and is likely the culprit.

> Two BlockTokenSecretManager findbugs warnings
> -
>
> Key: HDFS-3615
> URL: https://issues.apache.org/jira/browse/HDFS-3615
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>
> Looks like two findbugs warnings were introduced recently (seen across a 
> couple of recent patches). It is unclear what change introduced them, as the 
> file hasn't been modified and recently committed changes pass the findbugs 
> check.
> IS: Inconsistent synchronization of 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.keyUpdateInterval;
>  locked 75% of time
> IS: Inconsistent synchronization of 
> org.apache.hadoop.hdfs.security.token.block.BlockTokenSecretManager.serialNo; 
> locked 75% of time





[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409836#comment-13409836
 ] 

Daryn Sharp commented on HDFS-2617:
---

I'm interested in learning the details of why kssl is so bad.  I can't find 
much online except early versions of java 6 had an issue, and a solaris kext 
for kssl has had a number of problems.

WEP's usage of RC4 is an egregious example of a bad RC4 implementation.  WPA 
also used RC4 (TKIP) in a saner manner before WPA2 switched to AES.  As best I 
can tell, the Java GSS implementation doesn't use a WEP-style RC4 impl, and 
GSS also supports AES.  Both KSSL and SPNEGO are protected via SSL's 
encryption, and the krb tickets are encrypted.  Where is the Achilles' heel 
that affects KSSL but not SPNEGO?

> Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
> --
>
> Key: HDFS-2617
> URL: https://issues.apache.org/jira/browse/HDFS-2617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, 
> HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, 
> HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch
>
>
> The current approach to secure and authenticate nn web services is based on 
> Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
> that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
> throughout.  This will simplify setup and configuration.  Also, Kerberized 
> SSL is a non-standard approach with its own quirks and dark corners 
> (HDFS-2386).





[jira] [Updated] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3555:
--

 Component/s: hdfs client
  data-node
Hadoop Flags: Reviewed

> idle client socket triggers DN ERROR log (should be INFO or DEBUG)
> --
>
> Key: HDFS-3555
> URL: https://issues.apache.org/jira/browse/HDFS-3555
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.20.2
> Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago)
>Reporter: Jeff Lord
>Assignee: Andy Isaacson
> Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch
>
>
> Datanode service is logging java.net.SocketTimeoutException at ERROR level.
> This message indicates that the datanode is not able to send data to the 
> client because the client has stopped reading. This message is not really a 
> cause for alarm and should be INFO level.
> 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode 
> DatanodeRegistration(x.x.x.x:50010, 
> storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, 
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 48 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
> local=/10.10.120.67:50010 remote=/10.10.120.67:59282]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at 
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)





[jira] [Updated] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3555:
--

Environment: (was: Red Hat Enterprise Linux Server release 6.2 
(Santiago)
)

> idle client socket triggers DN ERROR log (should be INFO or DEBUG)
> --
>
> Key: HDFS-3555
> URL: https://issues.apache.org/jira/browse/HDFS-3555
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node, hdfs client
>Affects Versions: 0.20.2
>Reporter: Jeff Lord
>Assignee: Andy Isaacson
> Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch
>
>
> Datanode service is logging java.net.SocketTimeoutException at ERROR level.
> This message indicates that the datanode is not able to send data to the 
> client because the client has stopped reading. This message is not really a 
> cause for alarm and should be INFO level.
> 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode 
> DatanodeRegistration(x.x.x.x:50010, 
> storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, 
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 48 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
> local=/10.10.120.67:50010 remote=/10.10.120.67:59282]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at 
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)





[jira] [Commented] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409803#comment-13409803
 ] 

Harsh J commented on HDFS-3555:
---

Thanks Andy. Will commit it pending Jenkins' result.

> idle client socket triggers DN ERROR log (should be INFO or DEBUG)
> --
>
> Key: HDFS-3555
> URL: https://issues.apache.org/jira/browse/HDFS-3555
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago)
>Reporter: Jeff Lord
>Assignee: Andy Isaacson
> Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch
>
>
> Datanode service is logging java.net.SocketTimeoutException at ERROR level.
> This message indicates that the datanode is not able to send data to the 
> client because the client has stopped reading. This message is not really a 
> cause for alarm and should be INFO level.
> 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode 
> DatanodeRegistration(x.x.x.x:50010, 
> storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, 
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 48 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
> local=/10.10.120.67:50010 remote=/10.10.120.67:59282]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at 
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)





[jira] [Updated] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)

2012-07-09 Thread Andy Isaacson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3555:


Attachment: hdfs-3555-3.txt

Attaching correctly formatted patch.

> idle client socket triggers DN ERROR log (should be INFO or DEBUG)
> --
>
> Key: HDFS-3555
> URL: https://issues.apache.org/jira/browse/HDFS-3555
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago)
>Reporter: Jeff Lord
>Assignee: Andy Isaacson
> Attachments: hdfs-3555-2.txt, hdfs-3555-3.txt, hdfs-3555.patch
>
>
> Datanode service is logging java.net.SocketTimeoutException at ERROR level.
> This message indicates that the datanode is not able to send data to the 
> client because the client has stopped reading. This message is not really a 
> cause for alarm and should be INFO level.
> 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode 
> DatanodeRegistration(x.x.x.x:50010, 
> storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, 
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 48 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
> local=/10.10.120.67:50010 remote=/10.10.120.67:59282]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at 
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)





[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409800#comment-13409800
 ] 

Harsh J commented on HDFS-3617:
---

{code}

 [exec] -1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 2 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 218 new Findbugs 
(version 2.0.1-rc3) warnings.
{code}

The findbugs warnings seem unrelated to me. A quick scan through the report 
shows nothing from my lines, at least.

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.





[jira] [Commented] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)

2012-07-09 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409796#comment-13409796
 ] 

Andy Isaacson commented on HDFS-3555:
-

bq. This looks okay to go, but can you rebase it please? It does not apply to 
current trunk.
My mistake: I uploaded "git show -b" output, which is nice to read but of 
course doesn't get the indentation correct.

bq. Also, is instanceof better than using a specific catch clause for 
SocketTimeoutException?
If there are two catch clauses, the common code gets duplicated. Currently 
that's just one line, but it's begging for someone to mistakenly add code to 
just one of the catch blocks.
Given that this codepath is already pretty expensive (we're about to tear down 
a TCP socket, and we've already constructed the exception), the small 
additional overhead of instanceof is negligible.
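
Concretely, the shape being defended is something like this (a sketch, not the 
exact patch hunk; sendChunks and LOG stand in for the real DataXceiver 
members):

{code}
try {
  sendChunks();  // stand-in for the DataXceiver write path
} catch (IOException ioe) {
  // One catch clause; the log level is chosen by exception type, so the
  // common teardown below never gets duplicated across catch blocks.
  if (ioe instanceof java.net.SocketTimeoutException) {
    LOG.info("Idle client timed out: " + ioe);  // expected, not alarming
  } else {
    LOG.error("Error writing block to client", ioe);
  }
  // common cleanup (socket teardown etc.) stays in exactly one place
}
{code}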

> idle client socket triggers DN ERROR log (should be INFO or DEBUG)
> --
>
> Key: HDFS-3555
> URL: https://issues.apache.org/jira/browse/HDFS-3555
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago)
>Reporter: Jeff Lord
>Assignee: Andy Isaacson
> Attachments: hdfs-3555-2.txt, hdfs-3555.patch
>
>
> Datanode service is logging java.net.SocketTimeoutException at ERROR level.
> This message indicates that the datanode is not able to send data to the 
> client because the client has stopped reading. This message is not really a 
> cause for alarm and should be INFO level.
> 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode 
> DatanodeRegistration(x.x.x.x:50010, 
> storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, 
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 48 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
> local=/10.10.120.67:50010 remote=/10.10.120.67:59282]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at 
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3555) idle client socket triggers DN ERROR log (should be INFO or DEBUG)

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409793#comment-13409793
 ] 

Harsh J commented on HDFS-3555:
---

Never mind my concern; it was answered by 
http://stackoverflow.com/questions/103564/the-performance-impact-of-using-instanceof-in-java

Please do send in a properly applying patch. +1 as is.

> idle client socket triggers DN ERROR log (should be INFO or DEBUG)
> --
>
> Key: HDFS-3555
> URL: https://issues.apache.org/jira/browse/HDFS-3555
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.20.2
> Environment: Red Hat Enterprise Linux Server release 6.2 (Santiago)
>Reporter: Jeff Lord
>Assignee: Andy Isaacson
> Attachments: hdfs-3555-2.txt, hdfs-3555.patch
>
>
> Datanode service is logging java.net.SocketTimeoutException at ERROR level.
> This message indicates that the datanode is not able to send data to the 
> client because the client has stopped reading. This message is not really a 
> cause for alarm and should be INFO level.
> 2012-06-18 17:47:13 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode 
> DatanodeRegistration(x.x.x.x:50010, 
> storageID=DS-196671195-10.10.120.67-50010-1334328338972, infoPort=50075, 
> ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 48 millis timeout while waiting for 
> channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
> local=/10.10.120.67:50010 remote=/10.10.120.67:59282]
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
> at 
> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
> at 
> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
> at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:267)
> at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:163)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2936) File close()-ing hangs indefinitely if the number of live blocks does not match the minimum replication

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409773#comment-13409773
 ] 

Harsh J commented on HDFS-2936:
---

I've locally finished making the switch, but I still need to figure out how to 
write the file-hanger test; once it hangs, my test does not currently recover. 
As soon as I have that figured out, I'll post another version for review.

> File close()-ing hangs indefinitely if the number of live blocks does not 
> match the minimum replication
> ---
>
> Key: HDFS-2936
> URL: https://issues.apache.org/jira/browse/HDFS-2936
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Harsh J
>Assignee: Harsh J
> Attachments: HDFS-2936.patch
>
>
> If an admin wishes to enforce replication today for all the users of their 
> cluster, he may set {{dfs.namenode.replication.min}}. This property prevents 
> users from creating files with < expected replication factor.
> However, the value of minimum replication set by the above value is also 
> checked at several other points, especially during completeFile (close) 
> operations. If a condition arises wherein a write's pipeline may have gotten 
> only < minimum nodes in it, the completeFile operation does not successfully 
> close the file and the client begins to hang waiting for NN to replicate the 
> last bad block in the background. This form of hard-guarantee can, for 
> example, bring down clusters of HBase during high xceiver load on DN, or disk 
> fill-ups on many of them, etc..
> I propose we should split the property in two parts:
> * dfs.namenode.replication.min
> ** Stays the same name, but only checks file creation time replication factor 
> value and during adjustments made via setrep/etc.
> * dfs.namenode.replication.min.for.write
> ** New property that disconnects the rest of the checks from the above 
> property, such as the checks done during block commit, file complete/close, 
> safemode checks for block availability, etc..
> Alternatively, we may also choose to remove the client-side hang of 
> completeFile/close calls with a set number of retries. This would further 
> require discussion about how a file-closure handle ought to be handled.
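
As a rough illustration of the proposed split (key names come from the list 
above; the consuming code and defaults are hypothetical until a patch lands, 
and conf is the NameNode's Configuration):

{code}
int minReplication = conf.getInt("dfs.namenode.replication.min", 1);
int minReplicationForWrite =
    conf.getInt("dfs.namenode.replication.min.for.write", minReplication);
// create/setrep requests would validate against minReplication, while
// block commit, completeFile/close and safemode availability checks
// would use minReplicationForWrite, decoupling the two guarantees.
{code}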

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3621:
--

  Component/s: hdfs client
Affects Version/s: 2.0.0-alpha

> Add a main method to HdfsConfiguration, for debug purposes
> --
>
> Key: HDFS-3621
> URL: https://issues.apache.org/jira/browse/HDFS-3621
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Trivial
>  Labels: newbie
>
> Just like Configuration has a main() function that dumps XML out for debug 
> purposes, we should have a similar function in the HdfsConfiguration class 
> that does the same. This is useful for testing app classpath setups at times.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3621) Add a main method to HdfsConfiguration, for debug purposes

2012-07-09 Thread Harsh J (JIRA)
Harsh J created HDFS-3621:
-

 Summary: Add a main method to HdfsConfiguration, for debug purposes
 Key: HDFS-3621
 URL: https://issues.apache.org/jira/browse/HDFS-3621
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Harsh J
Priority: Trivial


Just like Configuration has a main() function that dumps XML out for debug 
purposes, we should have a similar function in the HdfsConfiguration class that 
does the same. This is useful for testing app classpath setups at times.
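
A minimal sketch of what such a main() could look like, mirroring 
Configuration.main (an illustration, not a committed patch):

{code}
public static void main(String[] args) throws Exception {
  // Constructing an HdfsConfiguration pulls in hdfs-default.xml and
  // hdfs-site.xml; writeXml then dumps the merged result to stdout.
  new HdfsConfiguration().writeXml(System.out);
}
{code}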

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3509) WebHdfsFilesystem does not work within a proxyuser doAs call in secure mode

2012-07-09 Thread Alejandro Abdelnur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alejandro Abdelnur updated HDFS-3509:
-

Attachment: HDFS-3509-branch1.patch

Backport patch for branch-1. Note that the patch does not have a testcase; in 
trunk/branch-2 this is tested via HttpFS, which is not present in branch-1.

> WebHdfsFilesystem does not work within a proxyuser doAs call in secure mode
> ---
>
> Key: HDFS-3509
> URL: https://issues.apache.org/jira/browse/HDFS-3509
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Alejandro Abdelnur
>Priority: Critical
> Attachments: HDFS-3509-branch1.patch, HDFS-3509.patch
>
>
> It does not find kerberos credentials in the context (the UGI is logged in 
> from a keytab) and it fails with the following trace:
> {code}
> java.lang.IllegalStateException: unknown char '<'(60) in 
> org.mortbay.util.ajax.JSON$ReaderSource@23245e75
>   at org.mortbay.util.ajax.JSON.handleUnknown(JSON.java:788)
>   at org.mortbay.util.ajax.JSON.parse(JSON.java:777)
>   at org.mortbay.util.ajax.JSON.parse(JSON.java:603)
>   at org.mortbay.util.ajax.JSON.parse(JSON.java:183)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.jsonParse(WebHdfsFileSystem.java:259)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:268)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.run(WebHdfsFileSystem.java:427)
>   at 
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem.getDelegationToken(WebHdfsFileSystem.java:722)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3582) Hook System.exit in MiniDFSCluster

2012-07-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3582:
--

Attachment: hdfs-3582.txt

Good point, Colin; I updated the javadoc to say as much.

> Hook System.exit in MiniDFSCluster
> --
>
> Key: HDFS-3582
> URL: https://issues.apache.org/jira/browse/HDFS-3582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt, 
> hdfs-3582.txt
>
>
> Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> Error occurred in starting fork, check output in log" because the NN is 
> exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't 
> retain the log output (see SUREFIRE-871) so the test log is empty, we don't 
> know which part of the test triggered which exit in HDFS. To make this 
> debuggable, let's hook this in MiniDFSCluster  via installing a security 
> manager that overrides checkExit (ala TestClusterId) or mock out System.exit 
> in the code itself. I think the former is preferable though we'll need to 
> keep the door open for tests that want to set their own security manager 
> (should be fine to override this one some times).
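
For reference, a sketch of the checkExit-overriding security manager described 
above (the class name and message are hypothetical):

{code}
class NoExitSecurityManager extends SecurityManager {
  @Override
  public void checkExit(int status) {
    // Turn System.exit/Runtime.exit into a catchable, loggable failure
    // instead of silently killing the surefire fork.
    throw new SecurityException("Intercepted System.exit(" + status + ")");
  }

  @Override
  public void checkPermission(java.security.Permission perm) {
    // Permit everything else so tests behave normally.
  }
}
{code}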

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-3568:
---

Attachment: HDFS-3568.004.patch

* rebase

> fuse_dfs: add support for security
> --
>
> Key: HDFS-3568
> URL: https://issues.apache.org/jira/browse/HDFS-3568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 1.1.0, 2.0.1-alpha
>
> Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, 
> HDFS-3568.003.patch, HDFS-3568.004.patch
>
>
> fuse_dfs should have support for Kerberos authentication.  This would allow 
> FUSE to be used in a secure cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-07-09 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409709#comment-13409709
 ] 

Allen Wittenauer commented on HDFS-2617:


No.  KSSL is hard-coded by RFC to only use certain ciphers.  To put this into 
terms that many might have an easier time understanding, KSSL is roughly 
equivalent to WEP in terms of its vulnerability.

I'd also like to point out what our 'spread' looks like:

0.20.2 and lower: insecure only, so irrelevant
0.20.203 through 0.20.205: only had KSSL+hftp
1.0.0 and up: WebHDFS is available

So we're looking at a window of releases of about 5-6 months. Folks who are 
running something in 0.20.203 through 1.0.1 should really upgrade anyway, due 
to the severity of some of the bugs, never mind the security holes that have 
since been found.

> Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
> --
>
> Key: HDFS-2617
> URL: https://issues.apache.org/jira/browse/HDFS-2617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, 
> HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, 
> HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch
>
>
> The current approach to secure and authenticate nn web services is based on 
> Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
> that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
> throughout.  This will simplify setup and configuration.  Also, Kerberized 
> SSL is a non-standard approach with its own quirks and dark corners 
> (HDFS-2386).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3067) NPE in DFSInputStream.readBuffer if read is repeated on corrupted block

2012-07-09 Thread Aaron T. Myers (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-3067:
-

Fix Version/s: (was: 3.0.0)
   2.0.1-alpha

I've just merged this patch to branch-2 and updated CHANGES.txt in trunk to 
suit.

> NPE in DFSInputStream.readBuffer if read is repeated on corrupted block
> ---
>
> Key: HDFS-3067
> URL: https://issues.apache.org/jira/browse/HDFS-3067
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 0.24.0
>Reporter: Henry Robinson
>Assignee: Henry Robinson
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3067.1.patch, HDFS-3607.patch
>
>
> With a singly-replicated block that's corrupted, issuing a read against it 
> twice in succession (e.g. if ChecksumException is caught by the client) gives 
> a NullPointerException.
> Here's the body of a test that reproduces the problem:
> {code}
> final short REPL_FACTOR = 1;
> final long FILE_LENGTH = 512L;
> cluster.waitActive();
> FileSystem fs = cluster.getFileSystem();
> Path path = new Path("/corrupted");
> DFSTestUtil.createFile(fs, path, FILE_LENGTH, REPL_FACTOR, 12345L);
> DFSTestUtil.waitReplication(fs, path, REPL_FACTOR);
> ExtendedBlock block = DFSTestUtil.getFirstBlock(fs, path);
> int blockFilesCorrupted = cluster.corruptBlockOnDataNodes(block);
> assertEquals("All replicas not corrupted", REPL_FACTOR, 
> blockFilesCorrupted);
> InetSocketAddress nnAddr =
> new InetSocketAddress("localhost", cluster.getNameNodePort());
> DFSClient client = new DFSClient(nnAddr, conf);
> DFSInputStream dis = client.open(path.toString());
> byte[] arr = new byte[(int)FILE_LENGTH];
> boolean sawException = false;
> try {
>   dis.read(arr, 0, (int)FILE_LENGTH);
> } catch (ChecksumException ex) { 
>   sawException = true;
> }
> 
> assertTrue(sawException);
> sawException = false;
> try {
>   dis.read(arr, 0, (int)FILE_LENGTH); // <-- NPE thrown here
> } catch (ChecksumException ex) { 
>   sawException = true;
> } 
> {code}
> The stack:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:492)
>   at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:545)
> [snip test stack]
> {code}
> and the problem is that currentNode is null. It's left at null after the 
> first read, which fails, and then is never refreshed because the condition in 
> read that protects blockSeekTo is only triggered if the current position is 
> outside the block's range. 
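
Paraphrasing the guard described above (an illustrative shape, not the exact 
DFSInputStream source):

{code}
if (pos > blockEnd) {            // false on a retry within the same block
  currentNode = blockSeekTo(pos);
}
// On a retry after a ChecksumException, the guard is skipped and
// currentNode is still null when readBuffer() dereferences it: NPE.
int result = readBuffer(buf, off, realLen);
{code}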

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3617:
--

Target Version/s: 1.2.0  (was: 1.1.1)
  Status: Open  (was: Patch Available)

Canceling the patch since this is against branch-1.



> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409704#comment-13409704
 ] 

Eli Collins commented on HDFS-3617:
---

lgtm, +1 pending test-patch results (please post in a comment)

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3620) WebHdfsFileSystem getHomeDirectory() should not resolve locally

2012-07-09 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3620:
--

Component/s: webhdfs

> WebHdfsFileSystem getHomeDirectory() should not resolve locally
> ---
>
> Key: HDFS-3620
> URL: https://issues.apache.org/jira/browse/HDFS-3620
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: webhdfs
>Affects Versions: 1.0.3, 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Priority: Critical
>
> WebHdfsFileSystem's getHomeDirectory() method is hardcoded to return 
> '/user/' + UGI#shortname. Instead, it should make an HTTP REST call with 
> op=GETHOMEDIRECTORY.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409701#comment-13409701
 ] 

Daryn Sharp commented on HDFS-3577:
---

bq. The file size was 1MB in the test but the block size was only 1kB. 
Therefore, it created a lot of local files and failed with 
"java.net.SocketException: Too many open files".

Does this mean there's an fd leak? Or at least a leak during the create 
request? If so, is the test at fault?

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch, h3577_20120708.patch
>
>
> If reading a file large enough for which the httpserver running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem is delegating opening of the inputstream to 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when using chunked transfer encoding the *Content-Length* header 
> is not present and  the *URLOpener.openInputStream()* method thrown an 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3620) WebHdfsFileSystem getHomeDirectory() should not resolve locally

2012-07-09 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HDFS-3620:


 Summary: WebHdfsFileSystem getHomeDirectory() should not resolve 
locally
 Key: HDFS-3620
 URL: https://issues.apache.org/jira/browse/HDFS-3620
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha, 1.0.3
Reporter: Alejandro Abdelnur
Priority: Critical


WebHdfsFileSystem's getHomeDirectory() method is hardcoded to return '/user/' 
+ UGI#shortname. Instead, it should make an HTTP REST call with 
op=GETHOMEDIRECTORY.
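
Roughly, the proposed round-trip would look like this (a sketch; the host/port 
and the JSON shape are assumptions based on the WebHDFS GETHOMEDIRECTORY op):

{code}
URL url = new URL("http://namenode:50070/webhdfs/v1/?op=GETHOMEDIRECTORY");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
// The server answers with JSON like {"Path":"/user/<shortname>"};
// getHomeDirectory() would parse that instead of resolving locally.
{code}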

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3617:
--

Attachment: HDFS-3617.patch

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J updated HDFS-3617:
--

Status: Patch Available  (was: Open)

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
> Attachments: HDFS-3617.patch
>
>
> Please see HDFS-96.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-3617) Port HDFS-96 to branch-1 (support blocks greater than 2GB)

2012-07-09 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reassigned HDFS-3617:
-

Assignee: Harsh J

> Port HDFS-96 to branch-1 (support blocks greater than 2GB)
> --
>
> Key: HDFS-3617
> URL: https://issues.apache.org/jira/browse/HDFS-3617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.3
>Reporter: Matt Foley
>Assignee: Harsh J
>
> Please see HDFS-96.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409667#comment-13409667
 ] 

Colin Patrick McCabe commented on HDFS-3568:


bq. hdfsFreeBuilder is dead code, is it used later in the tests?

You would need this if you had a builder, but then found some reason to delete 
the builder without constructing an HDFS instance.  It really needs to be in 
the API because otherwise this would be impossible.

The current code doesn't do anything that could fail between creating the 
builder and using it to build an HDFS instance, so fuse_dfs doesn't use it at 
this time.  But it's good to have that option.

> fuse_dfs: add support for security
> --
>
> Key: HDFS-3568
> URL: https://issues.apache.org/jira/browse/HDFS-3568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 1.1.0, 2.0.1-alpha
>
> Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, 
> HDFS-3568.003.patch
>
>
> fuse_dfs should have support for Kerberos authentication.  This would allow 
> FUSE to be used in a secure cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-799) libhdfs must call DetachCurrentThread when a thread is destroyed

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409657#comment-13409657
 ] 

Eli Collins commented on HDFS-799:
--

Patch looks good, testing?

> libhdfs must call DetachCurrentThread when a thread is destroyed
> 
>
> Key: HDFS-799
> URL: https://issues.apache.org/jira/browse/HDFS-799
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Christian Kunz
>Assignee: Colin Patrick McCabe
> Attachments: HDFS-799.001.patch
>
>
> Threads that call AttachCurrentThread in libhdfs and disappear without 
> calling DetachCurrentThread cause a memory leak.
> Libhdfs should detach the current thread when this thread exits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3537) put libhdfs source files in a directory named libhdfs

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409632#comment-13409632
 ] 

Eli Collins commented on HDFS-3537:
---

+1 to the move. Colin, can you post a patch to be applied after I do the move? 
I'll do the move as follows:

{code}
native $ svn mkdir libhdfs
native $ svn mv !(libhdfs) libhdfs
{code}


> put libhdfs source files in a directory named libhdfs
> -
>
> Key: HDFS-3537
> URL: https://issues.apache.org/jira/browse/HDFS-3537
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3537.001.patch
>
>
> Move libhdfs source files from main/native to main/native/libhdfs.  Rename 
> hdfs_read to libhdfs_test_read; rename hdfs_write to libhdfs_test_write.
> The rationale is that we'd like to add some other stuff under main/native 
> (like fuse_dfs) and it's nice to have separate things in separate directories.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3582) Hook System.exit in MiniDFSCluster

2012-07-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409628#comment-13409628
 ] 

Colin Patrick McCabe commented on HDFS-3582:


ExitUtil#terminate:  should we point out in the JavaDoc that this is the *only* 
"exit the process" method that should be called from the NN or DN?  I didn't 
get that sense from reading the comment that's there now.

Other than that, looks great...

> Hook System.exit in MiniDFSCluster
> --
>
> Key: HDFS-3582
> URL: https://issues.apache.org/jira/browse/HDFS-3582
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
>Priority: Minor
> Attachments: hdfs-3582.txt, hdfs-3582.txt, hdfs-3582.txt
>
>
> Occasionally the tests fail with "java.util.concurrent.ExecutionException: 
> org.apache.maven.surefire.booter.SurefireBooterForkException:
> Error occurred in starting fork, check output in log" because the NN is 
> exit'ing (via System.exit or Runtime.exit). Unfortunately Surefire doesn't 
> retain the log output (see SUREFIRE-871) so the test log is empty, we don't 
> know which part of the test triggered which exit in HDFS. To make this 
> debuggable, let's hook this in MiniDFSCluster  via installing a security 
> manager that overrides checkExit (ala TestClusterId) or mock out System.exit 
> in the code itself. I think the former is preferable though we'll need to 
> keep the door open for tests that want to set their own security manager 
> (should be fine to override this one some times).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3581) FSPermissionChecker#checkPermission sticky bit check missing range check

2012-07-09 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated HDFS-3581:
--

Fix Version/s: 0.23.3

> FSPermissionChecker#checkPermission sticky bit check missing range check 
> -
>
> Key: HDFS-3581
> URL: https://issues.apache.org/jira/browse/HDFS-3581
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 2.0.0-alpha
>Reporter: Eli Collins
>Assignee: Eli Collins
> Fix For: 0.23.3, 2.0.1-alpha
>
> Attachments: hdfs-3581.txt
>
>
> The checkStickyBit call in FSPermissionChecker#checkPermission is missing a 
> range check which results in an index out of bounds when accessing root.
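
The shape of the problem, paraphrased (an illustrative sketch of the missing 
bounds check, not the committed patch):

{code}
// inodes[] holds the path components; "/" has no parent, so indexing
// the ancestor without a length check throws
// ArrayIndexOutOfBoundsException. Guarding the index fixes it:
if (parentAccess != null && parentAccess.implies(FsAction.WRITE)
    && inodes.length > 1) {
  checkStickyBit(inodes[inodes.length - 2], inodes[inodes.length - 1]);
}
{code}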

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3606) libhdfs: create self-contained unit test

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409622#comment-13409622
 ] 

Eli Collins commented on HDFS-3606:
---

Looks good, minor comments

- This function doesn't actually write/read a file. Also I'd pull the code out 
to something like testWriteFile
{code}
/**
 * Test that we can write a file with libhdfs and then read it back
 */
int main(void) {
{code}
- Style nit: remove extern from the function prototypes in nativeMiniDfs.h

> libhdfs: create self-contained unit test
> 
>
> Key: HDFS-3606
> URL: https://issues.apache.org/jira/browse/HDFS-3606
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: libhdfs
>Affects Versions: 2.0.1-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
>Priority: Minor
> Attachments: HDFS-3606.001.patch
>
>
> We should have a self-contained unit test for libhdfs and also for FUSE.
> We do have hdfs_test, but it is not self-contained (it requires a cluster to 
> already be running before it can be used.)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3568) fuse_dfs: add support for security

2012-07-09 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409611#comment-13409611
 ] 

Eli Collins commented on HDFS-3568:
---

Approach and patch look good to me.

hdfsFreeBuilder is dead code, is it used later in the tests?

> fuse_dfs: add support for security
> --
>
> Key: HDFS-3568
> URL: https://issues.apache.org/jira/browse/HDFS-3568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 1.0.0, 2.0.0-alpha
>Reporter: Colin Patrick McCabe
>Assignee: Colin Patrick McCabe
> Fix For: 1.1.0, 2.0.1-alpha
>
> Attachments: HDFS-3568.001.patch, HDFS-3568.002.patch, 
> HDFS-3568.003.patch
>
>
> fuse_dfs should have support for Kerberos authentication.  This would allow 
> FUSE to be used in a secure cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3037) TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy

2012-07-09 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3037:
--

Fix Version/s: 2.0.1-alpha
   0.23.3

> TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy
> 
>
> Key: HDFS-3037
> URL: https://issues.apache.org/jira/browse/HDFS-3037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Fix For: 0.23.3, 2.0.1-alpha, 3.0.0
>
> Attachments: HDFS-3037.patch
>
>
> In this test, we restart a DN in a running cluster, call 
> MiniDFSCluster#waitActive, and then assert some things about the DN. Trouble 
> is, MiniDFSCluster#waitActive won't wait any time at all, since the DN had 
> previously registered with the NN and the NN never had time to realize the DN 
> was dead.
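
One way to de-race such a test is to poll until the restarted DN is reported 
live again before asserting (a sketch; the exact accessors may differ by 
branch):

{code}
long deadline = System.currentTimeMillis() + 30000;
while (cluster.getNameNode().getNamesystem().getNumLiveDataNodes() < numDNs) {
  if (System.currentTimeMillis() > deadline) {
    throw new AssertionError("DN did not re-register in time");
  }
  Thread.sleep(100);  // let the NN process the DN's re-registration
}
{code}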

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3037) TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409596#comment-13409596
 ] 

Daryn Sharp commented on HDFS-3037:
---

I've also committed to branch-2 & 23.

> TestMulitipleNNDataBlockScanner#testBlockScannerAfterRestart is racy
> 
>
> Key: HDFS-3037
> URL: https://issues.apache.org/jira/browse/HDFS-3037
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.24.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
>Priority: Minor
> Fix For: 0.23.3, 2.0.1-alpha, 3.0.0
>
> Attachments: HDFS-3037.patch
>
>
> In this test, we restart a DN in a running cluster, call 
> MiniDFSCluster#waitActive, and then assert some things about the DN. Trouble 
> is, MiniDFSCluster#waitActive won't wait any time at all, since the DN had 
> previously registered with the NN and the NN never had time to realize the DN 
> was dead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3603) Decouple TestHDFSTrash from TestTrash

2012-07-09 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3603:
--

   Resolution: Fixed
Fix Version/s: 0.23.3
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> Decouple TestHDFSTrash from TestTrash
> -
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 0.23.3, 2.0.1-alpha
>
> Attachments: HDFS-3603.patch
>
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3603) Decouple TestHDFSTrash from TestTrash

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409589#comment-13409589
 ] 

Daryn Sharp commented on HDFS-3603:
---

I've committed this to branch-23 as well.

> Decouple TestHDFSTrash from TestTrash
> -
>
> Key: HDFS-3603
> URL: https://issues.apache.org/jira/browse/HDFS-3603
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-3603.patch
>
>
> TestHDFSTrash is failing pretty regularly during test builds.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3619) isGoodBlockCandidate() in Balancer is not handling properly if replica factor >3

2012-07-09 Thread Junping Du (JIRA)
Junping Du created HDFS-3619:


 Summary: isGoodBlockCandidate() in Balancer is not handling 
properly if replica factor >3
 Key: HDFS-3619
 URL: https://issues.apache.org/jira/browse/HDFS-3619
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: balancer
Affects Versions: 2.0.0-alpha, 1.0.0
Reporter: Junping Du
Assignee: Junping Du


Let's assume:
1. Replication factor = 4.
2. The source node in rack 1 has the 1st replica, the 2nd and 3rd replicas are 
in rack 2, the 4th replica is in rack 3, and the target node is also in rack 3.
It should be fine for the balancer to move the replica from the source node to 
the target node, but isGoodBlockCandidate() will return false. I think we can 
fix it by simply checking that at least one replica node (other than the 
source) is on a different rack from the target node, as sketched below.
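
A sketch of that rule (a hypothetical helper, not the Balancer source):

{code}
boolean remainsRackDiverse(DatanodeInfo source, DatanodeInfo target,
                           List<DatanodeInfo> replicaNodes) {
  // The move is safe if some replica other than the source already
  // sits on a rack different from the target's rack.
  for (DatanodeInfo replica : replicaNodes) {
    if (replica != source && !replica.getNetworkLocation()
        .equals(target.getNetworkLocation())) {
      return true;
    }
  }
  return false;
}
{code}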

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-3591) Backport HDFS-3357 to branch-0.23

2012-07-09 Thread Daryn Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3591:
--

   Resolution: Fixed
Fix Version/s: 0.23.3
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

> Backport HDFS-3357 to branch-0.23
> -
>
> Key: HDFS-3591
> URL: https://issues.apache.org/jira/browse/HDFS-3591
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Fix For: 0.23.3
>
> Attachments: HDFS-3357-branch-0.23.txt
>
>
> I would like to have HDFS-3357 in branch-0.23, but it is not a trivial 
> upmerge.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3591) Backport HDFS-3357 to branch-0.23

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409553#comment-13409553
 ] 

Daryn Sharp commented on HDFS-3591:
---

+1.  I've committed to branch-23

> Backport HDFS-3357 to branch-0.23
> -
>
> Key: HDFS-3591
> URL: https://issues.apache.org/jira/browse/HDFS-3591
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Robert Joseph Evans
>Assignee: Robert Joseph Evans
> Attachments: HDFS-3357-branch-0.23.txt
>
>
> I would like to have HDFS-3357 in branch-0.23, but it is not a trivial 
> upmerge.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2978) The NameNode should expose name dir statuses via JMX

2012-07-09 Thread Robert Joseph Evans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans updated HDFS-2978:
--

Fix Version/s: 0.23.3

> The NameNode should expose name dir statuses via JMX
> 
>
> Key: HDFS-2978
> URL: https://issues.apache.org/jira/browse/HDFS-2978
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: name-node
>Affects Versions: 0.23.0, 1.0.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 1.0.2, 0.23.3, 2.0.0-alpha
>
> Attachments: HDFS-2978-branch-1.patch, HDFS-2978.patch, 
> HDFS-2978.patch
>
>
> We currently display this info on the NN web UI, so users who wish to monitor 
> this must either do it manually or parse HTML. We should publish this 
> information via JMX.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3557) provide means of escaping special characters to `hadoop fs` command

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409520#comment-13409520
 ] 

Daryn Sharp commented on HDFS-3557:
---

Yes, it is a pain to get backslashes through... 20.2 is pretty old. I don't 
think 20.5 has the problem, so you may want to consider upgrading to the latest 
20 or, better yet, 1.x.

> provide means of escaping special characters to `hadoop fs` command
> ---
>
> Key: HDFS-3557
> URL: https://issues.apache.org/jira/browse/HDFS-3557
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 0.20.2
>Reporter: Jeff Hodges
>Priority: Minor
>
> When running an investigative job, I used a date parameter that selected 
> multiple directories for the input (e.g. "my_data/2012/06/{18,19,20}"). It 
> used this same date parameter when creating the output directory.
> But `hadoop fs` was unable to ls, getmerge, or rmr it until I used the regex 
> operator "?" and mv to change the name (that is, `-mv 
> output/2012/06/?18,19,20? foobar`).
> Shells and filesystems for other systems provide a means of escaping "special 
> characters" generically, but there seems to be no such means in HDFS/`hadoop 
> fs`. Providing one would be a great way to make accessing HDFS more robust.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409508#comment-13409508
 ] 

Daryn Sharp commented on HDFS-3577:
---

Sorry, I freaked out before studying the whole patch. I still think a 
chunked-encoding check should be present, unless I'm misunderstanding 
something. There's also not much use in instantiating a {{BoundedInputStream}} 
without a limit.

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch, h3577_20120708.patch
>
>
> If reading a file large enough for which the httpserver running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem is delegating opening of the inputstream to 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when using chunked transfer encoding the *Content-Length* header 
> is not present and  the *URLOpener.openInputStream()* method thrown an 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3577) webHdfsFileSystem fails to read files with chunked transfer encoding

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409506#comment-13409506
 ] 

Daryn Sharp commented on HDFS-3577:
---

No, no, no! This is reverting a fix for > 32-bit file transfers. I think the 
correct fix is to require the Content-Length header unless chunked encoding is 
being used.
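
Concretely, something along these lines (a sketch of the suggested check, not 
the committed fix; connection is the opened HttpURLConnection):

{code}
String contentLength = connection.getHeaderField("Content-Length");
boolean chunked = "chunked".equalsIgnoreCase(
    connection.getHeaderField("Transfer-Encoding"));
if (contentLength == null && !chunked) {
  // Neither a length nor chunked framing: the response is malformed.
  throw new IOException("Content-Length header is missing");
}
{code}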

> webHdfsFileSystem fails to read files with chunked transfer encoding
> 
>
> Key: HDFS-3577
> URL: https://issues.apache.org/jira/browse/HDFS-3577
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs client
>Affects Versions: 2.0.0-alpha
>Reporter: Alejandro Abdelnur
>Assignee: Tsz Wo (Nicholas), SZE
>Priority: Blocker
> Attachments: h3577_20120705.patch, h3577_20120708.patch
>
>
> If reading a file large enough for which the httpserver running 
> webhdfs/httpfs uses chunked transfer encoding (more than 24K in the case of 
> webhdfs), then the WebHdfsFileSystem client fails with an IOException with 
> message *Content-Length header is missing*.
> It looks like WebHdfsFileSystem is delegating opening of the inputstream to 
> *ByteRangeInputStream.URLOpener* class, which checks for the *Content-Length* 
> header, but when using chunked transfer encoding the *Content-Length* header 
> is not present and  the *URLOpener.openInputStream()* method thrown an 
> exception.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2617) Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution

2012-07-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409501#comment-13409501
 ] 

Daryn Sharp commented on HDFS-2617:
---

I too would like KSSL to be supported a bit longer, even as just a fallback, 
because it impacts the ability to migrate data from older clusters not yet 
upgraded to 1.x+. I'm a bit concerned that webhdfs hasn't (yet) been "battle 
hardened", so any bugs may severely impact production environments.

From a quick search, it looks like 128-bit encryption is considered weak. 128 
bits isn't exactly terrible, so can we just disable <128-bit ciphers?

> Replaced Kerberized SSL for image transfer and fsck with SPNEGO-based solution
> --
>
> Key: HDFS-2617
> URL: https://issues.apache.org/jira/browse/HDFS-2617
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: security
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-2617-a.patch, HDFS-2617-b.patch, 
> HDFS-2617-config.patch, HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, 
> HDFS-2617-trunk.patch, HDFS-2617-trunk.patch, hdfs-2617-1.1.patch
>
>
> The current approach to secure and authenticate nn web services is based on 
> Kerberized SSL and was developed when a SPNEGO solution wasn't available. Now 
> that we have one, we can get rid of the non-standard KSSL and use SPNEGO 
> throughout.  This will simplify setup and configuration.  Also, Kerberized 
> SSL is a non-standard approach with its own quirks and dark corners 
> (HDFS-2386).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3482) hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified without arguments

2012-07-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409493#comment-13409493
 ] 

Hudson commented on HDFS-3482:
--

Integrated in Hadoop-Mapreduce-trunk #1131 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/])
HDFS-3482. hdfs balancer throws ArrayIndexOutOfBoundsException if option is 
specified without values. Contributed by Madhukara Phatak. 

Submitted by:   Madhukara Phatak.
Reviewed by: Uma Maheswara Rao G. (Revision 1358812)

 Result = SUCCESS
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358812
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancer.java


> hdfs balancer throws ArrayIndexOutOfBoundsException if option is specified 
> without arguments
> 
>
> Key: HDFS-3482
> URL: https://issues.apache.org/jira/browse/HDFS-3482
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer
>Affects Versions: 2.0.0-alpha
>Reporter: Stephen Chu
>Assignee: madhukara phatak
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0
>
> Attachments: HDFS-3482-1.patch, HDFS-3482-2.patch, HDFS-3482-3.patch, 
> HDFS-3482-4.patch, HDFS-3482-4.patch, HDFS-3482.patch
>
>
> When running the hdfs balancer with an option but no argument, we run into an 
> ArrayIndexOutOfBoundsException. It's preferable to print the usage.
> {noformat}
> bash-3.2$ hdfs balancer -threshold
> Usage: java Balancer
> [-policy <policy>]      the balancing policy: datanode or blockpool
> [-threshold <threshold>] Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:38:46 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1505)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> bash-3.2$ hdfs balancer -policy
> Usage: java Balancer
> [-policy <policy>]      the balancing policy: datanode or blockpool
> [-threshold <threshold>] Percentage of disk capacity
> Balancing took 261.0 milliseconds
> 12/05/31 09:39:03 ERROR balancer.Balancer: Exiting balancer due an exception
> java.lang.ArrayIndexOutOfBoundsException: 1
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.parse(Balancer.java:1520)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer$Cli.run(Balancer.java:1482)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hdfs.server.balancer.Balancer.main(Balancer.java:1555)
> {noformat}
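
A minimal sketch of the guard the parser needs, assuming hypothetical class and
method names (this shows the shape of the fix, not the committed HDFS-3482
patch; the real change is in Balancer.Cli.parse()): verify that a value
actually follows the flag before indexing into args, and fall back to the
usage text otherwise.

{code}
import java.util.Arrays;

// Illustrative sketch for HDFS-3482 (names hypothetical, not the real patch):
// guard against a flag that arrives without its value instead of letting
// args[++i] run off the end of the array.
public class BalancerCliSketch {

  private static void printUsage() {
    System.err.println("Usage: java Balancer\n"
        + "\t[-policy <policy>]\tthe balancing policy: datanode or blockpool\n"
        + "\t[-threshold <threshold>]\tPercentage of disk capacity");
  }

  static void parse(String[] args) {
    for (int i = 0; i < args.length; i++) {
      if ("-threshold".equalsIgnoreCase(args[i])) {
        if (++i >= args.length) {              // value missing after the flag
          printUsage();
          throw new IllegalArgumentException(
              "Threshold value is missing: args = " + Arrays.toString(args));
        }
        double threshold = Double.parseDouble(args[i]);
        System.out.println("Using a threshold of " + threshold);
      } else if ("-policy".equalsIgnoreCase(args[i])) {
        if (++i >= args.length) {              // value missing after the flag
          printUsage();
          throw new IllegalArgumentException(
              "Policy value is missing: args = " + Arrays.toString(args));
        }
        System.out.println("Using policy " + args[i]);
      }
    }
  }

  public static void main(String[] args) {
    // Reproduces the reported invocation: prints usage, then fails with a
    // clear IllegalArgumentException rather than ArrayIndexOutOfBoundsException.
    parse(new String[] {"-threshold"});
  }
}
{code}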

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3541) Deadlock between recovery, xceiver and packet responder

2012-07-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409492#comment-13409492
 ] 

Hudson commented on HDFS-3541:
--

Integrated in Hadoop-Mapreduce-trunk #1131 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/])
HDFS-3541. Deadlock between recovery, xceiver and packet responder. 
Contributed by Vinay.

Submitted by:   Vinay
Reviewed by:    Uma Maheswara Rao G (Revision 1358794)

 Result = SUCCESS
umamahesh : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358794
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockRecovery.java


> Deadlock between recovery, xceiver and packet responder
> ---
>
> Key: HDFS-3541
> URL: https://issues.apache.org/jira/browse/HDFS-3541
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: data-node
>Affects Versions: 0.23.3, 2.0.1-alpha
>Reporter: suja s
>Assignee: Vinay
> Fix For: 2.0.1-alpha, 3.0.0
>
> Attachments: DN_dump.rar, HDFS-3541-2.patch, HDFS-3541.patch
>
>
> Block recovery was initiated while a write was in progress on the Datanode 
> side. A deadlock was found between the recovery, xceiver, and packet responder 
> threads (see the sketch below).
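
The shape of this class of bug, reduced to a hedged sketch (the class, field,
and method names below are illustrative, not the DataNode's actual code): one
thread must not wait for another thread to die while holding a monitor that
the other thread still needs in order to finish.

{code}
// Hedged reduction of the deadlock pattern behind HDFS-3541-style bugs.
// Names are illustrative; this is not the committed fix.
public class RecoveryDeadlockSketch {
  private final Object datasetLock = new Object();

  // Deadlock-prone shape: joins the responder while holding datasetLock.
  // If the responder is itself blocked trying to enter datasetLock,
  // neither thread can ever make progress.
  void recoverBlockBroken(Thread responder) throws InterruptedException {
    synchronized (datasetLock) {
      responder.interrupt();
      responder.join();     // waits forever if responder needs datasetLock
      // ... update replica state under the lock ...
    }
  }

  // Fixed shape: stop and join the responder first, then take the lock
  // only for the state mutation itself.
  void recoverBlockFixed(Thread responder) throws InterruptedException {
    responder.interrupt();
    responder.join();       // safe: no monitor is held while waiting
    synchronized (datasetLock) {
      // ... update replica state under the lock ...
    }
  }
}
{code}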

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-711) hdfsUtime does not handle atime = 0 or mtime = 0 correctly

2012-07-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409488#comment-13409488
 ] 

Hudson commented on HDFS-711:
-

Integrated in Hadoop-Mapreduce-trunk #1131 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1131/])
HDFS-711. hdfsUtime does not handle atime = 0 or mtime = 0 correctly. 
Contributed by Colin Patrick McCabe (Revision 1358810)

 Result = SUCCESS
eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1358810
Files : 
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h


> hdfsUtime does not handle atime = 0 or mtime = 0 correctly
> --
>
> Key: HDFS-711
> URL: https://issues.apache.org/jira/browse/HDFS-711
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 0.20.1
>Reporter: freestyler
>Assignee: Colin Patrick McCabe
> Fix For: 2.0.1-alpha
>
> Attachments: HDFS-711.001.patch, HDFS-711.002.patch, 
> HDFS-711.003.patch
>
>
> In HADOOP/src/c++/libhdfs/hdfs.h, the following function documentation is 
> incorrect:
> /* @param mtime new modification time or 0 for only set access time in seconds
>    @param atime new access time or 0 for only set modification time in seconds
> */
> int hdfsUtime(hdfsFS fs, const char* path, tTime mtime, tTime atime);
> Currently, setting mtime or atime to 0 has no special meaning: the file's 
> last-modified time will actually be changed to 0 if the mtime argument is 0. 
> libhdfs should translate mtime = 0 or atime = 0 to the special value -1, which 
> in HDFS means "don't change this time" (a hedged sketch of that translation 
> follows).
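
A hedged illustration of the proposed translation, written against the
Java-side FileSystem.setTimes contract, where -1 means "do not change this
time". The actual fix belongs in the libhdfs C code; the wrapper and its name
here are hypothetical.

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustration only (not the committed HDFS-711 change): map the
// (incorrectly documented) "0 = leave unchanged" convention onto HDFS's
// real sentinel value, -1; any other value passes through untouched.
public class UtimeSketch {

  static void utime(FileSystem fs, Path path, long mtime, long atime)
      throws IOException {
    fs.setTimes(path,
        mtime == 0 ? -1L : mtime,
        atime == 0 ? -1L : atime);
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // Update only the access time; the modification time stays as it was.
    utime(fs, new Path("/tmp/example"), 0L, System.currentTimeMillis());
  }
}
{code}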

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



