[jira] Updated: (HDFS-1007) HFTP needs to be updated to use delegation tokens

2010-07-15 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1007:
-

Status: Resolved  (was: Patch Available)
Resolution: Fixed

 HFTP needs to be updated to use delegation tokens
 -

 Key: HDFS-1007
 URL: https://issues.apache.org/jira/browse/HDFS-1007
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Affects Versions: 0.22.0
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.22.0

 Attachments: 1007-bugfix.patch, distcp-hftp-2.1.1.patch, 
 distcp-hftp.1.patch, distcp-hftp.2.1.patch, distcp-hftp.2.patch, 
 distcp-hftp.patch, HDFS-1007-1.patch, HDFS-1007-2.patch, HDFS-1007-3.patch, 
 HDFS-1007-BP20-fix-1.patch, HDFS-1007-BP20-fix-2.patch, 
 HDFS-1007-BP20-fix-3.patch, HDFS-1007-BP20.patch, 
 hdfs-1007-long-running-hftp-client.patch, hdfs-1007-securityutil-fix.patch


 HFTPFileSystem should be updated to use delegation tokens so that it can 
 talk to secure namenodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1301) TestHDFSProxy need to use server side conf for ProxyUser stuff.

2010-07-15 Thread Boris Shkolnik (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boris Shkolnik updated HDFS-1301:
-

Attachment: HDFS-1301-BP20.patch

For the previous version; not for commit.

 TestHDFSProxy need to use server side conf for ProxyUser stuff.
 ---

 Key: HDFS-1301
 URL: https://issues.apache.org/jira/browse/HDFS-1301
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
Assignee: Boris Shkolnik
 Attachments: HDFS-1301-BP20.patch


 Currently TestHdfsProxy sets hadoop.proxyuser.USER.groups in a local copy of 
 the configuration, but ProxyUsers only looks at the server-side config.
 For the test we can use the static method in ProxyUsers to load the config.
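The fix suggested above can be sketched with a self-contained stand-in. This is not Hadoop's actual ProxyUsers class (whose real entry point is a static refresh method such as refreshSuperUserGroupsConfiguration(conf)); the names below are illustrative. It shows why the test must push its local configuration into the server-side static state before authorization checks can see it:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in for Hadoop's ProxyUsers: authorization consults only the
// statically loaded (server-side) configuration, never the caller's copy.
public class ProxyUsersSketch {
    private static final Map<String, Set<String>> proxyGroups = new HashMap<>();

    // Analogous to the static refresh method the test should call:
    // copies hadoop.proxyuser.USER.groups entries into server-side state.
    public static void refresh(Map<String, String> conf) {
        proxyGroups.clear();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            String k = e.getKey();
            if (k.startsWith("hadoop.proxyuser.") && k.endsWith(".groups")) {
                String user = k.substring("hadoop.proxyuser.".length(),
                                          k.length() - ".groups".length());
                proxyGroups.put(user,
                    new HashSet<>(Arrays.asList(e.getValue().split(","))));
            }
        }
    }

    public static boolean isAuthorized(String proxyUser, String group) {
        Set<String> groups = proxyGroups.get(proxyUser);
        return groups != null && groups.contains(group);
    }

    public static void main(String[] args) {
        Map<String, String> localConf = new HashMap<>();
        localConf.put("hadoop.proxyuser.alice.groups", "users,staff");
        // Without a refresh, the server-side state never sees the setting:
        System.out.println(isAuthorized("alice", "users"));   // false
        refresh(localConf);                                   // the suggested fix
        System.out.println(isAuthorized("alice", "users"));   // true
    }
}
```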

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1227) UpdateBlock fails due to unmatched file length

2010-07-15 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1293#action_1293
 ] 

Thanh Do commented on HDFS-1227:


In the append-branch I saw the unmatched file length exception happen, but 
the client then retries recoverBlock and hence tolerates it.

 UpdateBlock fails due to unmatched file length
 --

 Key: HDFS-1227
 URL: https://issues.apache.org/jira/browse/HDFS-1227
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20-append
Reporter: Thanh Do

 - Summary: client append is not atomic; hence it is possible that,
 when retrying during append, updateBlock throws an exception
 indicating an unmatched file length, making the append fail.
  
 - Setup:
 + # available datanodes = 3
 + # disks / datanode = 1
 + # failures = 2
 + failure type = bad disk
 + When/where failure happens = (see below)
 + This bug is non-deterministic; to reproduce it, add a sufficiently long sleep 
 before out.write() in BlockReceiver.receivePacket() on dn1 and dn2, but not on dn3
  
 - Details:
  Suppose the client appends 16 bytes to block X, which has length 16 bytes at dn1, 
 dn2, and dn3.
 Dn1 is the primary. The pipeline is dn3-dn2-dn1. recoverBlock succeeds.
 The client starts sending data to dn3, the first datanode in the pipeline.
 dn3 forwards the packet to the downstream datanodes and starts writing
 data to its disk. Suppose there is an exception in dn3 when writing to disk.
 The client gets the exception and starts the recovery code by calling 
 dn1.recoverBlock() again.
 dn1 in turn calls dn2.getMetaDataInfo() and dn1.getMetaDataInfo() to build 
 the syncList.
 Suppose that at the time getMetaDataInfo() is called at both datanodes (dn1 and 
 dn2), the previous packet (sent from dn3) has not reached disk yet.
 Hence, the block info given by getMetaDataInfo() contains a length of 16 bytes.
 But after that, the packet reaches disk, and the block file length becomes 
 32 bytes.
 Using the syncList (which contains block info with length 16 bytes), dn1 calls 
 updateBlock() at dn2 and dn1, which fails because the length in the block info 
 (16 bytes) does not match the actual length on disk (32 bytes).
  
 Note that this bug is non-deterministic; it depends on the thread interleaving
 at the datanodes.
 This bug was found by our Failure Testing Service framework:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
 For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
 Haryadi Gunawi (hary...@eecs.berkeley.edu)
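 The race described above can be condensed into a self-contained simulation 
 (plain Java with illustrative names, not actual datanode code): the length 
 sampled for the syncList goes stale when a late packet reaches disk, so the 
 updateBlock length check fails.

```java
// Simulation of the HDFS-1227 race: getMetaDataInfo() samples the on-disk
// length; if a buffered packet lands afterwards, updateBlock's check fails.
public class UpdateBlockRaceSketch {
    public static class Replica {
        public long onDiskLen;
        public Replica(long len) { onDiskLen = len; }
    }

    // Analogous to building the syncList entry from getMetaDataInfo().
    public static long sampleLength(Replica r) { return r.onDiskLen; }

    // Analogous to updateBlock(): rejects the update when the expected
    // length no longer matches what is actually on disk.
    public static boolean updateBlock(Replica r, long expectedLen) {
        return r.onDiskLen == expectedLen;
    }

    public static void main(String[] args) {
        Replica dn2 = new Replica(16);
        long syncLen = sampleLength(dn2);  // syncList records 16 bytes
        dn2.onDiskLen += 16;               // late 16-byte packet reaches disk
        System.out.println(updateBlock(dn2, syncLen));  // false: recovery fails
    }
}
```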

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1298) Add support in HDFS to update statistics that tracks number of file system operations in FileSystem

2010-07-15 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888965#action_12888965
 ] 

Konstantin Shvachko commented on HDFS-1298:
---

+1 Both patches look good to me.

 Add support in HDFS to update statistics that tracks number of file system 
 operations in FileSystem
 ---

 Key: HDFS-1298
 URL: https://issues.apache.org/jira/browse/HDFS-1298
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.22.0
Reporter: Suresh Srinivas
Assignee: Suresh Srinivas
 Fix For: 0.22.0

 Attachments: HDFS-1298.patch


 See HADOOP-6859 for the new statistics.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-974) FileSystem.Statistics should include NN accesses from the clients

2010-07-15 Thread Suresh Srinivas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-974.
--

  Assignee: Suresh Srinivas  (was: Sanjay Radia)
Resolution: Duplicate

Duplicate of HDFS-1298

 FileSystem.Statistics should include NN accesses from the clients
 -

 Key: HDFS-974
 URL: https://issues.apache.org/jira/browse/HDFS-974
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: Arun C Murthy
Assignee: Suresh Srinivas
 Fix For: 0.22.0


 This is a very useful metric to track; we can track per-task and hence per-job 
 stats through Counters etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1201) Support for using different Kerberos keys for Namenode and datanode.

2010-07-15 Thread Kan Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kan Zhang updated HDFS-1201:


Attachment: h6632-06.patch

This is the HDFS part of HADOOP-6632. It incorporates HDFS-1020 and a bug fix 
from  HDFS-1006.

  Support for using different Kerberos keys for Namenode and datanode.
 -

 Key: HDFS-1201
 URL: https://issues.apache.org/jira/browse/HDFS-1201
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: h6632-06.patch


 This jira covers the hdfs changes to support different Kerberos keys for 
 Namenode and datanode. This corresponds to changes in HADOOP-6632
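 For orientation, a minimal sketch of what per-role principals might look like 
 in hdfs-site.xml. The property names and principal values below are an 
 assumption based on later Hadoop security configuration, not taken from this 
 patch:

```xml
<!-- Hypothetical hdfs-site.xml fragment: separate Kerberos principals
     for the namenode and datanode (names are illustrative assumptions). -->
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>nn/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>dn/_HOST@EXAMPLE.COM</value>
</property>
```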

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1201) Support for using different Kerberos keys for Namenode and datanode.

2010-07-15 Thread Kan Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888984#action_12888984
 ] 

Kan Zhang commented on HDFS-1201:
-

Ran ant test and passed. Also, manually verified the feature on a single node 
cluster.

  Support for using different Kerberos keys for Namenode and datanode.
 -

 Key: HDFS-1201
 URL: https://issues.apache.org/jira/browse/HDFS-1201
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: h6632-06.patch


 This jira covers the hdfs changes to support different Kerberos keys for 
 Namenode and datanode. This corresponds to changes in HADOOP-6632

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1303) StreamFile.doGet(..) uses an additional RPC to get file length

2010-07-15 Thread Tsz Wo (Nicholas), SZE (JIRA)
StreamFile.doGet(..) uses an additional RPC to get file length
--

 Key: HDFS-1303
 URL: https://issues.apache.org/jira/browse/HDFS-1303
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE


{code}
//StreamFile.doGet(..)
long fileLen = dfs.getFileInfo(filename).getLen();
FSInputStream in = dfs.open(filename);
{code}
In the code above, it is unnecessary to call getFileInfo(..), which costs an 
additional RPC to the namenode.  The file length can instead be obtained from the 
input stream after open(..).
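The improvement can be sketched with a hypothetical stand-in that just counts RPCs. This is not Hadoop's real DFSClient API; a stream that knows its length after open() (here via a getFileLength() accessor) is the assumption being made:

```java
// Counts namenode RPCs in the two variants of StreamFile.doGet(..).
public class StreamFileSketch {
    public static int rpcCount = 0;

    static class FileInfo {
        long len;
        FileInfo(long l) { len = l; }
        long getLen() { return len; }
    }

    public static class InStream {
        private final long fileLength;
        InStream(long len) { fileLength = len; }
        public long getFileLength() { return fileLength; }  // known after open()
    }

    static FileInfo getFileInfo(String name) { rpcCount++; return new FileInfo(1024); }
    static InStream open(String name) { rpcCount++; return new InStream(1024); }

    // Before: a separate getFileInfo() RPC, then open() -- two RPCs total.
    public static long doGetOld(String name) {
        long len = getFileInfo(name).getLen();
        InStream in = open(name);
        return len;
    }

    // After: one RPC; the length comes from the opened stream.
    public static long doGetNew(String name) {
        return open(name).getFileLength();
    }

    public static void main(String[] args) {
        rpcCount = 0; doGetOld("f"); System.out.println(rpcCount);  // 2
        rpcCount = 0; doGetNew("f"); System.out.println(rpcCount);  // 1
    }
}
```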

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1304) There is no unit test for HftpFileSystem.open(..)

2010-07-15 Thread Tsz Wo (Nicholas), SZE (JIRA)
There is no unit test for HftpFileSystem.open(..)
-

 Key: HDFS-1304
 URL: https://issues.apache.org/jira/browse/HDFS-1304
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: test
Reporter: Tsz Wo (Nicholas), SZE


HftpFileSystem.open(..) first opens a URL connection to the namenode's 
FileDataServlet and is then redirected to the datanode's StreamFile servlet.  Such 
redirection does not work in the unit test environment because the redirect URL 
uses the real hostname instead of localhost.

One way to get around this is to use fault injection to replace the real 
hostname with localhost.
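The fault-injection idea might look roughly like the following; toLocalhost is a hypothetical helper, not code from any actual patch. It rewrites only the host portion of the redirect URL so the unit test can follow it:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for the fault-injection approach: swap the real
// datanode hostname in a redirect URL for localhost, keeping the rest.
public class RedirectRewriteSketch {
    public static String toLocalhost(String redirect) {
        try {
            URI u = new URI(redirect);
            return new URI(u.getScheme(), u.getUserInfo(), "localhost", u.getPort(),
                           u.getPath(), u.getQuery(), u.getFragment()).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("bad redirect URL: " + redirect, e);
        }
    }

    public static void main(String[] args) {
        String real = "http://datanode1.example.com:50075/streamFile?filename=/f";
        System.out.println(toLocalhost(real));
        // http://localhost:50075/streamFile?filename=/f
    }
}
```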

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1229) DFSClient incorrectly asks for new block if primary crashes during first recoverBlock

2010-07-15 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889001#action_12889001
 ] 

Thanh Do commented on HDFS-1229:


This does not happen in the append+320 trunk.

 DFSClient incorrectly asks for new block if primary crashes during first 
 recoverBlock
 -

 Key: HDFS-1229
 URL: https://issues.apache.org/jira/browse/HDFS-1229
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20-append
Reporter: Thanh Do

 Setup:
 
 + # available datanodes = 2
 + # disks / datanode = 1
 + # failures = 1
 + failure type = crash
 + When/where failure happens = during primary's recoverBlock
  
 Details:
 --
 Say client is appending to block X1 in 2 datanodes: dn1 and dn2.
 First it needs to make sure both dn1 and dn2  agree on the new GS of the 
 block.
 1) Client first creates DFSOutputStream by calling
  
 OutputStream result = new DFSOutputStream(src, buffersize, progress,
 lastBlock, stat, 
  conf.getInt("io.bytes.per.checksum", 512));
  
 in DFSClient.append()
  
 2) The above DFSOutputStream constructor in turn calls 
 processDatanodeError(true, true)
 (i.e., hasError = true, isAppend = true), and starts the DataStreamer
  
  processDatanodeError(true, true);  /* let's call this PDNE 1 */
  streamer.start();
  
 Note that DataStreamer.run() also calls processDatanodeError()
  while (!closed && clientRunning) {
   ...
   boolean doSleep = processDatanodeError(hasError, false); /* let's call 
  this PDNE 2 */
  
 3) Now in the PDNE 1, we have following code:
  
  blockStream = null;
  blockReplyStream = null;
  ...
  while (!success && clientRunning) {
  ...
 try {
  primary = createClientDatanodeProtocolProxy(primaryNode, conf);
  newBlock = primary.recoverBlock(block, isAppend, newnodes); 
  /*exception here*/
  ...
 catch (IOException e) {
  ...
  if (recoveryErrorCount > maxRecoveryErrorCount) { 
  // this condition is false
  }
  ...
  return true;
 } // end catch
 finally {...}
 
 this.hasError = false;
 lastException = null;
 errorIndex = 0;
 success = createBlockOutputStream(nodes, clientName, true);
 }
 ...
  
 Because dn1 crashes during the client's call to recoverBlock, we get an exception
 and go to the catch block, in which processDatanodeError returns true
 before setting hasError to false. Also, because createBlockOutputStream() is 
 not called (due to the early return), blockStream is still null.
  
 4) Now that PDNE 1 has finished, we come to streamer.start(), which calls PDNE 2.
 Because hasError = false, PDNE 2 returns false immediately without doing 
 anything:
  if (!hasError) { return false; }
  
 5) Still in DataStreamer.run(), after PDNE 2 returns false, we 
 still have blockStream = null, so the following code is executed:
 if (blockStream == null) {
    nodes = nextBlockOutputStream(src);
    this.setName("DataStreamer for file " + src + " block " + block);
    response = new ResponseProcessor(nodes);
    response.start();
 }
  
 nextBlockOutputStream(), which asks the namenode to allocate a new block, is 
 called. (This is not good, because we are appending, not writing.)
 The namenode gives it a new block ID and a set of datanodes, including the 
 crashed dn1.
 This leads createBlockOutputStream() to fail, because it tries to contact dn1 
 first (which has crashed). The client retries 5 times without any success,
 because every time it asks the namenode for a new block! Again we see
 that the retry logic at the client is weird!
 *This bug was found by our Failure Testing Service framework:
 http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
 For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
 Haryadi Gunawi (hary...@eecs.berkeley.edu)*
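 Steps 3-5 above can be condensed into a self-contained sketch (illustrative 
 flags and names, not actual DFSClient code): after the failed first recovery, 
 hasError is false and blockStream is null, so the streamer falls into the 
 new-block allocation path even though this is an append.

```java
// Condensed control flow of the HDFS-1229 report.
public class AppendRetrySketch {
    public static boolean hasError = false;   // never raised by the failed recovery
    public static Object blockStream = null;  // never created: recoverBlock threw
    public static boolean askedNamenodeForNewBlock = false;

    // PDNE 2, as called from DataStreamer.run(): bails out immediately.
    public static boolean processDatanodeError() {
        if (!hasError) { return false; }
        return true;  // recovery path; never reached in this scenario
    }

    // The block-allocation path that must not run during an append.
    public static void nextBlockOutputStream() { askedNamenodeForNewBlock = true; }

    // The DataStreamer.run() fragment from step 5.
    public static void streamerStep() {
        processDatanodeError();
        if (blockStream == null) {
            nextBlockOutputStream();  // asks the namenode for a NEW block
        }
    }

    public static void main(String[] args) {
        streamerStep();
        System.out.println(askedNamenodeForNewBlock);  // true: the bug
    }
}
```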

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.