date:20120830


[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444748#comment-13444748
 ] 

Andy Isaacson commented on HDFS-3733:
-

bq. How about moving isWebHdfsInvocation() and getRemoteIp() from 
NameNodeRpcServer to NamenodeWebHdfsMethods? These two methods are not RPC 
related.
Fair enough, done.
bq. FSNamesystem.getRemoteIp() should be static.
Yep, thanks.
bq. The following change seems not useful. 
It made more sense in a previous version of the patch. :)  Fixed!

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access


 [ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3733:


Attachment: hdfs-3733-4.txt

Attaching latest version of patch.

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access


 [ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3733:
-

Hadoop Flags: Reviewed

+1 
Andy, thanks for the update.  The new patch looks good.  

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-08-30 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444776#comment-13444776
]

Colin Patrick McCabe commented on HDFS-3540:

bq. If I have not missed anything, there are two risks in the branch-1 Recovery
Mode feature: If there is a stray OP_INVALID byte, it could be misinterpreted
as an end-of-log and lead to silent data loss.

Recovery mode will always prompt before doing anything which could lead to data
loss. So no, stray {{OP_INVALID}} bytes will not lead to silent data loss.

Actually, looking at change 1349086, which was introduced by HDFS-3521, I see
that it broke end-of-file checking by default. Since
{{dfs.namenode.edits.toleration.length}} is -1 by default,
{{FSEditLog#checkEndOfLog}} is never invoked. However, this is not a problem
with Recovery Mode; it's a problem with change 1349086.

bq. Recovery Mode does not consider the corruption length.

Recovery Mode does consider the corruption length. The location at which the
problem occurred is printed out. This is the message Failed to parse edit log
(file name) at position position, edit log length is length... This
information is provided to allow the system administrator to make an informed
decision.

bq. Therefore, I suggest to remove Recovery Mode from branch-1 and change the
default toleration length to 0.

Recovery mode has already proven itself useful in the field in code lines
derived from branch-1. I don't see any reason to remove it.

I agree that {{dfs.namenode.edits.toleration.length}} should be 0 by default.

At the end of the day, both edit log toleration and Recovery Mode can cause
data loss. The difference is that Recovery Mode will prompt the system
administrator before hand, and edit log toleration will not. This is the
reason why I opposed edit log toleration originally, and it's the reason why I
believe it should be off by default now. Silent data loss is not a feature--
not one that we want, anyway.

Further improvement on recovery mode and edit log toleration in branch-1

Key: HDFS-3540
URL: https://issues.apache.org/jira/browse/HDFS-3540
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 1.2.0
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE

*Recovery Mode*: HDFS-3479 backported HDFS-3335 to branch-1. However, the
recovery mode feature in branch-1 is dramatically different from the recovery
mode in trunk since the edit log implementations in these two branch are
different. For example, there is UNCHECKED_REGION_LENGTH in branch-1 but not
in trunk.
*Edit Log Toleration*: HDFS-3521 added this feature to branch-1 to remedy
UNCHECKED_REGION_LENGTH and to tolerate edit log corruption.
There are overlaps between these two features. We study potential further
improvement in this issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access

[
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444794#comment-13444794
]

Hadoop QA commented on HDFS-3733:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543049/hdfs-3733-4.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 4 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

-1 javadoc. The javadoc tool appears to have generated 1 warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken
org.apache.hadoop.hdfs.TestClientReportBadBlock

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3122//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3122//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3122//console

This message is automatically generated.

Audit logs should include WebHDFS access

Key: HDFS-3733
URL: https://issues.apache.org/jira/browse/HDFS-3733
Project: Hadoop HDFS
Issue Type: Bug
Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt,
hdfs-3733-4.txt, hdfs-3733.txt

Access via WebHdfs does not result in audit log entries. It should.
{noformat}
% curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
{FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
{noformat}
and observe that no audit log entry is generated.
Interestingly, OPEN requests do not generate audit log entries when the NN
generates the redirect, but do generate audit log entries when the second
phase against the DN is executed.
{noformat}
% curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
...
HTTP/1.1 307 TEMPORARY_REDIRECT
Location:
http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
...
% curl -v
'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
...
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 12
Server: Jetty(6.1.26.cloudera.1)

hello world
{noformat}
This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}}
thereby triggering the existing {{logAuditEvent}} code.

[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-08-30 Thread Luke Lu (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444890#comment-13444890
]

Luke Lu commented on HDFS-3540:
---

It seems to me that recovery mode and edit log toleration serve different
purposes. The latter is necessary for an HA setup, where admin explicitly set a
small toleration length for tail corruption. The former is useless in an HA
setup and suitable for manual recovery.

Edit log toleration is adequate as is. Recovery mode needs more patches (more
details of errors etc.) to serve the interactive recovery use case better.

Further improvement on recovery mode and edit log toleration in branch-1

[jira] [Created] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748

Arun C Murthy created HDFS-3871:
---

 Summary: Change NameNodeProxies to use HADOOP-8748
 Key: HDFS-3871
 URL: https://issues.apache.org/jira/browse/HDFS-3871
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor


Change NameNodeProxies to use util method introduced via HADOOP-8748.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-30 Thread Chao Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444946#comment-13444946
 ] 

Chao Shi commented on HDFS-3863:


Todd, your patch looks good to me.

How about these:
1) Collect max committed-txid from PrepareRecovery response of each JN, and 
check that logToSync.endTxId = max committed-txid. Since there may be 
unexpected race conditions, it would be better to protect it in both client and 
server side. We're paranoid anyway.
2) In Journal#checkRequest(), verify that committed-txid is non-decreasing 
before saving it.

 QJM: track last committed txid
 

 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3863-prelim.txt


 Per some discussion with [~stepinto] 
 [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
  we should keep track of the last committed txid on each JournalNode. Then 
 during any recovery operation, we can sanity-check that we aren't asked to 
 truncate a log to an earlier transaction.
 This is also a necessary step if we want to support reading from in-progress 
 segments in the future (since we should only allow reads up to the commit 
 point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748


 [ 
https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-3871:


Attachment: HDFS-3781_branch1.patch

Patch for branch-1.

 Change NameNodeProxies to use HADOOP-8748
 -

 Key: HDFS-3871
 URL: https://issues.apache.org/jira/browse/HDFS-3871
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor
 Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch


 Change NameNodeProxies to use util method introduced via HADOOP-8748.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748


 [ 
https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated HDFS-3871:


Attachment: HDFS-3781.patch

Patch for trunk.

 Change NameNodeProxies to use HADOOP-8748
 -

 Key: HDFS-3871
 URL: https://issues.apache.org/jira/browse/HDFS-3871
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor
 Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch


 Change NameNodeProxies to use util method introduced via HADOOP-8748.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3870) QJM: add metrics to JournalNode

2012-08-30 Thread Chao Shi (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444971#comment-13444971
 ] 

Chao Shi commented on HDFS-3870:


One more: How often a JN is lagging (by counting the number of log syncs whose 
last-commited-txid = firstTxnId). This indicates the JN is running under poor 
condition.

 QJM: add metrics to JournalNode
 ---

 Key: HDFS-3870
 URL: https://issues.apache.org/jira/browse/HDFS-3870
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon

 The JournalNode should expose some basic metrics through the usual interface. 
 In particular:
 - the writer epoch, accepted epoch,
 - the last written transaction ID and last committed txid (which may be newer 
 in case that it's in the process of catching up)
 - latency information for how long the syncs are taking
 Please feel free to suggest others that come to mind.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

[
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13444975#comment-13444975
]

Tsz Wo (Nicholas), SZE commented on HDFS-3540:
--

{quote}
Recovery mode will always prompt before doing anything which could lead to data
loss. So no, stray OP_INVALID bytes will not lead to silent data loss.

Actually, looking at change 1349086, which was introduced by HDFS-3521, I see
that it broke end-of-file checking by default. Since
dfs.namenode.edits.toleration.length is -1 by default, FSEditLog#checkEndOfLog
is never invoked. However, this is not a problem with Recovery Mode; it's a
problem with change 1349086.
{quote}
Before HDFS-3521, there is a UNCHECKED_REGION_LENGTH in Recovery Mode. If a
stray OP_INVALID byte is within the unchecked region, it will cause silent data
loss.

{quote}
Recovery Mode does consider the corruption length. The location at which the
problem occurred is printed out. This is the message Failed to parse edit log
(file name) at position position, edit log length is length... This
information is provided to allow the system administrator to make an informed
decision.
{quote}
You still do not know the corruption length since there may be padding at the
end. System admins won't know the padding length and so they won't be able to
know the corruption length.

Further improvement on recovery mode and edit log toleration in branch-1

[jira] [Updated] (HDFS-2695) ReadLock should be enough for FsNameSystem#renewLease.

2012-08-30 Thread Uma Maheswara Rao G (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2695:
--

Attachment: HDFS-2695.patch

 ReadLock should be enough for FsNameSystem#renewLease.
 --

 Key: HDFS-2695
 URL: https://issues.apache.org/jira/browse/HDFS-2695
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-2695.patch


 When checking the issue HDFS-1241, found this point.
 Since renewLease is not updating any nameSystem related data, can we make 
 this lock to read lock?
 am i missing some thing here? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-2695) ReadLock should be enough for FsNameSystem#renewLease.

2012-08-30 Thread Uma Maheswara Rao G (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-2695:
--

Target Version/s: 3.0.0
  Status: Patch Available  (was: Open)

 ReadLock should be enough for FsNameSystem#renewLease.
 --

 Key: HDFS-2695
 URL: https://issues.apache.org/jira/browse/HDFS-2695
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
 Attachments: HDFS-2695.patch


 When checking the issue HDFS-1241, found this point.
 Since renewLease is not updating any nameSystem related data, can we make 
 this lock to read lock?
 am i missing some thing here? 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748


 [ 
https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3871:
-

Status: Patch Available  (was: Open)

 Change NameNodeProxies to use HADOOP-8748
 -

 Key: HDFS-3871
 URL: https://issues.apache.org/jira/browse/HDFS-3871
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor
 Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch


 Change NameNodeProxies to use util method introduced via HADOOP-8748.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748


 [ 
https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-3871:
-

 Component/s: hdfs client
Hadoop Flags: Reviewed

+1 for both the trunk and the branch-1 patches.

 Change NameNodeProxies to use HADOOP-8748
 -

 Key: HDFS-3871
 URL: https://issues.apache.org/jira/browse/HDFS-3871
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor
 Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch


 Change NameNodeProxies to use util method introduced via HADOOP-8748.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3859) QJM: implement md5sum verification

2012-08-30 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445061#comment-13445061
 ] 

Steve Loughran commented on HDFS-3859:
--

@Todd : this is why a CRC check would be simpler. Faster and less controversial.

 QJM: implement md5sum verification
 --

 Key: HDFS-3859
 URL: https://issues.apache.org/jira/browse/HDFS-3859
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3859-sha1.txt


 When the QJM passes journal segments between nodes, it should use an md5sum 
 field to make sure the data doesn't get corrupted during transit. This also 
 serves as an extra safe-guard to make sure that the data is consistent across 
 all nodes when finalizing a segment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-1490) TransferFSImage should timeout

2012-08-30 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445065#comment-13445065
 ] 

Steve Loughran commented on HDFS-1490:
--

# typo in  DFS_IMAGE_TRANFER_TIMEOUT_KEY
# timeout field should be private, not package scoped

This really needs a functional test that does a kill -STOP `cat 
/var/run/whatever.pid` and then verifies that a hung process is picked up. The 
tests I've been doing for HA on the 1.x branch can trigger things like this; we 
should consider integrating the test framework w/ hadoop, either as an upstream 
dependency or in bigtop, with the functional HA test suite there.

 TransferFSImage should timeout
 --

 Key: HDFS-1490
 URL: https://issues.apache.org/jira/browse/HDFS-1490
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Dmytro Molkov
Assignee: Dmytro Molkov
Priority: Minor
 Attachments: HDFS-1490.patch, HDFS-1490.patch, HDFS-1490.patch


 Sometimes when primary crashes during image transfer secondary namenode would 
 hang trying to read the image from HTTP connection forever.
 It would be great to set timeouts on the connection so if something like that 
 happens there is no need to restart the secondary itself.
 In our case restarting components is handled by the set of scripts and since 
 the Secondary as the process is running it would just stay hung until we get 
 an alarm saying the checkpointing doesn't happen.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2695) ReadLock should be enough for FsNameSystem#renewLease.

[
https://issues.apache.org/jira/browse/HDFS-2695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445070#comment-13445070
]

Hadoop QA commented on HDFS-2695:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543099/HDFS-2695.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

-1 tests included. The patch doesn't appear to include any new or modified
tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3123//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3123//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3123//console

This message is automatically generated.

ReadLock should be enough for FsNameSystem#renewLease.
--

Key: HDFS-2695
URL: https://issues.apache.org/jira/browse/HDFS-2695
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 0.24.0
Reporter: Uma Maheswara Rao G
Assignee: Uma Maheswara Rao G
Priority: Minor
Attachments: HDFS-2695.patch

When checking the issue HDFS-1241, found this point.
Since renewLease is not updating any nameSystem related data, can we make
this lock to read lock?
am i missing some thing here?

[jira] [Created] (HDFS-3872) Store block ID in block metadata header

Todd Lipcon created HDFS-3872:
-

 Summary: Store block ID in block metadata header
 Key: HDFS-3872
 URL: https://issues.apache.org/jira/browse/HDFS-3872
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 3.0.0
Reporter: Todd Lipcon


We recently had an interesting local filesystem corruption in one cluster, 
which caused a block and its associated metadata file to get replaced with a 
data/meta pair from an entirely different replica. Because the block and its 
metadata were still self-consistent, the block scanner never noticed, and we 
ended up with a system where one replica differed from the others.

One simple solution to guard against this type of corruption in the future 
would be to put the block ID itself in the meta header, and have the block 
scanner verify it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748

[
https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445123#comment-13445123
]

Hadoop QA commented on HDFS-3871:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543094/HDFS-3781.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken
org.apache.hadoop.hdfs.TestDatanodeBlockScanner
org.apache.hadoop.hdfs.TestPersistBlocks

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3124//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3124//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3124//console

This message is automatically generated.

Change NameNodeProxies to use HADOOP-8748
-

Key: HDFS-3871
URL: https://issues.apache.org/jira/browse/HDFS-3871
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs client
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor
Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch

Change NameNodeProxies to use util method introduced via HADOOP-8748.

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access


[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445155#comment-13445155
 ] 

Eli Collins commented on HDFS-3733:
---

Andy, looking good!

- In FSN#getFileInfo why catch UnresolvedLinkException and StandbyException, 
just AccessControlException is sufficient right?
- Nit, I'd remove the System.out.printlns for debugging in the tests?
- Per jenkins there's a javadoc warning:
{noformat}
[WARNING] 
/home/eli/src/hadoop2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/web/resources/NamenodeWebHdfsMethods.java:132:
 warning - Tag @link: reference not found: Server#isRpcInvocation()
{noformat}

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3873) Hftp assumes security is disabled if token fetch fails

Daryn Sharp created HDFS-3873:
-

 Summary: Hftp assumes security is disabled if token fetch fails
 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp


Hftp ignores all exceptions generated while trying to get a token, based on the 
assumption that it means security is disabled.  Debugging problems is 
excruciatingly difficult when security is enabled but something goes wrong.  
Job submissions succeed, but tasks fail because the NN rejects the user as 
unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-08-30 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445159#comment-13445159
]

Colin Patrick McCabe commented on HDFS-3540:

bq. It seems to me that recovery mode and edit log toleration serve different
purposes. The latter is necessary for an HA setup, where admin explicitly set a
small toleration length for tail corruption. The former is useless in an HA
setup and suitable for manual recovery.

Edit log toleration is not necessary for an HA setup. In fact, it is
impossible to configure edit log toleration together with an HA setup,
because edit log toleration is only available in branch-1 (but not later
branches), and HA is only available in branch-2 and later.

bq. Edit log toleration is adequate as is. Recovery mode needs more patches
(more details of errors etc.) to serve the interactive recovery use case better.

Patches are welcome. Check out the design doc for HDFS-3004, which gives an
overview:
https://issues.apache.org/jira/secure/attachment/12542798/recovery-mode.pdf

Further improvement on recovery mode and edit log toleration in branch-1

[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning


 [ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3837:
--

Attachment: hdfs-3837.txt

No problem, can always do cleanup in another change. Updated patch just does 
adds an exclude.

Thanks for the reviews Suresh.

 Fix DataNode.recoverBlock findbugs warning
 --

 Key: HDFS-3837
 URL: https://issues.apache.org/jira/browse/HDFS-3837
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, 
 hdfs-3837.txt, hdfs-3837.txt


 HDFS-2686 introduced the following findbugs warning:
 {noformat}
 Call to equals() comparing different types in 
 org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
 {noformat}
 Both are using DatanodeID#equals but it's a different method because 
 DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-2261) AOP unit tests are not getting compiled or run


 [ 
https://issues.apache.org/jira/browse/HDFS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2261:
--

  Component/s: test
  Description: The tests in src/test/aop are not getting compiled or 
run.  (was: 
-compile-fault-inject:
 [echo] Start weaving aspects in place
 [iajc] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/java/org/apache/hadoop/hdfs/HftpFileSystem.java:269
 [error] The method encodeQueryValue(String) is undefined for the type 
ServletUtil
 [iajc] ServletUtil.encodeQueryValue(ugi.getShortUserName()));
..

  [iajc] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/system/aop/org/apache/hadoop/hdfs/server/namenode/NameNodeAspect.aj:50
 [warning] advice defined in 
org.apache.hadoop.hdfs.server.namenode.NameNodeAspect has not been applied 
[Xlint:adviceDidNotMatch]
 [iajc] 
 [iajc] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/system/aop/org/apache/hadoop/hdfs/server/datanode/DataNodeAspect.aj:43
 [warning] advice defined in 
org.apache.hadoop.hdfs.server.datanode.DataNodeAspect has not been applied 
[Xlint:adviceDidNotMatch]
 [iajc] 
 [iajc] 
 [iajc] 18 errors, 4 warnings

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/aop/build/aop.xml:222:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/aop/build/aop.xml:203:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk-Commit/trunk/src/test/aop/build/aop.xml:90:
 compile errors: 18)
 Priority: Minor  (was: Major)
Affects Version/s: 2.0.0-alpha
  Summary: AOP unit tests are not getting compiled or run   (was: 
hdfs trunk is broken with -compile-fault-inject ant target)

The system tests were removed in HADOOP-8450, re-purposing this jira to get the 
aop tests compiling and running, looks like they're completely unhooked from 
the mvn build.

 AOP unit tests are not getting compiled or run 
 ---

 Key: HDFS-2261
 URL: https://issues.apache.org/jira/browse/HDFS-2261
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 2.0.0-alpha
 Environment: 
 https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/834/console
 -compile-fault-inject ant target 
Reporter: Giridharan Kesavan
Priority: Minor

 The tests in src/test/aop are not getting compiled or run.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3540) Further improvement on recovery mode and edit log toleration in branch-1

2012-08-30 Thread Colin Patrick McCabe (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445179#comment-13445179
]

Colin Patrick McCabe commented on HDFS-3540:

bq. Before HDFS-3521, there is a UNCHECKED_REGION_LENGTH in Recovery Mode. If a
stray OP_INVALID byte is within the unchecked region, it will cause silent data
loss.

Nicholas, you didn't address the main point of my comment, which is that after
HDFS-3521, if a stray OP_INVALID byte is found anywhere in the log, it will
cause silent data loss-- unless the sysadmin configures
{{dfs.namenode.edits.toleration.length}} to something other than the default.
Based on your earlier comments, I think we both agree that this should not be
the default. Let's fix this (independently of everything else were discussing
here.)

bq. You still do not know the corruption length since there may be padding at
the end. System admins won't know the padding length and so they won't be able
to know the corruption length.

The padding length is going to be a megabyte at most. Since the edit log files
are fairly large, you should have a good idea of what percentage through the
file you are. If you have an idea for improving the error messages of
{{FSEditLog.java}}, perhaps you should file a JIRA for that? It's not directly
relevant here, though, since all methods of manual recovery will face the same
issues.

bq. Before HDFS-3521, there is a UNCHECKED_REGION_LENGTH in Recovery Mode...

I want to emphasize one thing here: {{UNCHECKED_REGION_LENGTH}} is *not* part
of Recovery Mode. If you look at the history {{FSEditLog.java}}, you'll see
that change 1325075 (HDFS-3055) introduced Recovery mode, but not
{{UNCHECKED_REGION_LENGTH}}. This was introduced in HDFS-3479 (the backport of
HDFS_3335 to branch-1). Please see this comment, introduced by HDFS-3479:

{code}
+/** The end of the edit log should contain only 0x00 or 0xff bytes.
+ * If it contains other bytes, the log itself may be corrupt.
+ * It is important to check this; if we don't, a stray OP_INVALID byte
+ * could make us stop reading the edit log halfway through, and we'd never
+ * know that we had lost data.
+ *
+ * We don't check the very last part of the edit log, in case the
+ * NameNode crashed while writing to the edit log.
+ */
{code}

I encourage anyone interested in this to check out the history of
{{FSEditLog.java}}. It's a very good guide and it will make understanding this
discussion much easier.

As I said before, I still think we should get rid of the unchecked region
altogether. But this has nothing to do with Recovery Mode, it has to do with
HDFS-3479.

Further improvement on recovery mode and edit log toleration in branch-1

[jira] [Updated] (HDFS-3869) QJM: expose non-file journal manager details in web UI


 [ 
https://issues.apache.org/jira/browse/HDFS-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3869:
--

Attachment: lagging-jn.png
dir-failed.png
open-for-write.png
open-for-read.png

Attached screenshots:
1) open-for-read.png: NN is in standby state, reading from shared edits
2) open-for-write.png: NN in active state, writing to shared edits and local 
storage as well
3) dir-failed.png: I chmodded one of the local directories and triggered a 
roll, so it got marked as failed
4) lagging-jn.png: I suspended one of the JNs so it fell behind the others, 
while I did a bunch of transactions from a client.

 QJM: expose non-file journal manager details in web UI
 --

 Key: HDFS-3869
 URL: https://issues.apache.org/jira/browse/HDFS-3869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: dir-failed.png, hdfs-3869.txt, lagging-jn.png, 
 open-for-read.png, open-for-write.png


 Currently, the NN web UI only contains NN storage directories on local disk. 
 It should also include details about any non-file JournalManagers in use.
 This JIRA targets the QJM branch, but will be useful for BKJM as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write


 [ 
https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3833:
-

Attachment: HDFS-3833.patch

 TestDFSShell fails on Windows due to file concurrent read write
 ---

 Key: HDFS-3833
 URL: https://issues.apache.org/jira/browse/HDFS-3833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 1-win
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch


 TestDFSShell sometimes fails due to the race between the write issued by the 
 test and blockscanner. Example stack trace:
 {noformat}
 Error Message
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
 Stacktrace
 java.io.FileNotFoundException: 
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
   at java.io.FileOutputStream.open(Native Method)
   at java.io.FileOutputStream.init(FileOutputStream.java:194)
   at java.io.FileOutputStream.init(FileOutputStream.java:145)
   at java.io.PrintWriter.init(PrintWriter.java:218)
   at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133)
   at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write


 [ 
https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3833:
-

Status: Patch Available  (was: Open)

 TestDFSShell fails on Windows due to file concurrent read write
 ---

 Key: HDFS-3833
 URL: https://issues.apache.org/jira/browse/HDFS-3833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 1-win
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch


 TestDFSShell sometimes fails due to the race between the write issued by the 
 test and blockscanner. Example stack trace:
 {noformat}
 Error Message
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
 Stacktrace
 java.io.FileNotFoundException: 
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
   at java.io.FileOutputStream.open(Native Method)
   at java.io.FileOutputStream.init(FileOutputStream.java:194)
   at java.io.FileOutputStream.init(FileOutputStream.java:145)
   at java.io.PrintWriter.init(PrintWriter.java:218)
   at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133)
   at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write


 [ 
https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-3833:
-

Affects Version/s: 3.0.0

 TestDFSShell fails on Windows due to file concurrent read write
 ---

 Key: HDFS-3833
 URL: https://issues.apache.org/jira/browse/HDFS-3833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 1-win
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch


 TestDFSShell sometimes fails due to the race between the write issued by the 
 test and blockscanner. Example stack trace:
 {noformat}
 Error Message
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
 Stacktrace
 java.io.FileNotFoundException: 
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
   at java.io.FileOutputStream.open(Native Method)
   at java.io.FileOutputStream.init(FileOutputStream.java:194)
   at java.io.FileOutputStream.init(FileOutputStream.java:145)
   at java.io.PrintWriter.init(PrintWriter.java:218)
   at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133)
   at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning

[
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445248#comment-13445248
]

Hadoop QA commented on HDFS-3837:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543132/hdfs-3837.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3125//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3125//console

This message is automatically generated.

Fix DataNode.recoverBlock findbugs warning
--

Key: HDFS-3837
URL: https://issues.apache.org/jira/browse/HDFS-3837
Project: Hadoop HDFS
Issue Type: Bug
Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt,
hdfs-3837.txt, hdfs-3837.txt

HDFS-2686 introduced the following findbugs warning:
{noformat}
Call to equals() comparing different types in
org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
{noformat}
Both are using DatanodeID#equals but it's a different method because
DNR#equals overrides equals for some reason (doesn't change behavior).

[jira] [Updated] (HDFS-3861) Deadlock in DFSClient


 [ 
https://issues.apache.org/jira/browse/HDFS-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3861:
--

  Resolution: Fixed
   Fix Version/s: (was: 0.23.4)
  0.23.3
Target Version/s: 0.23.3, 3.0.0, 2.2.0-alpha
  Status: Resolved  (was: Patch Available)

Thanks Kihwal!

 Deadlock in DFSClient
 -

 Key: HDFS-3861
 URL: https://issues.apache.org/jira/browse/HDFS-3861
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Kihwal Lee
Assignee: Kihwal Lee
Priority: Blocker
 Fix For: 0.23.3, 3.0.0, 2.2.0-alpha

 Attachments: hdfs-3861.patch.txt


 The deadlock is between DFSOutputStream#close() and DFSClient#close().

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access


[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445259#comment-13445259
 ] 

Andy Isaacson commented on HDFS-3733:
-

bq. In FSN#getFileInfo why catch UnresolvedLinkException and StandbyException, 
just AccessControlException is sufficient right?
I have to {{logAuditEvent(false}} under any exception.  Todd suggested doing 
this instead:
{code}
+} catch (Throwable e) {
   if (auditLog.isInfoEnabled()  isExternalInvocation()) {
 logAuditEvent(false, UserGroupInformation.getCurrentUser(),
   getRemoteIp(),
   getfileinfo, src, null, null);
   }
-  throw e;
-} catch (StandbyException e) {
-  if (auditLog.isInfoEnabled()  isExternalInvocation()) {
-logAuditEvent(false, UserGroupInformation.getCurrentUser(),
-  getRemoteIp(),
-  getfileinfo, src, null, null);
-  }
-  throw e;
+  Throwables.propagateIfPossible(e, AccessControlException.class);
+  Throwables.propagateIfPossible(e, UnresolvedLinkException.class);
+  Throwables.propagateIfPossible(e, StandbyException.class);
+  Throwables.propagateIfPossible(e, IOException.class);
+  throw new RuntimeException(unexpected, e);
{code}
bq. Nit, I'd remove the System.out.printlns for debugging in the tests?
Where's the upside to removing them? It adds a few KB at most to the MBs of 
test output, and I always end up adding the prinlns when trying to grok 
failures.

But, whatever.  Removed.

bq. javadoc warning

Turns out you have to import anything you want to {{@link}}. Fixed.

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access


 [ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3733:


Attachment: hdfs-3733-6.txt

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3869) QJM: expose non-file journal manager details in web UI


 [ 
https://issues.apache.org/jira/browse/HDFS-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3869:
--

Attachment: hdfs-3869.txt

Attached patch has a little cleanup (formatting and javadoc) and also adds the 
current txid to the UI. Verified it on the cluster again.

 QJM: expose non-file journal manager details in web UI
 --

 Key: HDFS-3869
 URL: https://issues.apache.org/jira/browse/HDFS-3869
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: dir-failed.png, hdfs-3869.txt, hdfs-3869.txt, 
 lagging-jn.png, open-for-read.png, open-for-write.png


 Currently, the NN web UI only contains NN storage directories on local disk. 
 It should also include details about any non-file JournalManagers in use.
 This JIRA targets the QJM branch, but will be useful for BKJM as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3863) QJM: track last committed txid


[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445301#comment-13445301
 ] 

Todd Lipcon commented on HDFS-3863:
---

Hi Chao. I tried to add the sanity checks you suggested, and ran into a little 
difficult with the first one. It caused a test failure in the following 
scenario:

JN1 has fallen behind, has: edits_inprogress with txid 44-45
JN2 and JN3 both finished writing this segment (44-47), had fully written 
48-51, and had started a log segment 42, without yet writing any transactions 
to it.

In the current code, when prepareRecovery() invokes scanStorage(), this caused 
JN2 and JN3 to return an empty {{lastSegmentTxId}}. So, the client code went 
into recovery of the log segment with txid 44. It correctly recovered to 44-47, 
but then the assertion failed because the other loggers had seen txid 51 
committed.

So, I had to fix {{scanStorage}} a bit so that it would return the correct most 
recent segment txid, even in this scenario.

I'll upload the improved patch soon after running some more test iterations. 
Thanks for the good idea, as it did catch a slight bug here!

 QJM: track last committed txid
 

 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3863-prelim.txt


 Per some discussion with [~stepinto] 
 [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
  we should keep track of the last committed txid on each JournalNode. Then 
 during any recovery operation, we can sanity-check that we aren't asked to 
 truncate a log to an earlier transaction.
 This is also a necessary step if we want to support reading from in-progress 
 segments in the future (since we should only allow reads up to the commit 
 point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write

[
https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445315#comment-13445315
]

Hadoop QA commented on HDFS-3833:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543138/HDFS-3833.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
org.apache.hadoop.hdfs.server.datanode.TestBPOfferService
org.apache.hadoop.hdfs.TestPersistBlocks

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3126//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3126//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3126//console

This message is automatically generated.

TestDFSShell fails on Windows due to file concurrent read write
---

Key: HDFS-3833
URL: https://issues.apache.org/jira/browse/HDFS-3833
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 3.0.0, 1-win
Reporter: Brandon Li
Assignee: Brandon Li
Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch

TestDFSShell sometimes fails due to the race between the write issued by the
test and blockscanner. Example stack trace:
{noformat}
Error Message
c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The
requested operation cannot be performed on a file with a user-mapped section
open)
Stacktrace
java.io.FileNotFoundException:
c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The
requested operation cannot be performed on a file with a user-mapped section
open)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.init(FileOutputStream.java:194)
at java.io.FileOutputStream.init(FileOutputStream.java:145)
at java.io.PrintWriter.init(PrintWriter.java:218)
at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133)
at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231)
{noformat}

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access

[
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445327#comment-13445327
]

Hadoop QA commented on HDFS-3733:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543151/hdfs-3733-6.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 4 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

-1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestDatanodeBlockScanner
org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3127//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HDFS-Build/3127//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3127//console

This message is automatically generated.

Audit logs should include WebHDFS access

hello world
{noformat}
This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}}
thereby triggering the existing {{logAuditEvent}} code.

[jira] [Commented] (HDFS-3833) TestDFSShell fails on Windows due to file concurrent read write


[ 
https://issues.apache.org/jira/browse/HDFS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445330#comment-13445330
 ] 

Brandon Li commented on HDFS-3833:
--

The failed tests are not related with this change.

 TestDFSShell fails on Windows due to file concurrent read write
 ---

 Key: HDFS-3833
 URL: https://issues.apache.org/jira/browse/HDFS-3833
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 1-win
Reporter: Brandon Li
Assignee: Brandon Li
 Attachments: HDFS-3833.branch-1-win.patch, HDFS-3833.patch


 TestDFSShell sometimes fails due to the race between the write issued by the 
 test and blockscanner. Example stack trace:
 {noformat}
 Error Message
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
 Stacktrace
 java.io.FileNotFoundException: 
 c:\A\HM\build\test\data\dfs\data\data1\current\blk_-7735708801221347790 (The 
 requested operation cannot be performed on a file with a user-mapped section 
 open)
   at java.io.FileOutputStream.open(Native Method)
   at java.io.FileOutputStream.init(FileOutputStream.java:194)
   at java.io.FileOutputStream.init(FileOutputStream.java:145)
   at java.io.PrintWriter.init(PrintWriter.java:218)
   at org.apache.hadoop.hdfs.TestDFSShell.corrupt(TestDFSShell.java:1133)
   at org.apache.hadoop.hdfs.TestDFSShell.testGet(TestDFSShell.java:1231)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3874) Exception when client reports bad checksum to NN

Todd Lipcon created HDFS-3874:
-

 Summary: Exception when client reports bad checksum to NN
 Key: HDFS-3874
 URL: https://issues.apache.org/jira/browse/HDFS-3874
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, name-node
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon


We see the following exception in our logs on a cluster:

{code}
2012-08-27 16:34:30,400 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
NameNode.reportBadBlocks
2012-08-27 16:34:30,400 ERROR org.apache.hadoop.security.UserGroupInformation: 
PriviledgedActionException as:hdfs (auth:SIMPLE) cause:java.io.IOException: 
Cannot mark blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
as corrupt because datanode :0 does not exist
2012-08-27 16:34:30,400 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
46 on 8020, call 
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.reportBadBlocks from 
172.29.97.219:43805: error: java.io.IOException: Cannot mark 
blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
as corrupt because datanode :0 does not exist
java.io.IOException: Cannot mark 
blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
primaryNodeIndex=-1, 
replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
as corrupt because datanode :0 does not exist
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1001)
at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.findAndMarkBlockAsCorrupt(BlockManager.java:994)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.reportBadBlocks(FSNamesystem.java:4736)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.reportBadBlocks(NameNodeRpcServer.java:537)
at 
org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.reportBadBlocks(DatanodeProtocolServerSideTranslatorPB.java:242)
at 
org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20032)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3837) Fix DataNode.recoverBlock findbugs warning


 [ 
https://issues.apache.org/jira/browse/HDFS-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3837:
--

  Resolution: Fixed
   Fix Version/s: 2.2.0-alpha
Target Version/s:   (was: 2.2.0-alpha)
  Status: Resolved  (was: Patch Available)

I've committed this and merged to branch-2.

 Fix DataNode.recoverBlock findbugs warning
 --

 Key: HDFS-3837
 URL: https://issues.apache.org/jira/browse/HDFS-3837
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
 Fix For: 2.2.0-alpha

 Attachments: hdfs-3837.txt, hdfs-3837.txt, hdfs-3837.txt, 
 hdfs-3837.txt, hdfs-3837.txt


 HDFS-2686 introduced the following findbugs warning:
 {noformat}
 Call to equals() comparing different types in 
 org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(BlockRecoveryCommand$RecoveringBlock)
 {noformat}
 Both are using DatanodeID#equals but it's a different method because 
 DNR#equals overrides equals for some reason (doesn't change behavior).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3874) Exception when client reports bad checksum to NN


[ 
https://issues.apache.org/jira/browse/HDFS-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445350#comment-13445350
 ] 

Todd Lipcon commented on HDFS-3874:
---

The bug seems to be that the datanode doesn't report the right remote DN when 
it detects a checksum error when receiving a block. Here are the DN side logs:

{code}
2012-08-27 16:34:30,396 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
Checksum error in block 
BP-1507505631-172.29.97.196-1337120439433:blk_8285012733733669474_140475196 
from /172.29.97.219:52544
org.apache.hadoop.fs.ChecksumException: Checksum error: 
DFSClient_NONMAPREDUCE_334070927_1 at 44032 exp: -983390667 got: 557443094
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:335)
at 
org.apache.hadoop.util.DataChecksum.verifyChunkedSums(DataChecksum.java:266)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.verifyChunks(BlockReceiver.java:377)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:496)
at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:635)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:506)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
at java.lang.Thread.run(Thread.java:662)
2012-08-27 16:34:30,396 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
report corrupt block 
BP-1507505631-172.29.97.196-1337120439433:blk_8285012733733669474_140475196 
from datanode :0 to namenode
{code}

 Exception when client reports bad checksum to NN
 

 Key: HDFS-3874
 URL: https://issues.apache.org/jira/browse/HDFS-3874
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client, name-node
Affects Versions: 2.0.0-alpha
Reporter: Todd Lipcon

 We see the following exception in our logs on a cluster:
 {code}
 2012-08-27 16:34:30,400 INFO org.apache.hadoop.hdfs.StateChange: *DIR* 
 NameNode.reportBadBlocks
 2012-08-27 16:34:30,400 ERROR 
 org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException 
 as:hdfs (auth:SIMPLE) cause:java.io.IOException: Cannot mark 
 blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
 as corrupt because datanode :0 does not exist
 2012-08-27 16:34:30,400 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
 46 on 8020, call 
 org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.reportBadBlocks from 
 172.29.97.219:43805: error: java.io.IOException: Cannot mark 
 blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
 as corrupt because datanode :0 does not exist
 java.io.IOException: Cannot mark 
 blk_8285012733733669474_140475196{blockUCState=UNDER_CONSTRUCTION, 
 primaryNodeIndex=-1, 
 replicas=[ReplicaUnderConstruction[172.29.97.219:50010|RBW]]}(same as stored) 
 as corrupt because datanode :0 does not exist
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.markBlockAsCorrupt(BlockManager.java:1001)
 at 
 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.findAndMarkBlockAsCorrupt(BlockManager.java:994)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.reportBadBlocks(FSNamesystem.java:4736)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.reportBadBlocks(NameNodeRpcServer.java:537)
 at 
 org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.reportBadBlocks(DatanodeProtocolServerSideTranslatorPB.java:242)
 at 
 org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:20032)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails


 [ 
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3873:
--

Status: Patch Available  (was: Open)

 Hftp assumes security is disabled if token fetch fails
 --

 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-3873.patch


 Hftp ignores all exceptions generated while trying to get a token, based on 
 the assumption that it means security is disabled.  Debugging problems is 
 excruciatingly difficult when security is enabled but something goes wrong.  
 Job submissions succeed, but tasks fail because the NN rejects the user as 
 unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails

[
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Daryn Sharp updated HDFS-3873:
--

Attachment: HDFS-3873.patch

Only considers a connection refused exception as security disabled since an
insecure cluster does not listen on the secure port. Note this prevents jobs
from launching w/o tokens.

I spent the better part of the day debugging why an oozie launcher task was
trying to get a hftp token. Turns out AES was specified in krb5.conf which
caused a SSL exception that was silently swallowed during job submission. The
job launched and the tasks failed with user not authenticated messages from the
NN. This patch evolved from the debugging effort.

Hftp assumes security is disabled if token fetch fails
--

Key: HDFS-3873
URL: https://issues.apache.org/jira/browse/HDFS-3873
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
Attachments: HDFS-3873.patch

Hftp ignores all exceptions generated while trying to get a token, based on
the assumption that it means security is disabled. Debugging problems is
excruciatingly difficult when security is enabled but something goes wrong.
Job submissions succeed, but tasks fail because the NN rejects the user as
unauthenticated.

[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails


 [ 
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3873:
--

Attachment: HDFS-3873.branch-23.patch

Update test to expect different exception from 23.

 Hftp assumes security is disabled if token fetch fails
 --

 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch


 Hftp ignores all exceptions generated while trying to get a token, based on 
 the assumption that it means security is disabled.  Debugging problems is 
 excruciatingly difficult when security is enabled but something goes wrong.  
 Job submissions succeed, but tasks fail because the NN rejects the user as 
 unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails


[ 
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445372#comment-13445372
 ] 

Hadoop QA commented on HDFS-3873:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12543176/HDFS-3873.branch-23.patch
  against trunk revision .

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3129//console

This message is automatically generated.

 Hftp assumes security is disabled if token fetch fails
 --

 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch


 Hftp ignores all exceptions generated while trying to get a token, based on 
 the assumption that it means security is disabled.  Debugging problems is 
 excruciatingly difficult when security is enabled but something goes wrong.  
 Job submissions succeed, but tasks fail because the NN rejects the user as 
 unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3871) Change NameNodeProxies to use HADOOP-8748


[ 
https://issues.apache.org/jira/browse/HDFS-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445378#comment-13445378
 ] 

Arun C Murthy commented on HDFS-3871:
-

The test failures and findbugs warnings are not related.

I didn't add a new test since there is an existing test which covers this 
already.

 Change NameNodeProxies to use HADOOP-8748
 -

 Key: HDFS-3871
 URL: https://issues.apache.org/jira/browse/HDFS-3871
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Reporter: Arun C Murthy
Assignee: Arun C Murthy
Priority: Minor
 Attachments: HDFS-3781_branch1.patch, HDFS-3781.patch


 Change NameNodeProxies to use util method introduced via HADOOP-8748.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3875) Issue handling checksum errors in write pipeline

Todd Lipcon created HDFS-3875:
-

 Summary: Issue handling checksum errors in write pipeline
 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 2.2.0-alpha
Reporter: Todd Lipcon


We saw this issue with one block in a large test cluster. The client is storing 
the data with replication level 2, and we saw the following:
- the second node in the pipeline detects a checksum error on the data it 
received from the first node. We don't know if the client sent a bad checksum, 
or if it got corrupted between node 1 and node 2 in the pipeline.
- this caused the second node to get kicked out of the pipeline, since it threw 
an exception. The pipeline started up again with only one replica (the first 
node in the pipeline)
- this replica was later determined to be corrupt by the block scanner, and 
unrecoverable since it is the only replica

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline


[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445385#comment-13445385
 ] 

Todd Lipcon commented on HDFS-3875:
---

Here's the recovery from the perspective of the NN:

{code}
2012-08-28 19:16:33,532 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
updatePipeline(block=BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581786,
 newGenerationStamp=140581806, newLength=44281856, 
newNodes=[172.29.97.219:50010], clientNam
2012-08-28 19:16:33,597 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
updatePipeline(BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581786)
 successfully to 
BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581806
{code}

Here's the recovery from the perspective of the middle node:

{code}
2012-08-28 19:16:33,531 INFO 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recovering 
replica ReplicaBeingWritten, blk_2632740624757457378_140581786, RBW
  getNumBytes() = 44867072
  getBytesOnDisk()  = 44867072
  getVisibleLength()= 44281856
  getVolume()   = /data/2/dfs/dn/current
  getBlockFile()= 
/data/2/dfs/dn/current/BP-1507505631-172.29.97.196-1337120439433/current/rbw/blk_2632740624757457378
  bytesAcked=44281856
  bytesOnDisk=44867072
{code}

and then the later checksum exception from the block scanner:

{code}
2012-08-28 19:23:59,275 WARN 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Second 
Verification failed for 
BP-1507505631-172.29.97.196-1337120439433:blk_2632740624757457378_140581806
org.apache.hadoop.fs.ChecksumException: Checksum failed at 44217344
{code}

Interestingly, the checksum exception noticed by the block scanner is less than 
the acked length seen at recovery time.

On the node in question, I see a fair number of weird errors (page allocation 
failures etc) in the kernel log. So my guess is that the machine is borked and 
was silently corrupting memory in the middle of the pipeline. Hence, because 
the recovery kicked out the wrong node, it ended up persisting a corrupt 
version of the block instead of a good one.

 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 2.2.0-alpha
Reporter: Todd Lipcon

 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3875) Issue handling checksum errors in write pipeline


[ 
https://issues.apache.org/jira/browse/HDFS-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445411#comment-13445411
 ] 

Todd Lipcon commented on HDFS-3875:
---

Just to brainstorm, here's one potential solution:
- if the tail node in the pipeline detects a checksum error, then it returns a 
special error code back up the pipeline indicating this (rather than just 
disconnecting)
- if a non-tail node receives this error code, then it immediately scans its 
own block on disk (from the beginning up through the last acked length). If it 
detects a corruption on its local copy, then it should assume that _it_ is the 
faulty one, rather than the downstream neighbor. If it detects no corruption, 
then the faulty node is either the downstream mirror or the network link 
between the two, and the current behavior is reasonable.

Depending on the above, it would report back the errorIndex appropriately to 
the client, so that the correct faulty node is removed from the pipeline.

 Issue handling checksum errors in write pipeline
 

 Key: HDFS-3875
 URL: https://issues.apache.org/jira/browse/HDFS-3875
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node, hdfs client
Affects Versions: 2.2.0-alpha
Reporter: Todd Lipcon

 We saw this issue with one block in a large test cluster. The client is 
 storing the data with replication level 2, and we saw the following:
 - the second node in the pipeline detects a checksum error on the data it 
 received from the first node. We don't know if the client sent a bad 
 checksum, or if it got corrupted between node 1 and node 2 in the pipeline.
 - this caused the second node to get kicked out of the pipeline, since it 
 threw an exception. The pipeline started up again with only one replica (the 
 first node in the pipeline)
 - this replica was later determined to be corrupt by the block scanner, and 
 unrecoverable since it is the only replica

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)

Todd Lipcon created HDFS-3876:
-

 Summary: NN should not RPC to self to find trash defaults (causes 
deadlock)
 Key: HDFS-3876
 URL: https://issues.apache.org/jira/browse/HDFS-3876
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0, 2.2.0-alpha
Reporter: Todd Lipcon
Priority: Blocker


When transitioning a SBN to active, I ran into the following situation:
- the TrashPolicy first gets loaded by an IPC Server Handler thread. The 
{{initialize}} function then tries to make an RPC to the same node to find out 
the defaults.
- This is happening inside the NN write lock (since it's part of the active 
initialization). Hence, all of the other handler threads are already blocked 
waiting to get the NN lock.
- Since no handler threads are free, the RPC blocks forever and the NN never 
enters active state.

We need to have a general policy that the NN should never make RPCs to itself 
for any reason, due to potential for deadlocks like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails


 [ 
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3873:
--

Attachment: (was: HDFS-3873.patch)

 Hftp assumes security is disabled if token fetch fails
 --

 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch


 Hftp ignores all exceptions generated while trying to get a token, based on 
 the assumption that it means security is disabled.  Debugging problems is 
 excruciatingly difficult when security is enabled but something goes wrong.  
 Job submissions succeed, but tasks fail because the NN rejects the user as 
 unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3873) Hftp assumes security is disabled if token fetch fails


 [ 
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daryn Sharp updated HDFS-3873:
--

Attachment: HDFS-3873.patch

Re-attaching trunk patch since build tried to use 23 patch.

 Hftp assumes security is disabled if token fetch fails
 --

 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch


 Hftp ignores all exceptions generated while trying to get a token, based on 
 the assumption that it means security is disabled.  Debugging problems is 
 excruciatingly difficult when security is enabled but something goes wrong.  
 Job submissions succeed, but tasks fail because the NN rejects the user as 
 unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3863) QJM: track last committed txid


 [ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-3863:
--

Attachment: hdfs-3863.txt

I've put this through a few thousand runs of the {{testRandomized}} fault test, 
so I think the new sanity checks are reasonable.

 QJM: track last committed txid
 

 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3863-prelim.txt, hdfs-3863.txt


 Per some discussion with [~stepinto] 
 [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
  we should keep track of the last committed txid on each JournalNode. Then 
 during any recovery operation, we can sanity-check that we aren't asked to 
 truncate a log to an earlier transaction.
 This is also a necessary step if we want to support reading from in-progress 
 segments in the future (since we should only allow reads up to the commit 
 point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails

[
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445466#comment-13445466
]

Hadoop QA commented on HDFS-3873:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543175/HDFS-3873.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3128//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3128//console

This message is automatically generated.

Hftp assumes security is disabled if token fetch fails
--

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access


[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445468#comment-13445468
 ] 

Andy Isaacson commented on HDFS-3733:
-

bq. I have to logAuditEvent(false under any exception.

This false assumption was the root of my confusion.  In fact, if an exception 
other than ACE occurs, there's no need to logAuditEvent.  None of the other 
callsites do so.

Thanks for bringing this up, Eli.  New patch attached.

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access


 [ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Isaacson updated HDFS-3733:


Attachment: hdfs-3733-7.txt

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails


[ 
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445490#comment-13445490
 ] 

Daryn Sharp commented on HDFS-3873:
---

Failed test precedes this patch, it's fixed by HDFS-3852.

 Hftp assumes security is disabled if token fetch fails
 --

 Key: HDFS-3873
 URL: https://issues.apache.org/jira/browse/HDFS-3873
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.3, 3.0.0, 2.2.0-alpha
Reporter: Daryn Sharp
Assignee: Daryn Sharp
 Attachments: HDFS-3873.branch-23.patch, HDFS-3873.patch


 Hftp ignores all exceptions generated while trying to get a token, based on 
 the assumption that it means security is disabled.  Debugging problems is 
 excruciatingly difficult when security is enabled but something goes wrong.  
 Job submissions succeed, but tasks fail because the NN rejects the user as 
 unauthenticated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access


[ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445510#comment-13445510
 ] 

Eli Collins commented on HDFS-3733:
---

Looks great Andy.

+1 pending jenkins.

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)


 [ 
https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-3876:
-

Assignee: Eli Collins

 NN should not RPC to self to find trash defaults (causes deadlock)
 --

 Key: HDFS-3876
 URL: https://issues.apache.org/jira/browse/HDFS-3876
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0, 2.2.0-alpha
Reporter: Todd Lipcon
Assignee: Eli Collins
Priority: Blocker

 When transitioning a SBN to active, I ran into the following situation:
 - the TrashPolicy first gets loaded by an IPC Server Handler thread. The 
 {{initialize}} function then tries to make an RPC to the same node to find 
 out the defaults.
 - This is happening inside the NN write lock (since it's part of the active 
 initialization). Hence, all of the other handler threads are already blocked 
 waiting to get the NN lock.
 - Since no handler threads are free, the RPC blocks forever and the NN never 
 enters active state.
 We need to have a general policy that the NN should never make RPCs to itself 
 for any reason, due to potential for deadlocks like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)


[ 
https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445531#comment-13445531
 ] 

Eli Collins commented on HDFS-3876:
---

I'll try to get a patch up tonight, if it's blocking you I can revert it.

 NN should not RPC to self to find trash defaults (causes deadlock)
 --

 Key: HDFS-3876
 URL: https://issues.apache.org/jira/browse/HDFS-3876
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 3.0.0, 2.2.0-alpha
Reporter: Todd Lipcon
Assignee: Eli Collins
Priority: Blocker

 When transitioning a SBN to active, I ran into the following situation:
 - the TrashPolicy first gets loaded by an IPC Server Handler thread. The 
 {{initialize}} function then tries to make an RPC to the same node to find 
 out the defaults.
 - This is happening inside the NN write lock (since it's part of the active 
 initialization). Hence, all of the other handler threads are already blocked 
 waiting to get the NN lock.
 - Since no handler threads are free, the RPC blocks forever and the NN never 
 enters active state.
 We need to have a general policy that the NN should never make RPCs to itself 
 for any reason, due to potential for deadlocks like this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3733) Audit logs should include WebHDFS access

[
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445565#comment-13445565
]

Hadoop QA commented on HDFS-3733:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543203/hdfs-3733-7.txt
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 4 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3131//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3131//console

This message is automatically generated.

Audit logs should include WebHDFS access

hello world
{noformat}
This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}}
thereby triggering the existing {{logAuditEvent}} code.

[jira] [Commented] (HDFS-3873) Hftp assumes security is disabled if token fetch fails

[
https://issues.apache.org/jira/browse/HDFS-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445583#comment-13445583
]

Hadoop QA commented on HDFS-3873:
-

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12543197/HDFS-3873.patch
against trunk revision .

+1 @author. The patch does not contain any @author tags.

+1 tests included. The patch appears to include 1 new or modified test
files.

+1 javac. The applied patch does not increase the total number of javac
compiler warnings.

+1 javadoc. The javadoc tool did not generate any warning messages.

+1 eclipse:eclipse. The patch built with eclipse:eclipse.

+1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9)
warnings.

+1 release audit. The applied patch does not increase the total number of
release audit warnings.

-1 core tests. The patch failed these unit tests in
hadoop-hdfs-project/hadoop-hdfs:

org.apache.hadoop.hdfs.TestHftpDelegationToken

org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks

org.apache.hadoop.hdfs.server.namenode.ha.TestStandbyCheckpoints
org.apache.hadoop.hdfs.TestReplication
org.apache.hadoop.hdfs.TestPersistBlocks

+1 contrib tests. The patch passed contrib unit tests.

Test results:
https://builds.apache.org/job/PreCommit-HDFS-Build/3130//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/3130//console

This message is automatically generated.

Hftp assumes security is disabled if token fetch fails
--

[jira] [Commented] (HDFS-3866) HttpFS build should download Tomcat via Maven instead of directly

2012-08-30 Thread Alejandro Abdelnur (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445605#comment-13445605
 ] 

Alejandro Abdelnur commented on HDFS-3866:
--

HttpFS uses a Tomcat server to run the service, not just Tomcat JARs. A Tomcat 
server is much more than just the JARs (available in Maven), it has a special 
layout, scripts and configuration files. Tomcat server TARBALLs are not 
available in Maven.

To help build this in-house without downloading Tomcat TARBALL from the 
internet you could modify the property that sets the download URL to an 
internal web server where you stage the Tomcat TARBALL.

One thing we could do as part of this JIRA is to make the download location to 
be a POM property so you can easily override it with -D or edit in the 
properties section.


 HttpFS build should download Tomcat via Maven instead of directly
 -

 Key: HDFS-3866
 URL: https://issues.apache.org/jira/browse/HDFS-3866
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: build
Affects Versions: 2.0.0-alpha
 Environment: CDH4 build on CentOS 6.2
Reporter: Ryan Hennig
Priority: Minor

 When trying to enable a build of CDH4 in Jenkins, I got a build error due to 
 an attempt to download Tomcat from the internet directly instead of via Maven 
 and thus our internal Maven repository.
 The problem is due to this line in 
 src/hadoop-hdfs-project/hadoop-hdfs-httpfs/target/antrun/build-main.xml:
   get dest=downloads/tomcat.tar.gz skipexisting=true verbose=true 
 src=http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.32/bin/apache-tomcat-6.0.32.tar.gz/
 This build.xml is generated from 
 src/hadoop-hdfs-project/hadoop-hdfs-httpfs/pom.xml:
 get 
 src=http://archive.apache.org/dist/tomcat/tomcat-6/v${tomcat.version}/bin/apache-tomcat-${tomcat.version}.tar.gz;
   dest=downloads/tomcat.tar.gz verbose=true skipexisting=true/
 Instead of directly downloading from a hardcoded location, the Tomcat 
 dependency should be managed by Maven.  This would enable the use of a local 
 repository for build machines without internet access.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3135) Build a war file for HttpFS instead of packaging the server (tomcat) along with the application.

2012-08-30 Thread Alejandro Abdelnur (JIRA)

[
https://issues.apache.org/jira/browse/HDFS-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445608#comment-13445608
]

Alejandro Abdelnur commented on HDFS-3135:
--

Bundling a Tomcat Server with HttpFS is just a convenience to work out of the
box from Hadoop TARBALL, you can grab the WAR file and deployed it any servlet
container implementing servlet 2.4 or higher. You'll also have to adapt some of
the httpfs scripts system properties settings for Tomcat.

On the other hand, if you use BigTop packages, only the WAR file and startup
scripts are used, and a bigtop-tomcat package provides the Tomcat server.

As I've commented in HDFS-3866, you could tweak the location from where the
tomcat TARBALL is being downloaded.

Build a war file for HttpFS instead of packaging the server (tomcat) along
with the application.

Key: HDFS-3135
URL: https://issues.apache.org/jira/browse/HDFS-3135
Project: Hadoop HDFS
Issue Type: Improvement
Components: build
Affects Versions: 0.23.2
Reporter: Ravi Prakash
Labels: build

There are several reason why web applications should not be packaged along
with the server that is expected to serve them. For one not all organisations
use vanilla tomcat. There are other reasons I won't go into.
I'm filing this bug because some of our builds failed in trying to download
the tomcat.tar.gz file. We then had to manually wget the file and place it in
downloads/ to make the build pass. I suspect the download failed because of
an overloaded server (Frankly, I don't really know). If someone has ideas,
please share them.

[jira] [Assigned] (HDFS-3232) Cleanup DatanodeInfo vs DatanodeID handling in DN servlets


 [ 
https://issues.apache.org/jira/browse/HDFS-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-3232:
-

Assignee: (was: Eli Collins)

 Cleanup DatanodeInfo vs DatanodeID handling in DN servlets
 --

 Key: HDFS-3232
 URL: https://issues.apache.org/jira/browse/HDFS-3232
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Priority: Minor
  Labels: newbie

 The DN servlets currently have code like the following:
 {code}
   final String hostname = host instanceof DatanodeInfo 
   ? ((DatanodeInfo)host).getHostName() : host.getIpAddr();
 {code}
 I believe this outdated, that we now always get one or the other (at least 
 when not running the tests). Need to verify that. We should clean this code 
 up as well, eg always use the IP (which we'll lookup the FQDN for) since the 
 hostname isn't necessarily valid to put in a URL (the DN hostname isn't 
 necesarily a FQDN).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-3640) Don't use Util#now or System#currentTimeMillis for calculating intervals


 [ 
https://issues.apache.org/jira/browse/HDFS-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-3640:
-

Assignee: (was: Eli Collins)

 Don't use Util#now or System#currentTimeMillis for calculating intervals
 

 Key: HDFS-3640
 URL: https://issues.apache.org/jira/browse/HDFS-3640
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins

 Per HDFS-3485 we shouldn't use Util#now or System#currentTimeMillis to 
 calculate intervals as they can be affected by system clock changes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-3233) Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID


 [ 
https://issues.apache.org/jira/browse/HDFS-3233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-3233:
-

Assignee: (was: Eli Collins)

 Move IP to FQDN conversion from DatanodeJSPHelper to DatanodeID
 ---

 Key: HDFS-3233
 URL: https://issues.apache.org/jira/browse/HDFS-3233
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Eli Collins
Priority: Minor
  Labels: newbie

 In a handful of places DatanodeJSPHelper looks up the IP for a DN and then 
 determines a FQDN for the IP. We should move this code to a single place, a 
 new DatanodeID to return the FQDN for a DatanodeID.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-2918) HA: Update HA docs to cover dfsadmin


 [ 
https://issues.apache.org/jira/browse/HDFS-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-2918:
-

Assignee: (was: Eli Collins)

 HA: Update HA docs to cover dfsadmin
 

 Key: HDFS-2918
 URL: https://issues.apache.org/jira/browse/HDFS-2918
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: ha
Affects Versions: 0.24.0
Reporter: Eli Collins

 dfsadmin currently always uses the first namenode rather than failing over. 
 It should failover like other clients, unless fs specifies a specific 
 namenode.
 {noformat}
 hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs haadmin -failover nn1 nn2
 Failover from nn1 to nn2 successful
 # nn2 is 8022
 hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode 
 enter
 Safe mode is ON
 hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -safemode get 
 Safe mode is OFF
 hadoop-0.24.0-SNAPSHOT $ ./bin/hdfs dfsadmin -fs localhost:8022 -safemode get
 Safe mode is ON
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-2911) Gracefully handle OutOfMemoryErrors


 [ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-2911:
-

Assignee: (was: Eli Collins)

 Gracefully handle OutOfMemoryErrors
 ---

 Key: HDFS-2911
 URL: https://issues.apache.org/jira/browse/HDFS-2911
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 0.23.0, 1.0.0
Reporter: Eli Collins

 We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
 We should catch them in a high-level handler, cleanly fail the RPC (vs 
 sending back the OOM stackrace) or background thread, and shutdown the NN or 
 DN. Currently the process is left in a not well-test tested state 
 (continuously fails RPCs and internal threads, may or may not recover and 
 doesn't shutdown gracefully).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Assigned] (HDFS-2896) The 2NN incorrectly daemonizes


 [ 
https://issues.apache.org/jira/browse/HDFS-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins reassigned HDFS-2896:
-

Assignee: (was: Eli Collins)

 The 2NN incorrectly daemonizes
 --

 Key: HDFS-2896
 URL: https://issues.apache.org/jira/browse/HDFS-2896
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0, 0.24.0
Reporter: Eli Collins
  Labels: newbie

 The SecondaryNameNode (and Checkpointer) confuse o.a.h.u.Daemon with a Unix 
 daemon. Per below it intends to create a thread that never ends, but 
 o.a.h.u.Daemon just marks a thread with Java's Thread#setDaemon which means 
 Java will terminate the thread when there are no more non-daemon user threads 
 running
 {code}
 // Create a never ending deamon
 Daemon checkpointThread = new Daemon(secondary);
 {code}
 Perhaps they thought they were using commons Daemon. We of course don't want 
 the 2NN to exit unless it exits itself or is stopped explicitly. Currently it 
 won't do this because the main thread is not marked as a daemon thread. In 
 any case, let's make the 2NN consistent with the NN in this regard (exit when 
 the RPC thread exits).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-2782) HA: Support multiple shared edits dirs


 [ 
https://issues.apache.org/jira/browse/HDFS-2782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-2782.
---

  Resolution: Won't Fix
Assignee: (was: Eli Collins)
Target Version/s:   (was: 0.24.0)

Given QJM (HDFS-3077) IMO this is no longer worth considering.

 HA: Support multiple shared edits dirs
 --

 Key: HDFS-2782
 URL: https://issues.apache.org/jira/browse/HDFS-2782
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: ha
Affects Versions: 0.24.0
Reporter: Aaron T. Myers

 Supporting multiple shared dirs will improve availability (eg see HDFS-2769). 
 You may want to use multiple shared dirs on a single filer (eg for better 
 fault isolation) or because you want to use multiple filers/mounts. Per 
 HDFS-2752 (and HDFS-2735) we need to do things like use the JournalSet in 
 EditLogTailer and add tests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HDFS-3733) Audit logs should include WebHDFS access


 [ 
https://issues.apache.org/jira/browse/HDFS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-3733:
--

  Resolution: Fixed
   Fix Version/s: 2.2.0-alpha
Target Version/s:   (was: 2.2.0-alpha)
  Status: Resolved  (was: Patch Available)

Test failure is unrelated.

I've committed this. Thanks Andy!

 Audit logs should include WebHDFS access
 

 Key: HDFS-3733
 URL: https://issues.apache.org/jira/browse/HDFS-3733
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Affects Versions: 2.0.0-alpha
Reporter: Andy Isaacson
Assignee: Andy Isaacson
 Fix For: 2.2.0-alpha

 Attachments: hdfs-3733-1.txt, hdfs-3733-2.txt, hdfs-3733-3.txt, 
 hdfs-3733-4.txt, hdfs-3733-6.txt, hdfs-3733-7.txt, hdfs-3733.txt


 Access via WebHdfs does not result in audit log entries.  It should.
 {noformat}
 % curl http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=GETFILESTATUS;
 {FileStatus:{accessTime:1343351432395,blockSize:134217728,group:supergroup,length:12,modificationTime:1342808158399,owner:adi,pathSuffix:,permission:644,replication:1,type:FILE}}
 {noformat}
 and observe that no audit log entry is generated.
 Interestingly, OPEN requests do not generate audit log entries when the NN 
 generates the redirect, but do generate audit log entries when the second 
 phase against the DN is executed.
 {noformat}
 % curl -v 'http://nn1:50070/webhdfs/v1/user/adi/hello.txt?op=OPEN'
 ...
  HTTP/1.1 307 TEMPORARY_REDIRECT
  Location: 
 http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020offset=0
 ...
 % curl -v 
 'http://dn01:50075/webhdfs/v1/user/adi/hello.txt?op=OPENnamenoderpcaddress=nn1:8020'
 ...
  HTTP/1.1 200 OK
  Content-Type: application/octet-stream
  Content-Length: 12
  Server: Jetty(6.1.26.cloudera.1)
  
 hello world
 {noformat}
 This happens because {{DatanodeWebHdfsMethods#get}} uses {{DFSClient#open}} 
 thereby triggering the existing {{logAuditEvent}} code.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-08-30 Thread Suresh Srinivas (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suresh Srinivas resolved HDFS-2911.
---

Resolution: Won't Fix

I am going to mark this as won't fix. If anyone disagrees, then reopen with. 
Reason. 

 Gracefully handle OutOfMemoryErrors
 ---

 Key: HDFS-2911
 URL: https://issues.apache.org/jira/browse/HDFS-2911
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 0.23.0, 1.0.0
Reporter: Eli Collins

 We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
 We should catch them in a high-level handler, cleanly fail the RPC (vs 
 sending back the OOM stackrace) or background thread, and shutdown the NN or 
 DN. Currently the process is left in a not well-test tested state 
 (continuously fails RPCs and internal threads, may or may not recover and 
 doesn't shutdown gracefully).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors


[ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445656#comment-13445656
 ] 

Eli Collins commented on HDFS-2911:
---

No longer think we should do the kill -9 option?

 Gracefully handle OutOfMemoryErrors
 ---

 Key: HDFS-2911
 URL: https://issues.apache.org/jira/browse/HDFS-2911
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 0.23.0, 1.0.0
Reporter: Eli Collins

 We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
 We should catch them in a high-level handler, cleanly fail the RPC (vs 
 sending back the OOM stackrace) or background thread, and shutdown the NN or 
 DN. Currently the process is left in a not well-test tested state 
 (continuously fails RPCs and internal threads, may or may not recover and 
 doesn't shutdown gracefully).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors

2012-08-30 Thread Suresh Srinivas (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445688#comment-13445688
 ] 

Suresh Srinivas commented on HDFS-2911:
---

I actually thought about it. But given title gracefully handle and killing is 
not graceful, decided to close the bug :)

Feel free to change the title and reopen. Or perhaps a new Jira.

 Gracefully handle OutOfMemoryErrors
 ---

 Key: HDFS-2911
 URL: https://issues.apache.org/jira/browse/HDFS-2911
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node, name-node
Affects Versions: 0.23.0, 1.0.0
Reporter: Eli Collins

 We should gracefully handle j.l.OutOfMemoryError exceptions in the NN or DN. 
 We should catch them in a high-level handler, cleanly fail the RPC (vs 
 sending back the OOM stackrace) or background thread, and shutdown the NN or 
 DN. Currently the process is left in a not well-test tested state 
 (continuously fails RPCs and internal threads, may or may not recover and 
 doesn't shutdown gracefully).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2911) Gracefully handle OutOfMemoryErrors