[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864006#comment-13864006 ] Hadoop QA commented on HDFS-5715: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621702/HDFS-5715.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5833//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5833//console This message is automatically generated. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate their associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
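To make the proposed change concrete, here is a minimal, self-contained Java sketch of a diff that records a snapshot id instead of holding a Snapshot reference. The stub types below (Snapshot, SnapshotTable, the resolve helper) are illustrative assumptions, not the actual HDFS classes or the attached patch:
{code}
import java.util.HashMap;
import java.util.Map;

// Stub standing in for the real Snapshot class.
class Snapshot {
  final int id;
  Snapshot(int id) { this.id = id; }
}

// Stub id-to-snapshot lookup, standing in for the snapshot manager.
class SnapshotTable {
  private final Map<Integer, Snapshot> byId = new HashMap<>();
  void put(Snapshot s) { byId.put(s.id, s); }
  Snapshot get(int id) { return byId.get(id); }
}

// Before, a diff held a Snapshot object reference; after, it records only
// the integer id, which serializes compactly (e.g. into the byte-array
// representation proposed in HDFS-5714) and is resolved on demand.
class FileDiff {
  private final int snapshotId;
  FileDiff(int snapshotId) { this.snapshotId = snapshotId; }
  Snapshot resolve(SnapshotTable snapshots) { return snapshots.get(snapshotId); }
}
{code}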
[jira] [Updated] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5704: Status: Patch Available (was: Open) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch Currently, every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue: the total size of the editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of the new block, rather than the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
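The n^2 claim is easy to see with a small, runnable sketch: re-logging the whole block list on each allocation writes 1 + 2 + ... + n entries for an n-block file, while an add-block-only record writes n entries in total. The class below is purely illustrative, not HDFS code:
{code}
import java.util.ArrayList;
import java.util.List;

public class EditLogGrowth {
  public static void main(String[] args) {
    int n = 1000;                        // blocks allocated for one file
    List<Long> blocks = new ArrayList<>();
    long updateBlocksEntries = 0;        // entries an OP_UPDATE_BLOCKS-style op writes
    long addBlockEntries = 0;            // entries an OP_ADD_BLOCK-style op writes
    for (long id = 0; id < n; id++) {
      blocks.add(id);
      updateBlocksEntries += blocks.size(); // re-logs the whole list: 1 + 2 + ... + n
      addBlockEntries += 1;                 // logs only the newly allocated block
    }
    // Prints 500500 vs 1000, i.e. ~n^2/2 vs n.
    System.out.println(updateBlocksEntries + " vs " + addBlockEntries);
  }
}
{code}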
[jira] [Updated] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5704: Attachment: HDFS-5704.001.patch Thanks for the review, Suresh! Updated the patch to address your comments. Also added two unit tests to cover some basic scenarios. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch Currently, every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue: the total size of the editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of the new block, rather than the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864016#comment-13864016 ] Yu Li commented on HDFS-2994: - I happened to find that this JIRA has already been integrated into the 2.1.1-beta release, but the status here remains unresolved. Could someone update the status? :-) If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
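The failure mode in the last paragraph is a classic stale-reference race. Here is a minimal sketch of the pattern, using a plain map as a stand-in for FSDirectory; the names and structure are assumptions for illustration, not the HDFS implementation:
{code}
import java.util.HashMap;
import java.util.Map;

public class StaleNodeSketch {
  static final Map<String, Object> dir = new HashMap<>();

  // Succeeds only if oldNode is still the node currently stored for src,
  // mirroring why replaceNode can fail once the INode has been swapped.
  static boolean replaceNode(String src, Object oldNode, Object newNode) {
    return dir.replace(src, oldNode, newNode);
  }

  public static void main(String[] args) {
    String src = "/benchmarks/TestDFSIO/io_data/test_io_6";
    dir.put(src, new Object());
    Object stale = dir.get(src);   // reference captured early in startFile
    dir.put(src, new Object());    // lease recovery replaces the node
    System.out.println(replaceNode(src, stale, new Object()));   // false
    Object current = dir.get(src); // fix direction: re-resolve after recovery
    System.out.println(replaceNode(src, current, new Object())); // true
  }
}
{code}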
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864021#comment-13864021 ] Hadoop QA commented on HDFS-2994: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12598329/HDFS-2994-2.0.6-alpha.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5835//console This message is automatically generated. If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864039#comment-13864039 ] Todd Lipcon commented on HDFS-5138: --- General: - thanks for the description in the above JIRA comment. Can you transfer this comment somewhere into the docs, perhaps hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HDFSHighAvailabilityWithQJM.apt.vm or a new page? Perhaps with a slightly more user-facing angle. - what would happen if the admin called finalizeUpgrade() when neither node had yet transitioned to active? I don't see any sanity check here... is it possible you'd end up leaving the shared dir in an orphaned upgrading state and never end up finalizing it? Similarly, what happens if you start one NN with -upgrade, and you start the other one without -upgrade? It seems to me it should check for the upgrade lock file in the shared dir and say "looks like an upgrade is in progress, please start the SBN with -upgrade". - there are a few TODOs in the code that probably need to be addressed - nothing big, just a few things you may have missed. JournalManager.java: - would be good to add Javadoc on the new methods, so that JM implementors know what the upgrade process looks like, i.e. what is pre-upgrade, etc.? QuorumJournalManager.java: - the "Could not perform upgrade or more JournalNodes" error message has some missing words in it. +throw new IOException("Failed to lock shared log."); - this line should be unreachable, right? maybe an AssertionError("Unreachable code") would make more sense? Also this same exception message is used down below in canRollBack, which isn't quite right. Journal: - when you upgrade the journal, I'd think you'd want to copy over all the data from the PersistentLongFiles into the new dir? FileJournalManager: - worth considering a protobuf for the shared log lock, in case we want to add other fields to it later (instead of the custom serialization you do now) - need try...finally around the code where you write the shared log lock. On the read side you're also forgetting to close it. - the creation of the shared log lock file is non-atomic... I'm worried that we may hit the race in practice, since the AtomicFileOutputStream implies an fsync, which means that between the exists() check and the rename to the lock file, you may really have a decently long time window. Maybe we can use locking code like Storage does? Feel free to punt to a follow-up. FSNamesystem.java: - can you add a doc on doUpgradeOfSharedLogOnTransitionToActive()? NNUpgradeUtil.java: - why are some of the functions package-private and others are public? - make it an abstract class or give it a private constructor so it can't be instantiated, since it's just static methods - brief javadocs would be nice for these methods, even though they're straight refactors of existing code. FSEditLog.java: - in canRollBack(), you throw an exception if there is no shared log. That doesn't seem right... - capitalization of RollBack vs Rollback is a little inconsistent. Looks like Rollback is consistently used prior to this patch, so probably best to stick with that.
FSImage.java: - in the switch statement on the startup option, I think you should keep the ROLLBACK case, but just have it throw AssertionError -- just to make sure we don't accidentally have some case where we're passing it there but shouldn't be. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
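For the try...finally point raised under FileJournalManager above, this is presumably the intended pattern; the path and payload below are assumptions for illustration, not the actual patch:
{code}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class LockFileWrite {
  static void writeLockFile(String path, long layoutVersion) throws IOException {
    DataOutputStream out = new DataOutputStream(new FileOutputStream(path));
    try {
      out.writeLong(layoutVersion); // whatever fields the lock file carries
      out.flush();
    } finally {
      out.close();                  // runs even if the write throws
    }
  }
}
{code}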
[jira] [Commented] (HDFS-5667) Include DatanodeStorage in StorageReport
[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864107#comment-13864107 ] Hudson commented on HDFS-5667: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5667. Add test missed in previous checkin (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555956) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java HDFS-5667. Include DatanodeStorage in StorageReport. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555929) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/common/TestJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java Include DatanodeStorage in StorageReport Key: HDFS-5667 URL: https://issues.apache.org/jira/browse/HDFS-5667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Eric Sirianni Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5667.02.patch, h5667.03.patch, h5667.04.patch, h5667.05.patch The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542: {code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
             " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code}
+    if (storage == null) {
+      // This is seen during cluster initialization when the heartbeat
+      // is received before the initial block reports from each storage.
+      storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
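A rough Java sketch of the direction the description proposes, with stub types standing in for the real HDFS classes (this is not the attached patch):
{code}
// Stub: carries both the id and the state, so a report always has fresh state.
class DatanodeStorage {
  enum State { NORMAL, READ_ONLY }
  final String storageID;
  final State state;
  DatanodeStorage(String storageID, State state) {
    this.storageID = storageID;
    this.state = state;
  }
}

// The report carries the full DatanodeStorage instead of only the storage ID.
// On the wire the new field would be optional, and the old StorageUuid field
// kept, so updated and old nodes remain protocol-compatible.
class StorageReport {
  final DatanodeStorage storage;
  final long capacity;
  final long remaining;
  StorageReport(DatanodeStorage storage, long capacity, long remaining) {
    this.storage = storage;
    this.capacity = capacity;
    this.remaining = remaining;
  }
}
{code}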
[jira] [Commented] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864112#comment-13864112 ] Hadoop QA commented on HDFS-5704: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621762/HDFS-5704.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer org.apache.hadoop.hdfs.TestFileAppendRestart {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5834//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5834//console This message is automatically generated. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch Currently, every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue: the total size of the editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of the new block, rather than the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5719) FSImage#doRollback() should close prevState before return
[ https://issues.apache.org/jira/browse/HDFS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864108#comment-13864108 ] Hudson commented on HDFS-5719: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5719. FSImage#doRollback() should close prevState before return. Contributed by Ted Yu (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java FSImage#doRollback() should close prevState before return - Key: HDFS-5719 URL: https://issues.apache.org/jira/browse/HDFS-5719 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5719.txt {code} FSImage prevState = new FSImage(conf); {code} prevState should be closed before return from doRollback() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
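The fix the summary describes is the standard close-on-all-paths pattern; a minimal sketch assuming an FSImage-like Closeable (not the actual patch):
{code}
import java.io.Closeable;
import java.io.IOException;

public class RollbackSketch {
  static void doRollback(Closeable prevState) throws IOException {
    try {
      // ... inspect and validate the previous state, perform the rollback ...
    } finally {
      prevState.close(); // close on every exit path, not just on success
    }
  }
}
{code}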
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864102#comment-13864102 ] Hudson commented on HDFS-2832: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-2832. Update CHANGES.txt to reflect merge to branch-2 (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556088) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch, h2832_20131211b.patch, h2832_branch-2_20131226.patch, h2832_branch-2_20140103.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864110#comment-13864110 ] Hudson commented on HDFS-5589: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5589. Namenode loops caching and uncaching when data should be uncached. (awang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555996) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5220) Expose group resolution time as metric
[ https://issues.apache.org/jira/browse/HDFS-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864109#comment-13864109 ] Hudson commented on HDFS-5220: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #445 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/445/]) HDFS-5220. Expose group resolution time as metric (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555976) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Expose group resolution time as metric -- Key: HDFS-5220 URL: https://issues.apache.org/jira/browse/HDFS-5220 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Rob Weltman Assignee: Jimmy Xiang Fix For: 2.4.0 Attachments: 2.4-5220.addendum, 2.4-5220.patch, hdfs-5220.addendum, hdfs-5220.patch, hdfs-5220_v2.patch It would help detect issues with authentication configuration and with overloading an authentication source if the name node exposed the time taken for group resolution as a metric. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
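A hedged sketch of the idea: time each group lookup and hand the elapsed milliseconds to a metrics callback. The resolver and sink below are stand-ins, not Hadoop's Groups or metrics APIs:
{code}
import java.util.List;
import java.util.function.Function;
import java.util.function.LongConsumer;

public class TimedGroupResolution {
  static List<String> getGroups(String user,
                                Function<String, List<String>> resolver,
                                LongConsumer metricSink) {
    long start = System.nanoTime();
    try {
      return resolver.apply(user);         // e.g. an LDAP or shell lookup
    } finally {
      metricSink.accept((System.nanoTime() - start) / 1_000_000L); // ms
    }
  }
}
{code}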
[jira] [Commented] (HDFS-4834) Add -exclude path to fsck
[ https://issues.apache.org/jira/browse/HDFS-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864115#comment-13864115 ] Gerardo Vázquez commented on HDFS-4834: --- Seems to be a duplicate of HDFS-4993. Add -exclude path to fsck Key: HDFS-4834 URL: https://issues.apache.org/jira/browse/HDFS-4834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Gerardo Vázquez Priority: Minor fsck would fail if the current file being checked is deleted. If you are loading and deleting loaded files quite often, this would lead to many fsck attempts until you can do a complete check. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864219#comment-13864219 ] Hudson commented on HDFS-2832: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-2832. Update CHANGES.txt to reflect merge to branch-2 (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556088) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch, h2832_20131211b.patch, h2832_branch-2_20131226.patch, h2832_branch-2_20140103.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5220) Expose group resolution time as metric
[ https://issues.apache.org/jira/browse/HDFS-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864226#comment-13864226 ] Hudson commented on HDFS-5220: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5220. Expose group resolution time as metric (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555976) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Expose group resolution time as metric -- Key: HDFS-5220 URL: https://issues.apache.org/jira/browse/HDFS-5220 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Rob Weltman Assignee: Jimmy Xiang Fix For: 2.4.0 Attachments: 2.4-5220.addendum, 2.4-5220.patch, hdfs-5220.addendum, hdfs-5220.patch, hdfs-5220_v2.patch It would help detect issues with authentication configuration and with overloading an authentication source if the name node exposed the time taken for group resolution as a metric. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5719) FSImage#doRollback() should close prevState before return
[ https://issues.apache.org/jira/browse/HDFS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864225#comment-13864225 ] Hudson commented on HDFS-5719: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5719. FSImage#doRollback() should close prevState before return. Contributed by Ted Yu (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java FSImage#doRollback() should close prevState before return - Key: HDFS-5719 URL: https://issues.apache.org/jira/browse/HDFS-5719 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5719.txt {code} FSImage prevState = new FSImage(conf); {code} prevState should be closed before return from doRollback() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5667) Include DatanodeStorage in StorageReport
[ https://issues.apache.org/jira/browse/HDFS-5667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864224#comment-13864224 ] Hudson commented on HDFS-5667: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5667. Add test missed in previous checkin (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555956) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestStorageReport.java HDFS-5667. Include DatanodeStorage in StorageReport. (Arpit Agarwal) (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1555929) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/protocol/StorageReport.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/proto/DatanodeProtocol.proto * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSClusterWithNodeGroup.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManagerTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/common/TestJspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/SimulatedFSDataset.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDiskError.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/NNThroughputBenchmark.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDeadDatanode.java Include DatanodeStorage in StorageReport Key: HDFS-5667 URL: https://issues.apache.org/jira/browse/HDFS-5667 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 3.0.0 Reporter: Eric Sirianni Assignee: Arpit Agarwal Fix For: 3.0.0, 2.4.0 Attachments: h5667.02.patch, h5667.03.patch, h5667.04.patch, h5667.05.patch The fix for HDFS-5484 was accidentally regressed by the following change made via HDFS-5542: {code}
+  DatanodeStorageInfo updateStorage(DatanodeStorage s) {
     synchronized (storageMap) {
       DatanodeStorageInfo storage = storageMap.get(s.getStorageID());
       if (storage == null) {
@@ -670,8 +658,6 @@
             " for DN " + getXferAddr());
         storage = new DatanodeStorageInfo(this, s);
         storageMap.put(s.getStorageID(), storage);
-      } else {
-        storage.setState(s.getState());
       }
       return storage;
     }
{code} By removing the 'else' and no longer updating the state in the BlockReport processing path, we effectively get the bogus state type that is set via the first heartbeat (see the fix for HDFS-5455): {code}
+    if (storage == null) {
+      // This is seen during cluster initialization when the heartbeat
+      // is received before the initial block reports from each storage.
+      storage = updateStorage(new DatanodeStorage(report.getStorageID()));
{code} Even reverting the change and reintroducing the 'else' leaves the state type temporarily inaccurate until the first block report. As discussed with [~arpitagarwal], a better fix would be to simply include the full {{DatanodeStorage}} object in the {{StorageReport}} (as opposed to only the Storage ID). This requires adding the {{DatanodeStorage}} object to {{StorageReportProto}}. It needs to be a new optional field and we cannot remove the existing {{StorageUuid}} for protocol compatibility. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864227#comment-13864227 ] Hudson commented on HDFS-5589: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1637 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1637/]) HDFS-5589. Namenode loops caching and uncaching when data should be uncached. (awang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555996) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5719) FSImage#doRollback() should close prevState before return
[ https://issues.apache.org/jira/browse/HDFS-5719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864289#comment-13864289 ] Hudson commented on HDFS-5719: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-5719. FSImage#doRollback() should close prevState before return. Contributed by Ted Yu (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556057) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImage.java FSImage#doRollback() should close prevState before return - Key: HDFS-5719 URL: https://issues.apache.org/jira/browse/HDFS-5719 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Fix For: 3.0.0 Attachments: hdfs-5719.txt {code} FSImage prevState = new FSImage(conf); {code} prevState should be closed before return from doRollback() -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5220) Expose group resolution time as metric
[ https://issues.apache.org/jira/browse/HDFS-5220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864290#comment-13864290 ] Hudson commented on HDFS-5220: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-5220. Expose group resolution time as metric (jxiang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555976) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeys.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/Groups.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/TestUserGroupInformation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/metrics/TestNameNodeMetrics.java Expose group resolution time as metric -- Key: HDFS-5220 URL: https://issues.apache.org/jira/browse/HDFS-5220 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Rob Weltman Assignee: Jimmy Xiang Fix For: 2.4.0 Attachments: 2.4-5220.addendum, 2.4-5220.patch, hdfs-5220.addendum, hdfs-5220.patch, hdfs-5220_v2.patch It would help detect issues with authentication configuration and with overloading an authentication source if the name node exposed the time taken for group resolution as a metric. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-2832) Enable support for heterogeneous storages in HDFS
[ https://issues.apache.org/jira/browse/HDFS-2832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864283#comment-13864283 ] Hudson commented on HDFS-2832: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-2832. Update CHANGES.txt to reflect merge to branch-2 (arp: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556088) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Enable support for heterogeneous storages in HDFS - Key: HDFS-2832 URL: https://issues.apache.org/jira/browse/HDFS-2832 Project: Hadoop HDFS Issue Type: New Feature Affects Versions: 0.24.0 Reporter: Suresh Srinivas Assignee: Suresh Srinivas Fix For: 3.0.0, 2.4.0 Attachments: 20130813-HeterogeneousStorage.pdf, 20131125-HeterogeneousStorage-TestPlan.pdf, 20131125-HeterogeneousStorage.pdf, 20131202-HeterogeneousStorage-TestPlan.pdf, 20131203-HeterogeneousStorage-TestPlan.pdf, H2832_20131107.patch, editsStored, editsStored, h2832_20131023.patch, h2832_20131023b.patch, h2832_20131025.patch, h2832_20131028.patch, h2832_20131028b.patch, h2832_20131029.patch, h2832_20131103.patch, h2832_20131104.patch, h2832_20131105.patch, h2832_20131107b.patch, h2832_20131108.patch, h2832_20131110.patch, h2832_20131110b.patch, h2832_2013.patch, h2832_20131112.patch, h2832_20131112b.patch, h2832_20131114.patch, h2832_20131118.patch, h2832_20131119.patch, h2832_20131119b.patch, h2832_20131121.patch, h2832_20131122.patch, h2832_20131122b.patch, h2832_20131123.patch, h2832_20131124.patch, h2832_20131202.patch, h2832_20131203.patch, h2832_20131210.patch, h2832_20131211.patch, h2832_20131211b.patch, h2832_branch-2_20131226.patch, h2832_branch-2_20140103.patch HDFS currently supports configuration where storages are a list of directories. Typically each of these directories corresponds to a volume with its own file system. All these directories are homogeneous and therefore identified as a single storage at the namenode. I propose changing the current model, where a Datanode *is a* storage, to one where a Datanode *is a collection of* storages. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5589) Namenode loops caching and uncaching when data should be uncached
[ https://issues.apache.org/jira/browse/HDFS-5589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864291#comment-13864291 ] Hudson commented on HDFS-5589: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1662 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1662/]) HDFS-5589. Namenode loops caching and uncaching when data should be uncached. (awang via cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1555996) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/CacheReplicationMonitor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestFsDatasetCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestCacheDirectives.java Namenode loops caching and uncaching when data should be uncached - Key: HDFS-5589 URL: https://issues.apache.org/jira/browse/HDFS-5589 Project: Hadoop HDFS Issue Type: Sub-task Components: caching, namenode Affects Versions: 3.0.0 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-5589-1.patch, hdfs-5589-2.patch This was reported by [~cnauroth] and [~brandonli], and [~schu] repro'd it too. If you add a new caching directive then remove it, the Namenode will sometimes get stuck in a loop where it sends DNA_CACHE and then DNA_UNCACHE repeatedly to the datanodes where the data was previously cached. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
Uma Maheswara Rao G created HDFS-5724: - Summary: modifyCacheDirective logging audit log command wrongly as addCacheDirective Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
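Given the summary, the fix is presumably the one-word change below, restoring the operation name to match the method; this mirrors the snippet above rather than quoting the attached patch:
{code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "modifyCacheDirective", null, null, null);
}
{code}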
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5724: -- Attachment: HDFS-5724.patch modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5724: -- Status: Patch Available (was: Open) modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864449#comment-13864449 ] Tsuyoshi OZAWA commented on HDFS-5167: -- [~jingzhao], could you check the latest patch if you have a chance? Add metrics about the NameNode retry cache -- Key: HDFS-5167 URL: https://issues.apache.org/jira/browse/HDFS-5167 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: HDFS-5167.1.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch It will be helpful to have metrics in NameNode about the retry cache, such as the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-5724: -- Labels: caching (was: ) modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864451#comment-13864451 ] Andrew Wang commented on HDFS-5724: --- Thanks Uma, nice catch. +1 pending jenkins, no test is fine here. modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code}
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "addCacheDirective", null, null, null);
}
{code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864467#comment-13864467 ] Jing Zhao commented on HDFS-5167: - Sorry for the long delay, [~ozawa]. I will review it today. Add metrics about the NameNode retry cache -- Key: HDFS-5167 URL: https://issues.apache.org/jira/browse/HDFS-5167 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: HDFS-5167.1.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch It will be helpful to have metrics in NameNode about the retry cache, such as the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5167) Add metrics about the NameNode retry cache
[ https://issues.apache.org/jira/browse/HDFS-5167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864482#comment-13864482 ] Tsuyoshi OZAWA commented on HDFS-5167: -- [~jingzhao], it's OK, no problem :-) Add metrics about the NameNode retry cache -- Key: HDFS-5167 URL: https://issues.apache.org/jira/browse/HDFS-5167 Project: Hadoop HDFS Issue Type: Sub-task Components: ha, namenode Affects Versions: 3.0.0, 2.4.0 Reporter: Jing Zhao Assignee: Tsuyoshi OZAWA Priority: Minor Attachments: HDFS-5167.1.patch, HDFS-5167.2.patch, HDFS-5167.3.patch, HDFS-5167.4.patch, HDFS-5167.5.patch, HDFS-5167.6.patch, HDFS-5167.6.patch, HDFS-5167.7.patch, HDFS-5167.8.patch, HDFS-5167.9-2.patch, HDFS-5167.9.patch It will be helpful to have metrics in NameNode about the retry cache, such as the retry count etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uma Maheswara Rao G updated HDFS-5710: -- Attachment: HDFS-5710.patch Just returning an empty string in case the inodes become null when the method is called without holding the global lock. getFullPathName is called in many places; instead of returning null and adding null checks everywhere, returning an empty string may be OK. Attached a simple patch with the change. FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ : {code}
2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496
2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already.
2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception.
java.lang.NullPointerException
  at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871)
  at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158)
  at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112)
  at java.lang.Thread.run(Thread.java:724)
{code} Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
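A minimal sketch of the guard the comment describes, with a deliberately simplified signature (the real method builds the path from an INode array inside FSDirectory):
{code}
static String getFullPathName(Object[] inodes) {
  if (inodes == null) {
    // Race: the file may be deleted concurrently when this is called
    // without the global lock, so fall back to an empty path.
    return "";
  }
  StringBuilder path = new StringBuilder();
  for (Object inode : inodes) {
    path.append('/').append(inode); // placeholder for inode.getLocalName()
  }
  return path.toString();
}
{code}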
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864509#comment-13864509 ] Aaron T. Myers commented on HDFS-5138: -- Thanks a lot for the comments, Todd and Suresh. I've got some obligations during the first part of today but will try to get back to you later today or tomorrow. Suresh - as regards a design doc, I could potentially write up a small one if you really think it's necessary, but there really aren't all that many subtle points here, and hopefully by answering the (very good!) questions you've raised everything will become much clearer. The core of the patch isn't even all that large - there's a ton of plumbing of new RPCs, etc. that makes it look more complex than it is. One of the goals I had in producing it was to leave the existing non-HA upgrade system as untouched as possible, to reduce the possibility of regressions so we can put this in a 2.x update ASAP. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-2994: -- Resolution: Fixed Fix Version/s: 2.1.1-beta Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Fix For: 2.1.1-beta Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron T. Myers updated HDFS-5724: - Target Version/s: 3.0.0 modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
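For readers skimming this thread, the presumed shape of the fix is a one-word change to the audit string (the committed patch may differ in detail):
{code}
// in FSNamesystem#modifyCacheDirective: log the correct command name
if (isAuditEnabled() && isExternalInvocation()) {
  logAuditEvent(success, "modifyCacheDirective", null, null, null);
}
{code}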
[jira] [Commented] (HDFS-2994) If lease soft limit is recovered successfully the append can fail
[ https://issues.apache.org/jira/browse/HDFS-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864520#comment-13864520 ] Suresh Srinivas commented on HDFS-2994: --- [~carp84], thanks for pointing it out. You are right. This was fixed in 2.1.1-beta. Marking this as resolved. If lease soft limit is recovered successfully the append can fail - Key: HDFS-2994 URL: https://issues.apache.org/jira/browse/HDFS-2994 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 0.24.0 Reporter: Todd Lipcon Assignee: Tao Luo Fix For: 2.1.1-beta Attachments: HDFS-2994-2.0.6-alpha.patch, HDFS-2994_1.patch, HDFS-2994_1.patch, HDFS-2994_2.patch, HDFS-2994_3.patch, HDFS-2994_4.patch I saw the following logs on my test cluster: {code} 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: startFile: recover lease [Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 from client DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1 2012-02-22 14:35:22,887 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. Holder: DFSClient_attempt_1329943893604_0007_m_000376_0_453973131_1, pendingcreates: 1], src=/benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed. 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 2012-02-22 14:35:22,888 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.startFile: FSDirectory.replaceNode: failed to remove /benchmarks/TestDFSIO/io_data/test_io_6 {code} It seems like, if {{recoverLeaseInternal}} succeeds in {{startFileInternal}}, then the INode will be replaced with a new one, meaning the later {{replaceNode}} call can fail. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864525#comment-13864525 ] Suresh Srinivas commented on HDFS-5138: --- bq. Suresh - as regards a design doc, I could potentially write up a small one if you really think it's necessary, but there really aren't all that many subtle points here [~atm], you probably are right. Perhaps answering my questions will do. I may also take the answers from you and post a one-pager to describe how I understand it, to see if I got it right. That could perhaps be the document that we can post in this jira, if you agree. BTW, have you looked at HDFS-5535? Is there anything we can leverage from that, especially around the rollback marker in the editlog, etc.? Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5704: Attachment: HDFS-5704.002.patch editsStored Update the patch to fix TestFileAppendRestart. TestOfflineEditsViewer requires a new editsStored binary file to pass. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch, HDFS-5704.002.patch, editsStored Currently every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue. The total size of editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of a block, and not the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
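To make the quadratic blow-up concrete: a back-of-the-envelope sketch, assuming (purely for illustration) a fixed cost of B bytes per block entry in the editlog. OP_UPDATE_BLOCKS rewrites all i blocks on the i-th allocation, so the total is B*n*(n+1)/2 bytes for an n-block file, while an OP_ADD_BLOCK-style record that logs only the new block totals roughly B*n:
{code}
public class EditLogGrowth {
  public static void main(String[] args) {
    long bytesPerBlockEntry = 30;  // illustrative constant, not a measured value
    long n = 10_000;               // number of blocks in one file
    // OP_UPDATE_BLOCKS: the i-th allocation rewrites all i blocks so far.
    long updateBlocksTotal = bytesPerBlockEntry * n * (n + 1) / 2;
    // OP_ADD_BLOCK: each allocation logs only the newly added block.
    long addBlockTotal = bytesPerBlockEntry * n;
    System.out.println("OP_UPDATE_BLOCKS total bytes: " + updateBlocksTotal);
    System.out.println("OP_ADD_BLOCK total bytes:     " + addBlockTotal);
  }
}
{code}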
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864547#comment-13864547 ] Aaron T. Myers commented on HDFS-5138: -- bq. BTW, have you looked at HDFS-5535? Is there anything we can leverage from that, especially around the rollback marker in the editlog, etc.? Yes, I have looked at that. It's a good idea, but with this patch I was explicitly trying to _not_ redo the existing upgrade/rollback system, and instead just extend the non-HA upgrade/rollback system to work in an HA setup. Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. Since there has been a layout version change between 2.0.x and 2.1.x, starting NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on NN for layout upgrade and HA is turned back on without involving DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrade and upgrade snapshot. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not increase the maintenance window greatly, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5710) FSDirectory#getFullPathName should check inodes against null
[ https://issues.apache.org/jira/browse/HDFS-5710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5710: Status: Patch Available (was: Open) FSDirectory#getFullPathName should check inodes against null Key: HDFS-5710 URL: https://issues.apache.org/jira/browse/HDFS-5710 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.2.0 Reporter: Ted Yu Assignee: Uma Maheswara Rao G Attachments: HDFS-5710.patch, hdfs-5710-output.html From https://builds.apache.org/job/hbase-0.96-hadoop2/166/testReport/junit/org.apache.hadoop.hbase.mapreduce/TestTableInputFormatScan1/org_apache_hadoop_hbase_mapreduce_TestTableInputFormatScan1/ : {code} 2014-01-01 00:10:15,571 INFO [IPC Server handler 2 on 50198] blockmanagement.BlockManager(1009): BLOCK* addToInvalidates: blk_1073741967_1143 127.0.0.1:40188 127.0.0.1:46149 127.0.0.1:41496 2014-01-01 00:10:16,559 WARN [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] namenode.FSDirectory(1854): Could not get full path. Corresponding file might have deleted already. 2014-01-01 00:10:16,560 FATAL [org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor@93935b] blockmanagement.BlockManager$ReplicationMonitor(3127): ReplicationMonitor thread received Runtime exception. java.lang.NullPointerException at org.apache.hadoop.hdfs.server.namenode.FSDirectory.getFullPathName(FSDirectory.java:1871) at org.apache.hadoop.hdfs.server.namenode.INode.getFullPathName(INode.java:482) at org.apache.hadoop.hdfs.server.namenode.INodeFile.getName(INodeFile.java:316) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.chooseTarget(BlockPlacementPolicy.java:118) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1259) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1167) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3158) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3112) at java.lang.Thread.run(Thread.java:724) {code} Looks like getRelativePathINodes() returned null but getFullPathName() didn't check inodes against null, leading to NPE. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864577#comment-13864577 ] Hadoop QA commented on HDFS-5724: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621814/HDFS-5724.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5836//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5836//console This message is automatically generated. modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5382) Implement the UI of browsing filesystems in HTML 5 page
[ https://issues.apache.org/jira/browse/HDFS-5382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864591#comment-13864591 ] Kihwal Lee commented on HDFS-5382: -- bq. Since the same HTTP server serves both WebHDFS and the web UI, it seems to me that the right fix is to allow WebHDFS to use the customized auth filters as well. This can be a bit complicated due to the limitation of HttpServer and WebHDFS compatibility. [~szetszwo]: Nicholas, what do you think about Haohui's proposal? Implement the UI of browsing filesystems in HTML 5 page --- Key: HDFS-5382 URL: https://issues.apache.org/jira/browse/HDFS-5382 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.4.0 Attachments: HDFS-5382.000.patch, HDFS-5382.001.patch, HDFS-5382.002.patch, HDFS-5382.003.patch, browse-dir.png, file-info.png The UI of browsing filesystems can be implemented as an HTML 5 application. The UI can pull the data from WebHDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
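For context on what "pull the data from WebHDFS" means in practice, the UI issues REST calls such as LISTSTATUS against the NameNode's HTTP server. A minimal Java client sketch (the host, port, and path below are placeholders, not values from this jira):
{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsListStatus {
  public static void main(String[] args) throws Exception {
    // LISTSTATUS is a standard WebHDFS operation served on the NN web port.
    URL url = new URL("http://namenode.example.com:50070/webhdfs/v1/tmp?op=LISTSTATUS");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // JSON document containing FileStatus objects
      }
    }
  }
}
{code}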
[jira] [Updated] (HDFS-5612) NameNode: change all permission checks to enforce ACLs in addition to permissions.
[ https://issues.apache.org/jira/browse/HDFS-5612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5612: Attachment: HDFS-5612.2.patch I'm attaching patch version 2 with the following changes: # Rebased against current HDFS-4685 branch, which is up to date with trunk since yesterday. # Corrected comment about sorting on {{FSPermissionChecker#checkAcl}} in reaction to recent changes on the finalized HDFS-5673 patch. # Refactored several common methods to {{AclTestHelpers}}. NameNode: change all permission checks to enforce ACLs in addition to permissions. -- Key: HDFS-5612 URL: https://issues.apache.org/jira/browse/HDFS-5612 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Chris Nauroth Attachments: HDFS-5612.1.patch, HDFS-5612.2.patch All {{NameNode}} code paths that enforce permissions must be updated so that they also enforce ACLs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5715: Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks for the review Arpit! I've committed this to trunk. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5704) Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK
[ https://issues.apache.org/jira/browse/HDFS-5704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864695#comment-13864695 ] Hadoop QA commented on HDFS-5704: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621827/HDFS-5704.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5837//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5837//console This message is automatically generated. Change OP_UPDATE_BLOCKS with a new OP_ADD_BLOCK Key: HDFS-5704 URL: https://issues.apache.org/jira/browse/HDFS-5704 Project: Hadoop HDFS Issue Type: Bug Reporter: Suresh Srinivas Assignee: Jing Zhao Attachments: HDFS-5704.000.patch, HDFS-5704.001.patch, HDFS-5704.002.patch, editsStored Currently every time a block is allocated, the entire list of blocks is written to the editlog in an OP_UPDATE_BLOCKS operation. This has an n^2 growth issue. The total size of editlog records for a file with a large number of blocks could be huge. The goal of this jira is to discuss adding a different editlog record that only records the allocation of a block, and not the entire block list, on every block allocation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5725) Remove compression support from FSImage
Haohui Mai created HDFS-5725: Summary: Remove compression support from FSImage Key: HDFS-5725 URL: https://issues.apache.org/jira/browse/HDFS-5725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai As proposed in HDFS-5722, this jira removes the support of compression in the FSImage format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5725) Remove compression support from FSImage
[ https://issues.apache.org/jira/browse/HDFS-5725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5725: - Attachment: HDFS-5725.000.patch Remove compression support from FSImage --- Key: HDFS-5725 URL: https://issues.apache.org/jira/browse/HDFS-5725 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5725.000.patch As proposed in HDFS-5722, this jira removes the support of compression in the FSImage format. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5677) Need error checking for HA cluster configuration
[ https://issues.apache.org/jira/browse/HDFS-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik updated HDFS-5677: - Fix Version/s: 2.3.0 3.0.0 Need error checking for HA cluster configuration Key: HDFS-5677 URL: https://issues.apache.org/jira/browse/HDFS-5677 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, ha Affects Versions: 2.0.6-alpha Environment: centos6.5, oracle jdk6 45, Reporter: Vincent Sheffer Assignee: Vincent Sheffer Priority: Minor Fix For: 3.0.0, 2.3.0 If a node is declared in the *dfs.ha.namenodes.myCluster* but is _not_ later defined in subsequent *dfs.namenode.servicerpc-address.myCluster.nodename* or *dfs.namenode.rpc-address.myCluster.XXX* properties no error or warning message is provided to indicate that. The only indication of a problem is a log message like the following: {code} WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: myCluster:8020 {code} Another way to look at this is that no error or warning is provided when a servicerpc-address/rpc-address property is defined for a node without a corresponding node declared in *dfs.ha.namenodes.myCluster*. This arose when I had a typo in the *dfs.ha.namenodes.myCluster* property for one of my node names. It would be very helpful to have at least a warning message on startup if there is a configuration problem like this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
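A sketch of the kind of cross-check being requested, using only the Hadoop Configuration API; the nameservice ID and the warning text are illustrative, and the real validation would presumably live in the DN/NN startup path rather than a standalone tool:
{code}
import org.apache.hadoop.conf.Configuration;

public class HaConfigCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    String nameservice = "myCluster"; // example nameservice from the description
    String ids = conf.get("dfs.ha.namenodes." + nameservice, "");
    for (String id : ids.split(",")) {
      String nn = id.trim();
      String rpcKey = "dfs.namenode.rpc-address." + nameservice + "." + nn;
      String serviceRpcKey = "dfs.namenode.servicerpc-address." + nameservice + "." + nn;
      // Warn when a declared namenode ID has no matching address property.
      if (conf.get(rpcKey) == null && conf.get(serviceRpcKey) == null) {
        System.err.println("WARN: namenode '" + nn + "' is declared in dfs.ha.namenodes."
            + nameservice + " but has no rpc-address or servicerpc-address configured");
      }
    }
  }
}
{code}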
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864735#comment-13864735 ] Colin Patrick McCabe commented on HDFS-5722: Can you be more specific about how on-disk compression might not fit well with the new design of the fsimage? As far as I know, the FSImage is always loaded in sequential order, from start to finish. Having optional protobuf fields doesn't change that fact. In general, it is not possible to skip forward by an arbitrary number of protobuf types, since you don't know in advance how big each one is. Sorry if there's part of the discussion I missed, but I don't see any discussion about making the FSImage seekable in HDFS-5698. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression; there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefits of reducing the number of bytes to be transferred across the wire since there is compression at the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
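To illustrate the proposal of compressing at the transport layer, here is a generic, hedged sketch (not the actual SNN/SBN image servlet): the on-disk bytes stay uncompressed, and gzip is applied on the wire only when the client advertises support for it:
{code}
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class TransportCompressionSketch {
  static void sendImage(InputStream fsimage, OutputStream rawOut,
                        String acceptEncodingHeader) throws IOException {
    // Compress on the wire only when the client sends "Accept-Encoding: gzip".
    OutputStream out = (acceptEncodingHeader != null
        && acceptEncodingHeader.contains("gzip"))
        ? new GZIPOutputStream(rawOut)
        : rawOut;
    byte[] buf = new byte[8192];
    int n;
    while ((n = fsimage.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    out.flush();
    if (out instanceof GZIPOutputStream) {
      ((GZIPOutputStream) out).finish(); // write the gzip trailer
    }
  }
}
{code}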
[jira] [Updated] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5724: --- Resolution: Fixed Status: Resolved (was: Patch Available) committed to trunk. thanks, Uma. modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5579) Under construction files make DataNode decommission take very long hours
[ https://issues.apache.org/jira/browse/HDFS-5579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864749#comment-13864749 ] Jing Zhao commented on HDFS-5579: - The latest patch looks good to me. The only comment is that the following comment from Vinay does not seem to have been addressed. +1 after fixing this. {quote} 4. + underReplicatedInOpenFiles++; This should be incremented only if enough replicas are not there. {quote} Under construction files make DataNode decommission take very long hours Key: HDFS-5579 URL: https://issues.apache.org/jira/browse/HDFS-5579 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 1.2.0, 2.2.0 Reporter: zhaoyunjiong Assignee: zhaoyunjiong Attachments: HDFS-5579-branch-1.2.patch, HDFS-5579.patch We noticed that sometimes decommissioning DataNodes takes a very long time, even exceeding 100 hours. After checking the code, I found that BlockManager:computeReplicationWorkForBlocks(List<List<Block>> blocksToReplicate) won't replicate blocks which belong to under-construction files; however, in BlockManager:isReplicationInProgress(DatanodeDescriptor srcNode), if there is any block needing replication, no matter whether it belongs to an under-construction file or not, the decommission progress will continue running. That's the reason the decommission sometimes takes a very long time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
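Vinay's point, restated as a fragment: the counter should move only when live replicas actually fall short of the replication factor. The helper names below are hypothetical, introduced only for illustration, and are not the exact BlockManager fields:
{code}
int liveReplicas = getLiveReplicaCount(block);           // hypothetical helper
int expectedReplication = getExpectedReplication(block); // hypothetical helper

if (liveReplicas < expectedReplication) {
  underReplicatedInOpenFiles++; // increment only when replicas are insufficient
}
{code}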
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864768#comment-13864768 ] Karthik Kambatla commented on HDFS-5715: Looks like this breaks the build: mvn clean install -DskipTests fails after this patch. [~jingzhao] - can you look into it? Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5724) modifyCacheDirective logging audit log command wrongly as addCacheDirective
[ https://issues.apache.org/jira/browse/HDFS-5724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864800#comment-13864800 ] Hudson commented on HDFS-5724: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4971 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4971/]) HDFS-5724. modifyCacheDirective logging audit log command wrongly as addCacheDirective (Uma Maheswara Rao G via Colin Patrick McCabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556386) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java modifyCacheDirective logging audit log command wrongly as addCacheDirective --- Key: HDFS-5724 URL: https://issues.apache.org/jira/browse/HDFS-5724 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 3.0.0 Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G Priority: Minor Labels: caching Attachments: HDFS-5724.patch modifyCacheDirective: {code} if (isAuditEnabled() && isExternalInvocation()) { logAuditEvent(success, "addCacheDirective", null, null, null); } {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5649: - Attachment: HDFS-5649.002.patch Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864823#comment-13864823 ] Jing Zhao commented on HDFS-5715: - It works fine on my machine. What's the error msg in your build? Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu reassigned HDFS-5721: Assignee: Ted Yu sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
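The natural shape of such a fix is to close the image on all exit paths. A hedged fragment mirroring the code quoted above (the attached patch may differ in detail):
{code}
FSImage sharedEditsImage = new FSImage(conf,
    Lists.<URI>newArrayList(), sharedEditsDirs);
try {
  // ... existing initializeSharedEdits() work that uses sharedEditsImage ...
} finally {
  sharedEditsImage.close(); // release the image's resources before returning
}
{code}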
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v1.txt sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Status: Patch Available (was: Open) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864844#comment-13864844 ] Hadoop QA commented on HDFS-5649: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621874/HDFS-5649.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5839//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5839//console This message is automatically generated. Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864850#comment-13864850 ] Brandon Li commented on HDFS-5649: -- I've manually tested the patch and verified that the services were unregistered when the gateway was shut down. Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864853#comment-13864853 ] Karthik Kambatla commented on HDFS-5715: Interesting! maven - 3.0.3, jdk - 1.7.0_40 {noformat} [INFO] - [ERROR] COMPILATION ERROR : [INFO] - [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[32,48] OutputFormat is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[33,48] XMLSerializer is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,4] OutputFormat is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[55,33] OutputFormat is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,4] XMLSerializer is internal proprietary API and may be removed in a future release [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/offlineEditsViewer/XmlEditsVisitor.java:[59,35] XMLSerializer is internal proprietary API and may be removed in a future release [INFO] 7 errors {noformat} Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864856#comment-13864856 ] Karthik Kambatla commented on HDFS-5715: Just verified it builds fine with JDK6. To make sure it is this JIRA, I dropped the commit and it builds fine against JDK7 as well. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864860#comment-13864860 ] Jing Zhao commented on HDFS-5715: - I see. Let me check with JDK7. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864862#comment-13864862 ] Colin Patrick McCabe commented on HDFS-5715: This has broken the build for me as well. It's not your fault, it passed Jenkins and seems to be fine on JDK6. It seems like we need to start talking about JDK7 build slaves. In the meantime, can we get a fix or revert? The issue is this: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864870#comment-13864870 ] Brandon Li commented on HDFS-5649: -- Thank you, Jing, for the review. I've committed the patch. Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864883#comment-13864883 ] Hudson commented on HDFS-5649: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4972 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4972/]) HDFS-5649. Unregister NFS and Mount service when NFS gateway is shutting down. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1556405) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/portmap/PortmapRequest.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/DFSClientCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5649: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.3.0 Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864894#comment-13864894 ] Jing Zhao commented on HDFS-5715: - Let me create a new jira to fix it. Thanks Karthik and Colin! Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5649) Unregister NFS and Mount service when NFS gateway is shutting down
[ https://issues.apache.org/jira/browse/HDFS-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5649: - Fix Version/s: 2.3.0 Unregister NFS and Mount service when NFS gateway is shutting down -- Key: HDFS-5649 URL: https://issues.apache.org/jira/browse/HDFS-5649 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 3.0.0 Reporter: Brandon Li Assignee: Brandon Li Fix For: 2.3.0 Attachments: HDFS-5649.001.patch, HDFS-5649.002.patch The services should be unregistered if the gateway is asked to shutdown gracefully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
Jing Zhao created HDFS-5726: --- Summary: Fix compilation error in AbstractINodeDiff for JDK7 Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5715) Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff
[ https://issues.apache.org/jira/browse/HDFS-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864898#comment-13864898 ] Jing Zhao commented on HDFS-5715: - Created HDFS-5726 for this. Use Snapshot ID to indicate the corresponding Snapshot for a FileDiff/DirectoryDiff --- Key: HDFS-5715 URL: https://issues.apache.org/jira/browse/HDFS-5715 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Jing Zhao Assignee: Jing Zhao Fix For: 3.0.0 Attachments: HDFS-5715.000.patch, HDFS-5715.001.patch, HDFS-5715.002.patch Currently FileDiff and DirectoryDiff both contain a snapshot object reference to indicate its associated snapshot. Instead, we can simply record the corresponding snapshot id there. This can simplify some logic and allow us to use a byte array to represent the snapshot feature (HDFS-5714). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5726: Affects Version/s: 3.0.0 Status: Patch Available (was: Open) Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5726: Attachment: HDFS-5726.000.patch I tried this patch on my local machine (Oracle JDK7 + OS X) and it works for me. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
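For readers wondering why this compiles on JDK6 but not JDK7: javac 7 is stricter than javac 6 about accessing a private member through a type variable, which matches the message above. A self-contained illustration with made-up names (the actual HDFS-5726 patch presumably switches to an accessor or relaxes the field's visibility; see the attached patch for the real change):
{code}
abstract class Diff<D extends Diff<D>> {
  private final int snapshotId;

  Diff(int snapshotId) { this.snapshotId = snapshotId; }

  final int getSnapshotId() { return snapshotId; }

  int compareSnapshotIds(D that) {
    // return this.snapshotId - that.snapshotId;   // JDK7: "snapshotId has private access in Diff"
    return this.snapshotId - that.getSnapshotId(); // compiles on both JDK6 and JDK7
  }
}
{code}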
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5726: - Priority: Minor (was: Major) Hadoop Flags: Reviewed +1 patch looks good. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5726.000.patch HDFS-5715 breaks the JDK7 build with the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5138) Support HDFS upgrade in HA
[ https://issues.apache.org/jira/browse/HDFS-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864971#comment-13864971 ] Aaron T. Myers commented on HDFS-5138: -- Hi Suresh, hopefully the below answers to your questions clear things up. Please let me know if anything is unclear, or if you have any more questions. bq. Can you describe what is the difference between this vs older version of finalize? The command -finalize is fairly well known and this change will be backward compatible. I'm guessing you mean incompatible here. There's no this vs. older version of finalize. For as long as I can remember, we've always supported two ways of finalizing an upgrade: either by shutting down the running NN and then starting it again with the -finalize startup option, or by just running `hdfs dfsadmin -finalizeUpgrade' which makes an RPC to a running NN. The trouble with the startup option in an HA scenario is that an NN can't guarantee that it will be active at the time it starts, since determining who is active and who is standby is handled externally to the NN. I don't see any reason to prefer using the startup option even in a non-HA setup, so it seemed like we could remove it here. I could certainly just remove support for it in the HA case, if you'd prefer. bq. Sorry I am not sure I understand this. Why does HA rollback become more difficult? In the case of the '-upgrade' flag it's reasonable to only do the upgrade on transition to active, since we have to load the current fsimage/edits anyway before doing the upgrade, and the act of upgrading moves the transaction ID forward. In the case of '-rollback', however, it doesn't make much sense to start up in the standby state, load the full fsimage/edits, and then roll back everything, and reload the old fsimage upon becoming active. Given that the act of rolling back does not require loading the fsimage/edits at all, just moving some directories around, it seems to make sense to me that this should not be a mode but rather just a standalone command that runs and then exits. bq. Why is the lock file required? Why cannot NN just write an editlog about upgrade intent, including the new layout version? During rollback we can discount the editlog starting from the upgrade intent log. In fact we can also consider requiring users to save the namespace with empty editlogs? With this, perhaps we can avoid the following: This is again because an HA NN that is just starting up should not be writing to the shared log, but two HA NNs that are starting up need to synchronize/agree on the new CTime to use during upgrade. This needs to be known before doing the saveNamespace which is part of the upgrade process. You could imagine writing the new CTime to the edit log upon transitioning to the active state, but this would require the NNs to do the saveNamespace upon transitioning to active and/or when reading from the shared log as part of being the standby. It seems quite problematic to do the long, blocking operation of writing out a potentially large fsimage file in either of these places. bq. You mean finalize of the shared log in above? Yep, sure did. My bad. :) Support HDFS upgrade in HA -- Key: HDFS-5138 URL: https://issues.apache.org/jira/browse/HDFS-5138 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.1-beta Reporter: Kihwal Lee Assignee: Aaron T. Myers Priority: Blocker Attachments: HDFS-5138.patch, HDFS-5138.patch, HDFS-5138.patch With HA enabled, NN won't start with -upgrade. 
Since there has been a layout version change between 2.0.x and 2.1.x, starting the NN in upgrade mode was necessary when deploying 2.1.x to an existing 2.0.x cluster. But the only way to get around this was to disable HA and upgrade. The NN and the cluster cannot be flipped back to HA until the upgrade is finalized. If HA is disabled only on the NN for the layout upgrade and HA is turned back on without involving the DNs, things will work, but finalizeUpgrade won't work (the NN is in HA and it cannot be in upgrade mode) and the DNs' upgrade snapshots won't get removed. We will need a different way of doing layout upgrades and upgrade snapshots. I am marking this as a 2.1.1-beta blocker based on feedback from others. If there is a reasonable workaround that does not greatly increase the maintenance window, we can lower its priority from blocker to critical. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864978#comment-13864978 ] Hadoop QA commented on HDFS-5721: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621877/hdfs-5721-v1.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestHASafeMode {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5840//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5840//console This message is automatically generated. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864993#comment-13864993 ] Ted Yu commented on HDFS-5721: -- I ran TestHASafeMode locally on Mac but didn't reproduce the test failure. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5318) Pluggable interface for replica counting
[ https://issues.apache.org/jira/browse/HDFS-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13864999#comment-13864999 ] Arpit Agarwal commented on HDFS-5318: - Hi Eric, I took a look at pipeline recovery today. It looks like the following cases are of interest: # Block is finalized, r/w replica is lost, r/o replica is available. In this case the existing NN replication mechanisms will cause an extra replica to be created (q. what happens if a client attempts to append before the replication happens? The client probably needs to be fixed to handle this). # Block is RBW, r/w replica is lost, r/o replica is available. In the usual case DFSClientOutputStream will recover the write pipeline by selecting another DN, transferring the block contents to the new DN, and inserting it into the write pipeline. However pipeline recovery will not work when the single replica in the pipeline is lost, as you guys already mentioned on HDFS-5318. In that case I think you can use either the client-side setting or the block placement policy option being discussed there. Updating the suggested approach: # Each DataNode presents a different StorageID for the same physical storage. # Read-only replicas are not counted towards satisfying the replication factor. This assumes that read-only replicas are 'shared' (i.e. what you called using writability of a replica as a proxy for deducing whether or not that replica is shared). # Read-only replicas cannot be pruned (follows from (2)). # Client should be able to bootstrap a write pipeline with read-only replicas. # Read-only storages will not be used for block placement. I am not sure if there are any special conditions wrt lease recovery that also need to be considered. Pluggable interface for replica counting Key: HDFS-5318 URL: https://issues.apache.org/jira/browse/HDFS-5318 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.0 Reporter: Eric Sirianni Attachments: HDFS-5318.patch, hdfs-5318.pdf There are several use cases for using shared-storage for datanode block storage in an HDFS environment (storing cold blocks on a NAS device, Amazon S3, etc.). With shared-storage, there is a distinction between: # a distinct physical copy of a block # an access-path to that block via a datanode. A single 'replication count' metric cannot accurately capture both aspects. However, for most of the current uses of 'replication count' in the Namenode, the number of physical copies aspect seems to be the appropriate semantic. I propose altering the replication counting algorithm in the Namenode to accurately infer distinct physical copies in a shared storage environment. With HDFS-5115, a {{StorageID}} is a UUID. I propose associating some minor additional semantics to the {{StorageID}} - namely that multiple datanodes attaching to the same physical shared storage pool should report the same {{StorageID}} for that pool. A minor modification would be required in the DataNode to enable the generation of {{StorageID}}s to be pluggable behind the {{FsDatasetSpi}} interface. With those semantics in place, the number of physical copies of a block in a shared storage environment can be calculated as the number of _distinct_ {{StorageID}}s associated with that block. Consider the following combinations for two {{(DataNode ID, Storage ID)}} pairs {{(DN_A, S_A)}} and {{(DN_B, S_B)}} for a given block B: * {{DN_A != DN_B && S_A != S_B}} - *different* access paths to *different* physical replicas (i.e. the traditional HDFS case with local disks) ** → Block B has {{ReplicationCount == 2}} * {{DN_A != DN_B && S_A == S_B}} - *different* access paths to the *same* physical replica (e.g. HDFS datanodes mounting the same NAS share) ** → Block B has {{ReplicationCount == 1}} For example, if block B has the following location tuples: * {{DN_1, STORAGE_A}} * {{DN_2, STORAGE_A}} * {{DN_3, STORAGE_B}} * {{DN_4, STORAGE_B}}, the effect of this proposed change would be to calculate the replication factor in the namenode as *2* instead of *4*. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
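To make the proposed counting rule concrete, here is a minimal sketch of counting distinct {{StorageID}}s for a block. The helper below is hypothetical, not NameNode code; it simply illustrates the semantics described above.
{code}
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the proposed counting rule: replicas on
// storages reporting the same StorageID are one physical copy.
class ReplicaCounting {
  static int physicalReplicaCount(Iterable<String> storageIdsForBlock) {
    Set<String> distinct = new HashSet<String>();
    for (String storageId : storageIdsForBlock) {
      distinct.add(storageId);  // duplicates collapse into one entry
    }
    return distinct.size();
  }
}
{code}
For the four-location example above ({{STORAGE_A}} twice, {{STORAGE_B}} twice), this returns 2 rather than 4.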
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865006#comment-13865006 ] Hadoop QA commented on HDFS-5726: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621893/HDFS-5726.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5841//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5841//console This message is automatically generated. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Attachments: HDFS-5726.000.patch HDFS-5715 breaks JDK7 build for the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
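For readers hitting the same compiler behavior, here is a hedged reconstruction of the failure shape; the class below is illustrative, not the real AbstractINodeDiff. javac 7 rejects access to a private member through a reference whose static type is a type variable (something javac 6 incorrectly allowed), and the usual fix is to go through an accessor:
{code}
import java.util.List;

// Illustrative only: a self-referential generic like AbstractINodeDiff's.
abstract class Diff<D extends Diff<D>> {
  private final int snapshotId;
  Diff(int snapshotId) { this.snapshotId = snapshotId; }

  final int getSnapshotId() { return snapshotId; }

  int laterSnapshotId(List<D> diffs, int index) {
    D posterior = diffs.get(index + 1);
    // return posterior.snapshotId;   // JDK7: "snapshotId has private access"
    return posterior.getSnapshotId(); // compiles on both JDK6 and JDK7
  }
}
{code}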
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865017#comment-13865017 ] Junping Du commented on HDFS-5721: -- Thanks Ted for the patch! I think the TestHASafeMode failure is unrelated, as I have seen it fail intermittently. However, would you mind fixing other similar FSImage usages in this JIRA? I did a quick search and found the following issues: NameNode.java line 818: fsImage is created and formatted but not closed if any exceptions are thrown. FSNamesystem.java line 603: fsImage is created and loaded into the namesystem but not closed if anything goes wrong. BootstrapStandby.java line 192: image is created and initialized but not closed if exceptions are thrown. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
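As context for the close-on-all-paths pattern being requested here, a minimal sketch assuming the surrounding initializeSharedEdits-style code (conf, sharedEditsDirs, and a LOG field already in scope); this illustrates the try/finally idiom, not the exact patch:
{code}
FSImage sharedEditsImage = new FSImage(conf,
    Lists.<URI>newArrayList(), sharedEditsDirs);
try {
  // ... format / initialize the shared edits with the image ...
} finally {
  // Close on every path, including exceptions. IOUtils.cleanup logs and
  // swallows secondary close failures, so the original exception wins.
  IOUtils.cleanup(LOG, sharedEditsImage);
}
{code}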
[jira] [Updated] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5726: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks for the review, Nicholas! I've committed this to trunk. Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5726.000.patch HDFS-5715 breaks JDK7 build for the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5726) Fix compilation error in AbstractINodeDiff for JDK7
[ https://issues.apache.org/jira/browse/HDFS-5726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865024#comment-13865024 ] Hudson commented on HDFS-5726: -- SUCCESS: Integrated in Hadoop-trunk-Commit #4973 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/4973/]) HDFS-5726. Fix compilation error in AbstractINodeDiff for JDK7. Contributed by Jing Zhao. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1556433) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java Fix compilation error in AbstractINodeDiff for JDK7 --- Key: HDFS-5726 URL: https://issues.apache.org/jira/browse/HDFS-5726 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Jing Zhao Assignee: Jing Zhao Priority: Minor Fix For: 3.0.0 Attachments: HDFS-5726.000.patch HDFS-5715 breaks JDK7 build for the following error: {code} [ERROR] /home/kasha/code/hadoop-trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/AbstractINodeDiff.java:[134,53] error: snapshotId has private access in AbstractINodeDiff {code} This jira will fix the issue. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v2.txt Patch v2 addresses Junping's comments. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5722) Implement compression in the HTTP server of SNN / SBN instead of FSImage
[ https://issues.apache.org/jira/browse/HDFS-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865040#comment-13865040 ] Colin Patrick McCabe commented on HDFS-5722: bq. The design requires putting offset and length in the FSImage, and having compression inside the file makes things difficult. Therefore this jira proposes to move compression from FSImage to the higher-level application logic. I don't see why having compression makes things difficult. If the software wants to skip an N-byte section it doesn't understand, it just asks the {{CompressedStream}} to skip N bytes. The stream takes care of the details of translating that into byte offsets in the file. It may be more efficient to do this when compression is not enabled, but that is no reason to break the configurations of users who do have compression enabled now. I like the idea of implementing compression in the HTTP server code. But I don't see why we need to remove a feature that people are using, the on-disk FSImage compression feature. Possibly we should deprecate this feature, since HTTP compression is better for most use cases. Implement compression in the HTTP server of SNN / SBN instead of FSImage Key: HDFS-5722 URL: https://issues.apache.org/jira/browse/HDFS-5722 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai The current FSImage format supports compression; there is a field in the header which specifies the compression codec used to compress the data in the image. The main motivation was to reduce the number of bytes to be transferred between SNN / SBN / NN. The main disadvantage, however, is that it requires the client to access the FSImage in strictly sequential order. This might not fit well with the new design of FSImage. For example, serializing the data in protobuf allows the client to quickly skip data that it does not understand. The compression built into the format, however, complicates the calculation of offsets and lengths. Recovering from a corrupted, compressed FSImage is also non-trivial, as off-the-shelf tools like bzip2recover are inapplicable. This jira proposes to move the compression from the format of the FSImage to the transport layer, namely, the HTTP server of SNN / SBN. This design simplifies the format of FSImage, opens up the opportunity to quickly navigate through the FSImage, and eases the process of recovery. It also retains the benefit of reducing the number of bytes transferred across the wire, since there is compression at the transport layer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
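A minimal sketch of what the proposed transport-level compression could look like in the image-serving servlet, assuming standard javax.servlet and java.util.zip APIs and an fsImageInputStream already opened by the caller; this illustrates the proposal, not the actual patch:
{code}
// Hedged sketch: compress the fsimage on the wire instead of on disk.
// The on-disk file stays uncompressed (and seekable); only the HTTP
// response is gzipped, and only if the client advertises support.
String accept = request.getHeader("Accept-Encoding");
java.io.OutputStream out = response.getOutputStream();
if (accept != null && accept.contains("gzip")) {
  response.setHeader("Content-Encoding", "gzip");
  out = new java.util.zip.GZIPOutputStream(out);
}
IOUtils.copyBytes(fsImageInputStream, out, 4096, true); // closes both
{code}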
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: (was: hdfs-5721-v2.txt) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HDFS-5721: - Attachment: hdfs-5721-v2.txt sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5727) introduce a self-maintain io queue handling mechanism
Liang Xie created HDFS-5727: --- Summary: introduce a self-maintain io queue handling mechanism Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support IO priority or IO request reordering inside the datanode. I propose introducing a self-maintained IO queue mechanism to handle IO request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable despite the background thread throttling we have... I have not thought through the reordering fully yet, but reordering in an application-side queue would definitely beat the current reliance on the OS's IO queue merging. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5727) introduce a self-maintain io queue handling mechanism
[ https://issues.apache.org/jira/browse/HDFS-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865046#comment-13865046 ] Liang Xie commented on HDFS-5727: - So far no design doc is available; I've just put the raw thought here as a placeholder. I hope to start working on it 3~4 weeks from now, since other higher-priority issues need to be done these days. introduce a self-maintain io queue handling mechanism - Key: HDFS-5727 URL: https://issues.apache.org/jira/browse/HDFS-5727 Project: Hadoop HDFS Issue Type: New Feature Components: datanode Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Currently the datanode read/write SLA is difficult to guarantee for HBase online requirements. One of the major reasons is that we don't support IO priority or IO request reordering inside the datanode. I propose introducing a self-maintained IO queue mechanism to handle IO request priority. Imagine there are lots of concurrent read/write requests from the HBase side, and a background datanode block scanner is running (the default is every 21 days, IIRC) just at that time; then the HBase read/write 99% or 99.9% percentile latency would be vulnerable despite the background thread throttling we have... I have not thought through the reordering fully yet, but reordering in an application-side queue would definitely beat the current reliance on the OS's IO queue merging. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
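Absent a design doc, one way to picture the idea is a priority queue in front of the disk. The sketch below uses invented types (IoScheduler, IoRequest) and is only a rough illustration of prioritizing client IO over scanner IO, not anything from a patch:
{code}
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch: client-facing reads/writes outrank background
// block-scanner IO, so a scan burst cannot inflate p99 read latency.
class IoScheduler {
  enum Priority { CLIENT, REPLICATION, SCANNER }

  static class IoRequest implements Comparable<IoRequest> {
    final Priority priority;
    final Runnable work;
    IoRequest(Priority priority, Runnable work) {
      this.priority = priority;
      this.work = work;
    }
    public int compareTo(IoRequest other) {
      return priority.compareTo(other.priority); // CLIENT dequeued first
    }
  }

  private final PriorityBlockingQueue<IoRequest> queue =
      new PriorityBlockingQueue<IoRequest>();

  void submit(IoRequest request) { queue.put(request); }

  void dispatchLoop() throws InterruptedException {
    while (true) {
      queue.take().work.run(); // always runs the highest-priority request
    }
  }
}
{code}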
[jira] [Created] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
Vinay created HDFS-5728: --- Summary: [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865059#comment-13865059 ] Vinay commented on HDFS-5728: - 2013-12-28 13:22:30,467 WARN org.apache.hadoop.hdfs.server.protocol.InterDatanodeProtocol: Failed to updateBlock (newblock=BP-720706819-x-1389113739092:blk_5575900364052391670_517444, datanode=tmm-e8:11242) java.io.IOException: File length mismatched. The length of /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670 is 62484480 but r=ReplicaUnderRecovery, blk_5575900364052391670_320295, RUR getNumBytes() = 62455808 getBytesOnDisk() = 62455808 getVisibleLength()= -1 getVolume() = /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current getBlockFile()= /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670 recoveryId=517444 original=ReplicaWaitingToBeRecovered, blk_5575900364052391670_320295, RWR getNumBytes() = 62455808 getBytesOnDisk() = 62455808 getVisibleLength()= -1 getVolume() = /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current getBlockFile()= /usr/local/hadoop/hadoop_data/dfs/data2/datanode/hadoop/dfs/data/current/BP-720706819-x-1389113739092/current/rbw/blk_5575900364052391670 unlinked=false at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkReplicaFiles(FsDatasetImpl.java:1063) at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.updateReplicaUnderRecovery(FsDatasetImpl.java:1541) at org.apache.hadoop.hdfs.server.datanode.DataNode.updateReplicaUnderRecovery(DataNode.java:1907) at org.apache.hadoop.hdfs.server.datanode.DataNode$BlockRecord.updateReplicaUnderRecovery(DataNode.java:1938) at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:2090) at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1988) at org.apache.hadoop.hdfs.server.datanode.DataNode.access$400(DataNode.java:225) at org.apache.hadoop.hdfs.server.datanode.DataNode$2.run(DataNode.java:1869) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
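For reference, a hedged sketch (illustrative names, not the exact FsDatasetImpl code) of how a replica's length is derived from the meta file, which reproduces the mismatch in the log above assuming the default 512 bytes per checksum, 4-byte CRC32 entries, and a 7-byte meta header:
{code}
// Illustration of the length calculation driving the mismatch: the
// recovered length comes from the number of CRC entries in the meta
// file, not from the size of the block file on disk.
class MetaLength {
  static long lengthFromMetaFile(long metaFileLen, int bytesPerChecksum,
      int checksumSize, int headerLen) {
    long numChunks = (metaFileLen - headerLen) / checksumSize;
    return numChunks * bytesPerChecksum;
  }
}
{code}
With those defaults, a meta file holding CRCs for 121984 chunks yields 121984 * 512 = 62455808 bytes, even though 62484480 bytes (122040 chunks) of block data exist on disk, so every recovery attempt recomputes the same mismatch.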
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865062#comment-13865062 ] Ming Ma commented on HDFS-5585: --- Interesting. Should the client ask the DN to do the upgrade via ClientDatanodeProtocol, or should the client ask the NN to do the upgrade via the refreshNodes approach, with the NN then asking the DNs? The nice thing about going through the NN is that the NN has the state and is able to decide the order in which DNs are restarted, to minimize the impact on write and read operations. Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Several new methods may need to be added to ClientDatanodeProtocol to support querying the version, initiating an upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can also be used to prepare for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
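To make the discussion concrete, a purely hypothetical sketch of the shape such methods might take; the interface name and signatures below are invented for illustration and are not the actual ClientDatanodeProtocol additions that would be committed:
{code}
import java.io.IOException;

// Hypothetical sketch only: the kind of query/initiate methods the
// description alludes to, whichever protocol ends up carrying them.
interface DatanodeUpgradeProtocol {
  /** Query the software version the DN is currently running. */
  String getSoftwareVersion() throws IOException;

  /** Ask the DN to prepare for an upgrade or graceful restart. */
  void startUpgrade() throws IOException;
}
{code}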
[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5728: Status: Patch Available (was: Open) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay updated HDFS-5728: Attachment: HDFS-5728.patch Attached the patch; please review. [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865125#comment-13865125 ] Hadoop QA commented on HDFS-5721: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621915/hdfs-5721-v2.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5842//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5842//console This message is automatically generated. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-4273) Fix some issue in DFSInputstream
[ https://issues.apache.org/jira/browse/HDFS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Binglin Chang updated HDFS-4273: Attachment: HDFS-4273.v8.patch Updated the patch to remove the changes related to expiring local deadNodes. Will create another jira to address that. Fix some issue in DFSInputstream Key: HDFS-4273 URL: https://issues.apache.org/jira/browse/HDFS-4273 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.0.2-alpha Reporter: Binglin Chang Assignee: Binglin Chang Priority: Minor Attachments: HDFS-4273-v2.patch, HDFS-4273.patch, HDFS-4273.v3.patch, HDFS-4273.v4.patch, HDFS-4273.v5.patch, HDFS-4273.v6.patch, HDFS-4273.v7.patch, HDFS-4273.v8.patch, TestDFSInputStream.java The following issues in DFSInputStream are addressed in this jira: 1. read may not retry enough in some cases, causing early failure. Assume the following call logic {noformat} readWithStrategy() - blockSeekTo() - readBuffer() - reader.doRead() - seekToNewSource() add currentNode to deadnode, wish to get a different datanode - blockSeekTo() - chooseDataNode() - block missing, clear deadNodes and pick the currentNode again seekToNewSource() return false readBuffer() re-throw the exception quit loop readWithStrategy() got the exception, and may fail the read call before tried MaxBlockAcquireFailures. {noformat} 2. In a multi-threaded scenario (like HBase), DFSInputStream.failures has a race condition: it is cleared to 0 while it is still being used by another thread, so it is possible that some read thread may never quit. Changing failures to a local variable solves this issue. 3. If the local datanode is added to deadNodes, it will not be removed from deadNodes when the DN comes back alive. We need a way to remove the local datanode from deadNodes when it becomes live again. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
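A minimal sketch of the fix described in issue 2, keeping the failure counter local to each read call; this illustrates the idiom, not the actual DFSInputStream code (doReadOnce is a hypothetical stand-in for one read attempt):
{code}
import java.io.IOException;

// Sketch: the counter is local to each call rather than a shared field,
// so concurrent readers cannot reset each other's retry budget.
abstract class RetryingRead {
  abstract int doReadOnce() throws IOException; // one read attempt

  int readWithRetries(int maxBlockAcquireFailures) throws IOException {
    int failures = 0; // per-call counter, not a shared this.failures
    while (true) {
      try {
        return doReadOnce();
      } catch (IOException e) {
        if (++failures >= maxBlockAcquireFailures) {
          throw e; // give up only after this call's own retries run out
        }
        // otherwise: refetch block locations and retry
      }
    }
  }
}
{code}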
[jira] [Commented] (HDFS-5535) Umbrella jira for improved HDFS rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865141#comment-13865141 ] Ming Ma commented on HDFS-5535: --- Nice work. Some comments: 1. HDFS configuration update is another scenario; it could differ from a code upgrade in terms of the design. For example, if we can support dynamic config reload on the DN to handle certain config changes, no DN restart would be required. 2. The write pipeline pause-and-resume approach is interesting, as the NN isn't involved. One scenario similar to DN rolling upgrade is a top-of-rack switch upgrade that takes 30 minutes. During those 30 minutes, we don't want the NN to consider the DNs dead and trigger replication. For this specific scenario, the write pipeline pause-and-resume approach might not be enough. Umbrella jira for improved HDFS rolling upgrades Key: HDFS-5535 URL: https://issues.apache.org/jira/browse/HDFS-5535 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, ha, hdfs-client, namenode Affects Versions: 3.0.0, 2.2.0 Reporter: Nathan Roberts Attachments: HDFSRollingUpgradesHighLevelDesign.pdf In order to roll a new HDFS release through a large cluster quickly and safely, a few enhancements are needed in HDFS. An initial high-level design document will be attached to this jira, and sub-jiras will itemize the individual tasks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5721) sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns
[ https://issues.apache.org/jira/browse/HDFS-5721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865142#comment-13865142 ] Junping Du commented on HDFS-5721: -- Thanks Ted for the patch! The v2 patch looks good overall; the only issue is that we should replace *System.out* below with LOG. {code} + System.out.println("Encountered exception during format: " + ioe); {code} +1 once this issue is addressed. sharedEditsImage in Namenode#initializeSharedEdits() should be closed before method returns --- Key: HDFS-5721 URL: https://issues.apache.org/jira/browse/HDFS-5721 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Minor Attachments: hdfs-5721-v1.txt, hdfs-5721-v2.txt At line 901: {code} FSImage sharedEditsImage = new FSImage(conf, Lists.<URI>newArrayList(), sharedEditsDirs); {code} sharedEditsImage is not closed before the method returns. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
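A one-line sketch of the suggested change, assuming the class already has a commons-logging Log field named LOG, as elsewhere in the NameNode code:
{code}
// Hedged sketch: log the failure (with the stack trace) instead of
// printing to stdout.
LOG.warn("Encountered exception during format", ioe);
{code}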
[jira] [Commented] (HDFS-5728) [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block
[ https://issues.apache.org/jira/browse/HDFS-5728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13865158#comment-13865158 ] Hadoop QA commented on HDFS-5728: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12621917/HDFS-5728.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestBootstrapStandbyWithQJM {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/5843//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/5843//console This message is automatically generated. [Diskfull] Block recovery will fail if the metafile not having crc for all chunks of the block -- Key: HDFS-5728 URL: https://issues.apache.org/jira/browse/HDFS-5728 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Vinay Assignee: Vinay Attachments: HDFS-5728.patch 1. The client (regionserver) has opened a stream to write its WAL to HDFS. This is not a one-time upload; data will be written slowly. 2. One of the DataNodes' disks became full (due to some other data filling up the disks). 3. Unfortunately the block was being written to only this datanode in the cluster, so the client write also failed. 4. After some time the disk was freed up and all processes were restarted. 5. Now the HMaster tries to recover the file by calling recoverLease. At this point recovery was failing saying file length mismatch. When checked, the actual block file length was 62484480 but the calculated block length was 62455808. This was because the metafile had CRCs for only 62455808 bytes, and that was taken as the block size. No matter how many times it was retried, recovery kept failing continuously. -- This message was sent by Atlassian JIRA (v6.1.5#6160)