[jira] Commented: (HDFS-884) DataNode makeInstance should report the directory list when failing to start up

2011-01-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979538#action_12979538
 ] 

Steve Loughran commented on HDFS-884:
-

looks good, though I'm not sure we need the assert statement given that the 
constructor does the same check and includes the list of invalid dirs. All the 
assert will do is fail early on assert-enabled (test) runs, thereby reducing 
coverage of the constructor itself.

This patch will obsolete HDFS-890, which didn't have any code associated with 
it anyway.

 DataNode makeInstance should report the directory list when failing to start 
 up
 ---

 Key: HDFS-884
 URL: https://issues.apache.org/jira/browse/HDFS-884
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-884.patch, HDFS-884.patch, InvalidDirs.patch, 
 InvalidDirs.patch


 When {{Datanode.makeInstance()}} cannot work with one of the directories in 
 dfs.data.dir, it logs this at warn level (while losing the stack trace). 
 It should include the nested exception for better troubleshooting. Then, when 
 all dirs in the list fail, an exception is thrown, but this exception does 
 not include the list of directories. It should list the absolute path of 
 every missing/failing directory, so that whoever sees the exception can see 
 where to start looking for problems: either the filesystem or the 
 configuration. 
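
For illustration, the reporting described above could take roughly this shape 
(a sketch only; checkDir() is a hypothetical validator, and the attached 
patches contain the real change):

{code}
// Sketch: validate each configured directory, keep the nested exception in
// the warn log, and list every bad absolute path if none are usable.
List<File> invalidDirs = new ArrayList<File>();
List<File> goodDirs = new ArrayList<File>();
for (File dir : dataDirs) {
  try {
    checkDir(dir);                 // hypothetical per-directory validation
    goodDirs.add(dir);
  } catch (IOException e) {
    LOG.warn("Invalid directory " + dir.getAbsolutePath(), e);  // keep stack trace
    invalidDirs.add(dir);
  }
}
if (goodDirs.isEmpty()) {
  throw new IOException("All directories in dfs.data.dir are invalid: "
      + invalidDirs);
}
{code}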

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-925) Make it harder to accidentally close a shared DFSClient

2011-01-10 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979557#action_12979557
 ] 

Steve Loughran commented on HDFS-925:
-

I'm not seeing this as a problem so much these days, as I'm doing less Hadoop 
work directly, and what I am doing runs more in separate processes. I fear the 
problem still exists though.

 Make it harder to accidentally close a shared DFSClient
 ---

 Key: HDFS-925
 URL: https://issues.apache.org/jira/browse/HDFS-925
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.21.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HADOOP-5933.patch, HADOOP-5933.patch, HDFS-925.patch


 Every so often I get stack traces telling me that DFSClient is closed, 
 usually in {{org.apache.hadoop.hdfs.DFSClient.checkOpen()}}. The root cause 
 of this is usually that one thread has closed a shared fsclient while another 
 thread still has a reference to it. If the other thread then asks for a new 
 client it will get one -and the cache repopulated- but if it has one already, 
 then I get to see a stack trace. 
 It's effectively a race condition between clients in different threads. 
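
One possible guard, sketched here rather than taken from the attached patches: 
reference-count the shared client so that close() only tears it down when the 
last user releases it:

{code}
// Sketch (not the attached patch): reference-count a shared client so an
// early close() by one thread cannot break other threads still using it.
class SharedClient {
  private final AtomicInteger refs = new AtomicInteger(1);
  private final DFSClient client;            // the shared underlying client

  SharedClient(DFSClient client) { this.client = client; }

  SharedClient retain() { refs.incrementAndGet(); return this; }

  void release() throws IOException {
    if (refs.decrementAndGet() == 0) {
      client.close();                        // only the last holder really closes
    }
  }
}
{code}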

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-10 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Status: Open  (was: Patch Available)

 Separate Storage from FSImage
 -

 Key: HDFS-1557
 URL: https://issues.apache.org/jira/browse/HDFS-1557
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.21.0
Reporter: Ivan Kelly
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
 HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
 HDFS-1557.diff


 FSImage currently derives from Storage and FSEditLog has to call methods 
 directly on FSImage to access the filesystem. This JIRA is to separate the 
 Storage class out into NNStorage so that FSEditLog is less dependent on 
 FSImage. From this point, the other parts of the circular dependency should 
 be easy to fix.
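
The target shape, roughly sketched (the real interfaces are in the attached 
diffs):

{code}
// Sketch of the intended dependency break: FSEditLog talks to NNStorage
// directly instead of reaching through FSImage.
class NNStorage extends Storage {
  // directory enumeration, locking and version-file handling move here
}

class FSImage {
  private final NNStorage storage;           // composition, not inheritance
  FSImage(NNStorage storage) { this.storage = storage; }
}

class FSEditLog {
  private final NNStorage storage;           // no FSImage reference needed
  FSEditLog(NNStorage storage) { this.storage = storage; }
}
{code}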

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1557) Separate Storage from FSImage

2011-01-10 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1557:
-

Status: Patch Available  (was: Open)

kicking for hudson

 Separate Storage from FSImage
 -

 Key: HDFS-1557
 URL: https://issues.apache.org/jira/browse/HDFS-1557
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: 0.21.0
Reporter: Ivan Kelly
 Fix For: 0.22.0, 0.23.0

 Attachments: HDFS-1557-branch-0.22.diff, HDFS-1557-branch-0.22.diff, 
 HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, HDFS-1557-trunk.diff, 
 HDFS-1557.diff


 FSImage currently derives from Storage and FSEditLog has to call methods 
 directly on FSImage to access the filesystem. This JIRA is to separate the 
 Storage class out into NNStorage so that FSEditLog is less dependent on 
 FSImage. From this point, the other parts of the circular dependency should 
 be easy to fix.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-835) TestDefaultNameNodePort.testGetAddressFromConf fails with an unsupported format error

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-835:
-

Priority: Minor  (was: Blocker)

Aaron, moving this to minor (the same priority as the issue it depends on).

 TestDefaultNameNodePort.testGetAddressFromConf fails with an unsupported 
 format error 
 ---

 Key: HDFS-835
 URL: https://issues.apache.org/jira/browse/HDFS-835
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: gary murry
Assignee: Aaron Kimball
Priority: Minor
 Attachments: HDFS-835.patch


 The current build fails on the TestDefaultNameNodePort.testGetAddressFromConf 
 unit test with the following error:
  FileSystem name 'foo' is provided in an unsupported format. (Try 
 'hdfs://foo' instead?)
 http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/171/

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HDFS-1554) New semantics for recoverLease

2011-01-10 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang resolved HDFS-1554.
-

  Resolution: Fixed
Release Note: Change the recoverLease API to return whether the file is closed 
or not. It also changes the semantics of recoverLease to start lease recovery 
immediately.
Hadoop Flags: [Incompatible change, Reviewed]

I've committed this. Thanks to Dhruba for reviewing the patch.

 New semantics for recoverLease
 --

 Key: HDFS-1554
 URL: https://issues.apache.org/jira/browse/HDFS-1554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.22.0, 0.23.0

 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch


 The current recoverLease API implemented in append 0.20 aims to provide a 
 lighter-weight way (compared to using create/append) to trigger a file's soft 
 lease expiration. From the use cases of both HBase and Scribe, it could have 
 stronger semantics: revoking the file's lease, thus starting lease recovery 
 immediately.
 Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since 
 HBase is moving to HDFS 0.22.
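
Per the release note, recoverLease now returns whether the file is closed, so 
caller-side usage could look like this sketch (InterruptedException handling 
elided):

{code}
// Sketch of usage under the new semantics: recoverLease() starts recovery
// immediately and returns true once the file is closed.
boolean closed = dfs.recoverLease(path);     // dfs: a DistributedFileSystem
while (!closed) {
  Thread.sleep(1000);                        // recovery completes asynchronously
  closed = dfs.recoverLease(path);
}
// the file is now closed and its recovered length is visible to readers
{code}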

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1529) Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-1529:
--

Fix Version/s: 0.22.0

Blocker for 0.22

 Incorrect handling of interrupts in waitForAckedSeqno can cause deadlock
 

 Key: HDFS-1529
 URL: https://issues.apache.org/jira/browse/HDFS-1529
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-1529.txt, hdfs-1529.txt, hdfs-1529.txt, Test.java


 In HDFS-895 the handling of interrupts during hflush/close was changed to 
 preserve interrupt status. This ends up creating an infinite loop in 
 waitForAckedSeqno if the waiting thread gets interrupted, since Object.wait() 
 has a strange semantic that it doesn't give up the lock even momentarily if 
 the thread is already in interrupted state at the beginning of the call.
 We should decide what the correct behavior is here - if a thread is 
 interrupted while it's calling hflush() or close() should we (a) throw an 
 exception, perhaps InterruptedIOException (b) ignore, or (c) wait for the 
 flush to finish but preserve interrupt status on exit?
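
The pitfall and option (a), sketched (isAcked() is a stand-in for the real 
predicate):

{code}
// If the thread enters wait() already interrupted, wait() throws immediately
// without ever giving up the lock, so re-asserting interrupt status and
// retrying spins forever. Option (a) surfaces the interrupt instead.
synchronized (ackQueue) {
  while (!isAcked(seqno)) {                  // hypothetical acked-seqno check
    try {
      ackQueue.wait();
    } catch (InterruptedException ie) {
      InterruptedIOException iioe = new InterruptedIOException(
          "Interrupted while waiting for acked seqno " + seqno);
      iioe.initCause(ie);
      throw iioe;
    }
  }
}
{code}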

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1186) 0.20: DNs should interrupt writers at start of recovery

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-1186:
--

Fix Version/s: 0.20-append

Likely only a blocker for 0.20 append branch.

 0.20: DNs should interrupt writers at start of recovery
 ---

 Key: HDFS-1186
 URL: https://issues.apache.org/jira/browse/HDFS-1186
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20-append
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.20-append

 Attachments: hdfs-1186.txt


 When block recovery starts (eg due to NN recovering lease) it needs to 
 interrupt any writers currently writing to those blocks. Otherwise, an old 
 writer (who hasn't realized he lost his lease) can continue to write+sync to 
 the blocks, and thus recovery ends up truncating data that has been sync()ed.
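
Roughly, the idea is (a sketch with hypothetical names; the attached 
hdfs-1186.txt has the real change):

{code}
// Sketch: at the start of block recovery, interrupt and join the writer
// thread attached to the replica, so a stale writer cannot write+sync past
// the recovery point. getWriterThread() is a hypothetical accessor.
Thread writer = replicaInfo.getWriterThread();
if (writer != null && writer.isAlive()) {
  writer.interrupt();
  try {
    writer.join(5000);                       // bounded wait for it to stop
  } catch (InterruptedException ie) {
    Thread.currentThread().interrupt();
  }
}
{code}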

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-988) saveNamespace can corrupt edits log

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-988:
-

Fix Version/s: 0.22.0

This is committed to 0.20-append but needs a unit test for trunk.

 saveNamespace can corrupt edits log
 ---

 Key: HDFS-988
 URL: https://issues.apache.org/jira/browse/HDFS-988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Todd Lipcon
Priority: Blocker
 Fix For: 0.20-append, 0.22.0

 Attachments: hdfs-988.txt, saveNamespace.txt, 
 saveNamespace_20-append.patch


 The administrator puts the namenode in safemode and then issues the 
 savenamespace command. This can corrupt the edits log. The problem is that 
 when the NN enters safemode, there could still be pending logSyncs occurring 
 from other threads. Now, the saveNamespace command, when executed, would save 
 an edits log with partial writes. I have seen this happen on 0.20.
 https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853
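
One way to close the race, as a sketch (waitForSyncToFinish() is a stand-in 
for whatever barrier the trunk fix ends up using):

{code}
// Sketch: before writing the image, drain in-flight logSync() calls while
// holding the namesystem lock, so no partial edit record lands in the log
// that saveNamespace is about to snapshot.
synchronized (fsNamesystem) {                // writers are excluded from here on
  editLog.waitForSyncToFinish();             // hypothetical: drain pending syncs
  saveFSImage();                             // no thread can now append mid-save
  editLog.rollEditLog();
}
{code}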

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1496) TestStorageRestore is failing after HDFS-903 fix

2011-01-10 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979699#action_12979699
 ] 

Hairong Kuang commented on HDFS-1496:
-

I do not think that the storage directory restoration scheme introduced in 
HADOOP-4885 works well, because it introduces inconsistent states among the 
fsimage/edits directories. Each old good directory contains an old image + old 
edits, but each restored directory contains a new image with an empty edit log. 
This has the potential to corrupt the fsimage if the secondary NN happens to 
download the empty edit log from a newly restored edit log directory.

I could not figure out a better way to fix this problem. Is it OK if I 
disable this feature for now so that the unit test can pass? It's good that 
Dhruba already enhanced saveNamespace in HDFS-1509, which could be used as an 
alternative way to restore the failed image directories.

 TestStorageRestore is failing after HDFS-903 fix
 

 Key: HDFS-1496
 URL: https://issues.apache.org/jira/browse/HDFS-1496
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 0.22.0, 0.23.0
Reporter: Konstantin Boudnik
Assignee: Hairong Kuang
Priority: Blocker
 Fix For: 0.22.0


 TestStorageRestore seems to be failing after HDFS-903 commit. Running git 
 bisect confirms it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-900) Corrupt replicas are not tracked correctly through block report from DN

2011-01-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-900:
-

 Priority: Blocker  (was: Critical)
Fix Version/s: 0.22.0

In discussion with Nigel, we'd like to mark this as blocker pending further 
investigation. If we determine it's not a regression since 0.20 we'll downgrade 
priority.

 Corrupt replicas are not tracked correctly through block report from DN
 ---

 Key: HDFS-900
 URL: https://issues.apache.org/jira/browse/HDFS-900
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0

 Attachments: log-commented, to-reproduce.patch


 This one is tough to describe, but essentially the following order of events 
 is seen to occur:
 # A client marks one replica of a block to be corrupt by telling the NN about 
 it
 # Replication is then scheduled to make a new replica of this node
 # The replication completes, such that there are now 3 good replicas and 1 
 corrupt replica
 # The DN holding the corrupt replica sends a block report. Rather than 
 telling this DN to delete the node, the NN instead marks this as a new *good* 
 replica of the block, and schedules deletion on one of the good replicas.
 I don't know if this is a dataloss bug in the case of 1 corrupt replica with 
 dfs.replication=2, but it seems feasible. I will attach a debug log with some 
 commentary marked by '', plus a unit test patch which I can get 
 to reproduce this behavior reliably. (it's not a proper unit test, just some 
 edits to an existing one to show it)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1448) Create multi-format parser for edits logs file, support binary and XML formats initially

2011-01-10 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1448:
--

Fix Version/s: (was: 0.22.0)
   0.23.0

 Create multi-format parser for edits logs file, support binary and XML 
 formats initially
 

 Key: HDFS-1448
 URL: https://issues.apache.org/jira/browse/HDFS-1448
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.22.0
Reporter: Erik Steffl
Assignee: Erik Steffl
 Fix For: 0.23.0

 Attachments: editsStored, HDFS-1448-0.22-1.patch, 
 HDFS-1448-0.22-2.patch, HDFS-1448-0.22-3.patch, HDFS-1448-0.22-4.patch, 
 HDFS-1448-0.22-5.patch, HDFS-1448-0.22.patch, Viewer hierarchy.pdf


 Create multi-format parser for edits logs file, support binary and XML 
 formats initially.
 Parsing should work from any supported format to any other supported format 
 (e.g. from binary to XML and from XML to binary).
 The binary format is the format used by FSEditLog class to read/write edits 
 file.
 The primary reason to develop this tool is to help with troubleshooting; the 
 binary format is hard to read and edit (for human troubleshooters).
 Longer term it could be used to clean up and minimize the parsers for fsimage 
 and edits files. The edits parser OfflineEditsViewer is written in a very 
 similar fashion to OfflineImageViewer. The next step would be to merge 
 OfflineImageViewer and OfflineEditsViewer and use the result in both FSImage 
 and FSEditLog. This is subject to change, specifically depending on the 
 adoption of Avro (which would completely change how objects are serialized as 
 well as provide ways to convert files to different formats).
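
Any-to-any conversion falls out naturally if parsing and emission are split 
behind a visitor, roughly like this sketch (the attached Viewer hierarchy.pdf 
has the actual design):

{code}
// Sketch of the multi-format idea: one reader per input format drives one
// visitor per output format, so binary->XML and XML->binary share the logic.
interface EditsVisitor {
  void start(int logVersion) throws IOException;
  void visitOp(String opName, Map<String, String> fields) throws IOException;
  void finish() throws IOException;
}

interface EditsReader {
  void accept(EditsVisitor visitor) throws IOException;   // parse and replay
}

// conversion is then just:
//   new BinaryEditsReader(in).accept(new XmlEditsVisitor(out));
{code}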

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1554) New semantics for recoverLease

2011-01-10 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979725#action_12979725
 ] 

Nigel Daley commented on HDFS-1554:
---

Hairong, can you please set the Fix Version correctly?  Thx.

 New semantics for recoverLease
 --

 Key: HDFS-1554
 URL: https://issues.apache.org/jira/browse/HDFS-1554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.22.0, 0.23.0

 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch


 The current recoverLease API implemented in append 0.20 aims to provide a 
 lighter-weight way (compared to using create/append) to trigger a file's soft 
 lease expiration. From the use cases of both HBase and Scribe, it could have 
 stronger semantics: revoking the file's lease, thus starting lease recovery 
 immediately.
 Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since 
 HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1573) LeaseChecker thread name trace not that useful

2011-01-10 Thread Todd Lipcon (JIRA)
LeaseChecker thread name trace not that useful
--

 Key: HDFS-1573
 URL: https://issues.apache.org/jira/browse/HDFS-1573
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Trivial
 Fix For: 0.23.0


The LeaseChecker thread in DFSClient will put a stack trace in its thread name, 
theoretically to help debug cases where these threads get leaked. However it 
just shows the stack trace of whoever is asking for the thread's name, not the 
stack trace of when the thread was allocated. I'd like to fix this so that you 
can see where the thread got started, which was presumably its original intent.
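
Presumably the fix is to capture the allocation-site trace once, at 
construction, along these lines (a sketch, not the eventual patch):

{code}
// Sketch: record the stack trace where the thread is created, not where
// getName() happens to be called, so a leaked LeaseChecker points at its origin.
final Throwable allocationSite = new Throwable("LeaseChecker created at:");
Thread daemon = new Thread(leaseChecker);    // leaseChecker: the renewal Runnable
daemon.setName("LeaseChecker "
    + StringUtils.stringifyException(allocationSite));
daemon.setDaemon(true);
daemon.start();
{code}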

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1125) Removing a datanode (failed or decommissioned) should not require a namenode restart

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-1125:
--

 Priority: Critical  (was: Blocker)
Fix Version/s: (was: 0.22.0)
   Issue Type: Improvement  (was: Bug)

At this point I don't see how this 6 month old unassigned issue is a blocker 
for 0.22.  I also think this is an improvement, not a bug.  Removing from 0.22 
blocker list.

 Removing a datanode (failed or decommissioned) should not require a namenode 
 restart
 

 Key: HDFS-1125
 URL: https://issues.apache.org/jira/browse/HDFS-1125
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 0.20.2
Reporter: Alex Loddengaard
Priority: Critical

 I've heard of several Hadoop users using dfsadmin -report to monitor the 
 number of dead nodes, and alert if that number is not 0.  This mechanism 
 tends to work pretty well, except when a node is decommissioned or fails, 
 because then the namenode requires a restart for said node to be entirely 
 removed from HDFS.  More details here:
 http://markmail.org/search/?q=decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode#query:decommissioned%20node%20showing%20up%20ad%20dead%20node%20in%20web%20based%09interface%20to%20namenode+page:1+mid:7gwqwdkobgfuszb4+state:results
 Removal from the exclude file and a refresh should get rid of the dead node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-1505:
--

Fix Version/s: 0.22.0

Hi Jakob, are you working on a patch for this for 0.22?  If so, many thanks!  
I'm going to mark this for 0.22.

 saveNamespace appears to succeed even if all directories fail to save
 -

 Key: HDFS-1505
 URL: https://issues.apache.org/jira/browse/HDFS-1505
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Todd Lipcon
Assignee: Jakob Homan
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-1505-test.txt


 After HDFS-1071, saveNamespace now appears to succeed even if all of the 
 individual directories failed to save.
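
The check presumably needs to track per-directory outcomes and refuse to 
report success when none succeed, along these lines (a sketch, not Jakob's 
pending patch; saveCurrent() is hypothetical):

{code}
// Sketch: count directories that actually saved and fail loudly when every
// one of them failed rather than returning as if the save succeeded.
int saved = 0;
for (StorageDirectory sd : storageDirs) {
  try {
    saveCurrent(sd);                         // hypothetical per-directory save
    saved++;
  } catch (IOException e) {
    LOG.error("Unable to save image for " + sd.getRoot(), e);
  }
}
if (saved == 0) {
  throw new IOException("Failed to save namespace to any storage directory");
}
{code}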

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1505) saveNamespace appears to succeed even if all directories fail to save

2011-01-10 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979774#action_12979774
 ] 

Jakob Homan commented on HDFS-1505:
---

Yes, I'm hoping to have a patch for this this week.

 saveNamespace appears to succeed even if all directories fail to save
 -

 Key: HDFS-1505
 URL: https://issues.apache.org/jira/browse/HDFS-1505
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0, 0.23.0
Reporter: Todd Lipcon
Assignee: Jakob Homan
Priority: Blocker
 Fix For: 0.22.0

 Attachments: hdfs-1505-test.txt


 After HDFS-1071, saveNamespace now appears to succeed even if all of the 
 individual directories failed to save.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1554) New semantics for recoverLease

2011-01-10 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979777#action_12979777
 ] 

Hairong Kuang commented on HDFS-1554:
-

Sorry that I forgot that I meant to introduce this new API to the trunk. Let me 
change this jira's fix version to append 0.20 and then open a different jira 
for the trunk.

 New semantics for recoverLease
 --

 Key: HDFS-1554
 URL: https://issues.apache.org/jira/browse/HDFS-1554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append, 0.22.0, 0.23.0

 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch


 The current recoverLease API implemented in append 0.20 aims to provide a 
 lighter-weight way (compared to using create/append) to trigger a file's soft 
 lease expiration. From the use cases of both HBase and Scribe, it could have 
 stronger semantics: revoking the file's lease, thus starting lease recovery 
 immediately.
 Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since 
 HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1554) Append 0.20: New semantics for recoverLease

2011-01-10 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1554:


Fix Version/s: (was: 0.23.0)
   (was: 0.22.0)
  Summary: Append 0.20: New semantics for recoverLease  (was: New 
semantics for recoverLease)

 Append 0.20: New semantics for recoverLease
 ---

 Key: HDFS-1554
 URL: https://issues.apache.org/jira/browse/HDFS-1554
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.20-append

 Attachments: appendRecoverLease.patch, appendRecoverLease1.patch


 The current recoverLease API implemented in append 0.20 aims to provide a 
 lighter-weight way (compared to using create/append) to trigger a file's soft 
 lease expiration. From the use cases of both HBase and Scribe, it could have 
 stronger semantics: revoking the file's lease, thus starting lease recovery 
 immediately.
 Also I'd like to port this recoverLease API to HDFS 0.22 and trunk since 
 HBase is moving to HDFS 0.22.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-10 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979794#action_12979794
 ] 

Jakob Homan commented on HDFS-1572:
---

Liyin-
   I'd rather not duplicate a bunch of the logic in the test.  Were we to put 
the effort into testing, I'd rather go ahead and take an approach like what 
Project Voldemort has for time-dependent operations: http://s.apache.org/uU.  I 
recently used it to good effect in unit testing a similar bit of code: 
http://s.apache.org/bMQ.  It worked quite well.

That being said, I think the patch I submitted does a more complete job of 
cleaning up the code in general and I'd like to go ahead with that one.  Adding 
more tests would be great, but is a bigger issue.  The failed unit tests seem 
to be bogus timeouts.  They're not related to this code and are not 
reproducing on my local box.  I'm running the full test suite now and will post 
results when they finish.
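
For reference, the Voldemort-style approach linked above boils down to 
injecting a clock so tests can advance time deterministically; a minimal 
sketch:

{code}
// Minimal sketch of an injectable clock: production code uses the system
// clock, tests use a mock they can advance without sleeping.
interface Clock {
  long currentTimeMillis();
}

class SystemClock implements Clock {
  public long currentTimeMillis() { return System.currentTimeMillis(); }
}

class MockClock implements Clock {
  private long now = 0;
  public long currentTimeMillis() { return now; }
  public void advance(long millis) { now += millis; }   // test controls time
}
{code}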

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572.patch


 {code}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
 shouldCheckpoint = true;
   } else {
 long size = getJournalSize();
 if(size >= checkpointSize)
   shouldCheckpoint = true;
   }
 {code}
 {{dfs.namenode.checkpoint.period}} in configuration determines the period of 
 checkpoint. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-671) Documentation change for updated configuration keys.

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-671:
-

Priority: Blocker  (was: Major)

Seems like a blocker for 0.22.

 Documentation change for updated configuration keys.
 

 Key: HDFS-671
 URL: https://issues.apache.org/jira/browse/HDFS-671
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Priority: Blocker
 Fix For: 0.22.0


  HDFS-531, HADOOP-6233 and HDFS-631 have resulted in changes in several 
 config keys. The hadoop documentation needs to be updated to reflect those 
 changes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-884) DataNode makeInstance should report the directory list when failing to start up

2011-01-10 Thread Nigel Daley (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979817#action_12979817
 ] 

Nigel Daley commented on HDFS-884:
--

From the patch:

{code}
+try {
+  dn = DataNode.createDataNode(new String[]{}, conf);
+} catch(IOException e) {
+  // expecting exception here
+}
+if(dn != null) dn.shutdown();
{code}

Shouldn't there be a fail() call after the dn assignment line?
If you're updating the patch then dn.shutdown() should be on its own line.
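
That is, something along these lines, so a missing exception fails the test 
(a sketch; fail() is JUnit's Assert.fail):

{code}
try {
  dn = DataNode.createDataNode(new String[]{}, conf);
  fail("expected an IOException from createDataNode");
} catch (IOException e) {
  // expecting exception here
}
if (dn != null) {
  dn.shutdown();
}
{code}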

 DataNode makeInstance should report the directory list when failing to start 
 up
 ---

 Key: HDFS-884
 URL: https://issues.apache.org/jira/browse/HDFS-884
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
Assignee: Steve Loughran
Priority: Minor
 Fix For: 0.22.0

 Attachments: HDFS-884.patch, HDFS-884.patch, InvalidDirs.patch, 
 InvalidDirs.patch


 When {{Datanode.makeInstance()}} cannot work with one of the directories in 
 dfs.data.dir, it logs this at warn level (while losing the stack trace). 
 It should include the nested exception for better troubleshooting. Then, when 
 all dirs in the list fail, an exception is thrown, but this exception does 
 not include the list of directories. It should list the absolute path of 
 every missing/failing directory, so that whoever sees the exception can see 
 where to start looking for problems: either the filesystem or the 
 configuration. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1574) HDFS cannot be browsed from web UI while in safe mode

2011-01-10 Thread Todd Lipcon (JIRA)
HDFS cannot be browsed from web UI while in safe mode
-

 Key: HDFS-1574
 URL: https://issues.apache.org/jira/browse/HDFS-1574
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Blocker


As of HDFS-984, the NN does not issue delegation tokens while in safe mode 
(since it would require writing to the edit log). But the browsedfscontent 
servlet relies on getting a delegation token before redirecting to a random DN 
to browse the FS. Thus, the "browse the filesystem" link does not work while 
the NN is in safe mode.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1331) dfs -test should work like /bin/test

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-1331:
--

Fix Version/s: (was: 0.22.0)
   Issue Type: Improvement  (was: Bug)

Changing to improvement and removing 0.22 fix version.

 dfs -test should work like /bin/test
 

 Key: HDFS-1331
 URL: https://issues.apache.org/jira/browse/HDFS-1331
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.20.2
Reporter: Allen Wittenauer
Priority: Minor

 hadoop dfs -test doesn't act like its shell equivalent, making it difficult 
 to actually use if you are used to the real test command:
 hadoop:
 $hadoop dfs -test -d /nonexist; echo $?
 test: File does not exist: /nonexist
 255
 shell:
 $ test -d /nonexist; echo $?
 1
 a) Why is it spitting out a message? Even so, why is it saying "file" instead 
 of "directory" when I used -d?
 b) Why is the return code 255? I realize this is documented as '0' if true.  
 But docs basically say the value is undefined if it isn't.
 c) where is -f?
 d) Why is empty -z instead of -s ?  Was it a misunderstanding of the man page?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1575) viewing block from web UI broken

2011-01-10 Thread Todd Lipcon (JIRA)
viewing block from web UI broken


 Key: HDFS-1575
 URL: https://issues.apache.org/jira/browse/HDFS-1575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0


DatanodeJspHelper seems to expect the file path to be in the path info of the 
HttpRequest, rather than in a parameter. I see the following exception when 
visiting the URL 
{{http://localhost.localdomain:50075/browseBlock.jsp?blockId=5006108823351810567&blockSize=20&genstamp=1001&filename=%2Fuser%2Ftodd%2Fissue&datanodePort=50010&namenodeInfoPort=50070}}

java.io.FileNotFoundException: File does not exist: /
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:834)
...
at 
org.apache.hadoop.hdfs.server.datanode.DatanodeJspHelper.generateFileDetails(DatanodeJspHelper.java:258)
at 
org.apache.hadoop.hdfs.server.datanode.browseBlock_jsp._jspService(browseBlock_jsp.java:79)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
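
Presumably the helper needs to read the path from the request parameter it is 
actually given, along these lines (a sketch, not a committed fix):

{code}
// Sketch: take the file path from the "filename" query parameter instead of
// request.getPathInfo(), which is "/" for /browseBlock.jsp?...&filename=...
String filename = request.getParameter("filename");
if (filename == null || filename.length() == 0) {
  out.print("Invalid input (filename absent)");
  return;
}
{code}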


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1575) viewing block from web UI broken

2011-01-10 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979835#action_12979835
 ] 

Jakob Homan commented on HDFS-1575:
---

This bug was identified in HDFS-1109 (http://s.apache.org/D2i), but I don't see 
that a JIRA was ever opened for it.  Suresh? Dmytro?  

 viewing block from web UI broken
 

 Key: HDFS-1575
 URL: https://issues.apache.org/jira/browse/HDFS-1575
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Todd Lipcon
Priority: Blocker
 Fix For: 0.22.0


 DatanodeJspHelper seems to expect the file path to be in the path info of 
 the HttpRequest, rather than in a parameter. I see the following exception 
 when visiting the URL 
 {{http://localhost.localdomain:50075/browseBlock.jsp?blockId=5006108823351810567&blockSize=20&genstamp=1001&filename=%2Fuser%2Ftodd%2Fissue&datanodePort=50010&namenodeInfoPort=50070}}
 java.io.FileNotFoundException: File does not exist: /
   at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:834)
 ...
   at 
 org.apache.hadoop.hdfs.server.datanode.DatanodeJspHelper.generateFileDetails(DatanodeJspHelper.java:258)
   at 
 org.apache.hadoop.hdfs.server.datanode.browseBlock_jsp._jspService(browseBlock_jsp.java:79)
   at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1536) Improve HDFS WebUI

2011-01-10 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979840#action_12979840
 ] 

Hairong Kuang commented on HDFS-1536:
-

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test 
framework compile.

Failed unit tests are TestHDFSServerPorts, TestHDFSTrash, TestBackupNode, 
TestStorageRestore, and TestDFSRollback.

 Improve HDFS WebUI
 --

 Key: HDFS-1536
 URL: https://issues.apache.org/jira/browse/HDFS-1536
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0

 Attachments: missingBlocksWebUI.patch, missingBlocksWebUI1.patch


 1. Make the missing blocks count accurate;
 2. Make the under-replicated blocks count exclude missing blocks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1536) Improve HDFS WebUI

2011-01-10 Thread Hairong Kuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HDFS-1536:


  Resolution: Fixed
Release Note: On web UI, missing block number now becomes accurate and 
under-replicated blocks do not include missing blocks.
  Status: Resolved  (was: Patch Available)

I've just committed this. Thanks to Dhruba and Nigel for reviewing this!

 Improve HDFS WebUI
 --

 Key: HDFS-1536
 URL: https://issues.apache.org/jira/browse/HDFS-1536
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 0.23.0
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0

 Attachments: missingBlocksWebUI.patch, missingBlocksWebUI1.patch


 1. Make the missing blocks count accurate;
 2. Make the under-replicated blocks count exclude missing blocks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1333) S3 File Permissions

2011-01-10 Thread Nigel Daley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Daley updated HDFS-1333:
--

Priority: Critical  (was: Blocker)

Doesn't seem a blocker for any release.  Downgrading to Critical.

 S3 File Permissions
 ---

 Key: HDFS-1333
 URL: https://issues.apache.org/jira/browse/HDFS-1333
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
 Environment: Hadoop cluster using 3 small Amazon EC2 machines and the 
 S3FileSystem.
 Hadoop compiled from latest trunk: 0.22.0-SNAPSHOT
 core-site:
 fs.default.name=s3://my-s3-bucket
 fs.s3.awsAccessKeyId=[key id omitted]
 fs.s3.awsSecretAccessKey=[secret key omitted]
 hadoop.tmp.dir=/mnt/hadoop.tmp.dir
 hdfs-site: empty
 mapred-site:
 mapred.job.tracker=[domU-XX-XX-XX-XX-XX-XX.compute-1.internal:9001]
 mapred.map.tasks=6
 mapred.reduce.tasks=6
Reporter: Danny Leshem
Priority: Critical

 Until recently I've been using 0.20.2 and everything was OK. Now I'm using the 
 latest trunk 0.22.0-SNAPSHOT and getting the following thrown:
 Exception in thread "main" java.io.IOException: The ownership/permissions on 
 the staging directory 
 s3://my-s3-bucket/mnt/hadoop.tmp.dir/mapred/staging/root/.staging is not as 
 expected. It is owned by  and permissions are rwxrwxrwx. The directory must 
 be owned by the submitter root or by root and permissions must be rwx------
 at
 org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:107)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:312)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:961)
 at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:977)
 at com.mycompany.MyJob.runJob(MyJob.java:153)
 at com.mycompany.MyJob.run(MyJob.java:177)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at com.mycompany.MyOtherJob.runJob(MyOtherJob.java:62)
 at com.mycompany.MyOtherJob.run(MyOtherJob.java:112)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
 at com.mycompany.MyOtherJob.main(MyOtherJob.java:117)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
 (The "it is owned by ... and permissions" is not a mistake; it seems like the 
 empty string is printed there)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1561) BackupNode listens on default host

2011-01-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979853#action_12979853
 ] 

Hadoop QA commented on HDFS-1561:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12467861/BNAddress.patch
  against trunk revision 1056206.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.fs.permission.TestStickyBit
  org.apache.hadoop.hdfs.security.TestDelegationToken
  org.apache.hadoop.hdfs.server.common.TestDistributedUpgrade
  org.apache.hadoop.hdfs.server.datanode.TestBlockReport
  
org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting
  
org.apache.hadoop.hdfs.server.namenode.metrics.TestNameNodeMetrics
  org.apache.hadoop.hdfs.server.namenode.TestBackupNode
  
org.apache.hadoop.hdfs.server.namenode.TestBlocksWithNotEnoughRacks
  org.apache.hadoop.hdfs.server.namenode.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint
  org.apache.hadoop.hdfs.server.namenode.TestFsck
  org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs
  org.apache.hadoop.hdfs.server.namenode.TestStorageRestore
  org.apache.hadoop.hdfs.TestCrcCorruption
  org.apache.hadoop.hdfs.TestDatanodeBlockScanner
  org.apache.hadoop.hdfs.TestDatanodeDeath
  org.apache.hadoop.hdfs.TestDFSClientRetries
  org.apache.hadoop.hdfs.TestDFSFinalize
  org.apache.hadoop.hdfs.TestDFSRollback
  org.apache.hadoop.hdfs.TestDFSShell
  org.apache.hadoop.hdfs.TestDFSStartupVersions
  org.apache.hadoop.hdfs.TestDFSStorageStateRecovery
  org.apache.hadoop.hdfs.TestDFSUpgradeFromImage
  org.apache.hadoop.hdfs.TestDFSUpgrade
  org.apache.hadoop.hdfs.TestDistributedFileSystem
  org.apache.hadoop.hdfs.TestFileAppend2
  org.apache.hadoop.hdfs.TestFileAppend3
  org.apache.hadoop.hdfs.TestFileAppend4
  org.apache.hadoop.hdfs.TestFileAppend
  org.apache.hadoop.hdfs.TestFileConcurrentReader
  org.apache.hadoop.hdfs.TestFileCreationNamenodeRestart
  org.apache.hadoop.hdfs.TestFileCreation
  org.apache.hadoop.hdfs.TestHDFSFileSystemContract
  org.apache.hadoop.hdfs.TestHDFSTrash
  org.apache.hadoop.hdfs.TestPread
  org.apache.hadoop.hdfs.TestQuota
  org.apache.hadoop.hdfs.TestReplication
  org.apache.hadoop.hdfs.TestRestartDFS
  org.apache.hadoop.hdfs.TestSetrepDecreasing
  org.apache.hadoop.hdfs.TestSetrepIncreasing
  org.apache.hadoop.hdfs.TestWriteConfigurationToDFS

-1 contrib tests.  The patch failed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/95//testReport/
Findbugs warnings: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/95//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://hudson.apache.org/hudson/job/PreCommit-HDFS-Build/95//console

This message is automatically generated.

 BackupNode listens on default host
 --

 Key: HDFS-1561
 URL: https://issues.apache.org/jira/browse/HDFS-1561
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Fix For: 0.22.0

 Attachments: BNAddress.patch, BNAddress.patch


 Currently BackupNode uses DNS to find its default host name, and then starts 
 RPC server listening on that address ignoring the address specified in the 
 configuration. Therefore, there is no way to start BackupNode on a particular 
 ip or host address. BackupNode should use the address specified in the 
 configuration instead.
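
The fix direction described above, sketched (config key and default as in 
0.21+; the attached BNAddress.patch has the real change):

{code}
// Sketch: bind to the configured backup address instead of the DNS-derived
// default host, so the BackupNode can be pinned to a specific ip or host.
String addr = conf.get("dfs.namenode.backup.address", "localhost:50100");
InetSocketAddress bindAddr = NetUtils.createSocketAddr(addr);
// ...start the RPC server on bindAddr.getHostName()/bindAddr.getPort()
// rather than on the DNS default host name.
{code}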

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (HDFS-1576) TestWriteConfigurationToDFS is timing out on trunk

2011-01-10 Thread Jakob Homan (JIRA)
TestWriteConfigurationToDFS is timing out on trunk
--

 Key: HDFS-1576
 URL: https://issues.apache.org/jira/browse/HDFS-1576
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0, 0.23.0
 Environment: OSX 10.6
Reporter: Jakob Homan
 Fix For: 0.22.0, 0.23.0


On a fresh checkout, TestWriteConfigurationToDFS runs, errors out and then 
never returns, blocking all subsequent tests.  This is reproducible with 
-Dtestcase=
{noformat}
[junit] Running org.apache.hadoop.hdfs.TestWriteConfigurationToDFS
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.023 sec
{noformat}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1576) TestWriteConfigurationToDFS is timing out on trunk

2011-01-10 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1576:
--

Priority: Blocker  (was: Major)

 TestWriteConfigurationToDFS is timing out on trunk
 --

 Key: HDFS-1576
 URL: https://issues.apache.org/jira/browse/HDFS-1576
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0, 0.23.0
 Environment: OSX 10.6
Reporter: Jakob Homan
Priority: Blocker
 Fix For: 0.22.0, 0.23.0


 On a fresh checkout, TestWriteConfigurationToDFS runs, errors out and then 
 never returns, blocking all subsequent tests.  This is reproducible with 
 -Dtestcase=
 {noformat}
 [junit] Running org.apache.hadoop.hdfs.TestWriteConfigurationToDFS
 [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.023 sec
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HDFS-1576) TestWriteConfigurationToDFS is timing out on trunk

2011-01-10 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12979862#action_12979862
 ] 

Jakob Homan commented on HDFS-1576:
---

This looks like it may be an Ivy issue:
{noformat}
[ivy:resolve] downloading 
https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-common-test/0.23.0-SNAPSHOT/hadoop-common-test-0.23.0-20101226.201217-25.jar ...
[ivy:resolve] ..
{noformat}
The common jar that's being pulled is from before the fix was committed and so 
the regression test is triggering the event.  

 TestWriteConfigurationToDFS is timing out on trunk
 --

 Key: HDFS-1576
 URL: https://issues.apache.org/jira/browse/HDFS-1576
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.22.0, 0.23.0
 Environment: OSX 10.6
Reporter: Jakob Homan
Priority: Blocker
 Fix For: 0.22.0, 0.23.0


 On a fresh checkout, TestWriteConfigurationToDFS runs, errors out and then 
 never returns, blocking all subsequent tests.  This is reproducible with 
 -Dtestcase=
 {noformat}
 [junit] Running org.apache.hadoop.hdfs.TestWriteConfigurationToDFS
 [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 60.023 sec
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1577) Fall back to a random datanode when bestNode fails

2011-01-10 Thread Hairong Kuang (JIRA)
Fall back to a random datanode when bestNode fails
--

 Key: HDFS-1577
 URL: https://issues.apache.org/jira/browse/HDFS-1577
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Reporter: Hairong Kuang
Assignee: Hairong Kuang
 Fix For: 0.23.0


When the NameNode decides to redirect a read request to a datanode, if it 
cannot find a live node that contains a block of the file, the NameNode should 
choose a random datanode instead of throwing an exception.

This is because the live-node test is against a datanode's http port. A 
non-functional jetty servlet (perhaps due to a bug like JETTY-1264) does not 
mean that the replica on that DataNode is not readable. Redirecting the read 
request to a random datanode could make hftp function better when DataNodes 
hit bugs like JETTY-1264.
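
In rough code, the fallback the description asks for (a sketch; bestNode() 
here is assumed to return null rather than throw):

{code}
// Sketch: prefer a live node holding the block, but fall back to any live
// datanode rather than failing when the http-liveness check rules them all out.
DatanodeInfo chosen = bestNode(blockLocations);          // hypothetical helper
if (chosen == null) {
  DatanodeInfo[] live = namesystem.datanodeReport(DatanodeReportType.LIVE);
  if (live.length == 0) {
    throw new IOException("No live datanodes available");
  }
  chosen = live[rand.nextInt(live.length)];              // random live fallback
}
{code}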

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-10 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Attachment: HDFS-1572-2.patch

Updated patch so the checkpointer only sleeps for a minute between runs rather 
than five.  This means the checkpoint time setting will be delayed by a maximum 
of a minute.  Ran tests locally; all pass except the known-bad ones.

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, 
 HDFS-1572.patch


 {code}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
 shouldCheckpoint = true;
   } else {
 long size = getJournalSize();
 if(size >= checkpointSize)
   shouldCheckpoint = true;
   }
 {code}
 {{dfs.namenode.checkpoint.period}} in configuration determines the period of 
 checkpoint. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-10 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Status: Open  (was: Patch Available)

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, 
 HDFS-1572.patch


 {code}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
 shouldCheckpoint = true;
   } else {
 long size = getJournalSize();
 if(size >= checkpointSize)
   shouldCheckpoint = true;
   }
 {code}
 {{dfs.namenode.checkpoint.period}} in configuration determines the period of 
 checkpoint. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HDFS-1572) Checkpointer should trigger checkpoint with specified period.

2011-01-10 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HDFS-1572:
--

Status: Patch Available  (was: Open)

re-triggering hudson.

 Checkpointer should trigger checkpoint with specified period.
 -

 Key: HDFS-1572
 URL: https://issues.apache.org/jira/browse/HDFS-1572
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.22.0
Reporter: Liyin Liang
Priority: Blocker
 Fix For: 0.21.0

 Attachments: 1527-1.diff, 1572-2.diff, HDFS-1572-2.patch, 
 HDFS-1572.patch


 {code}
   long now = now();
   boolean shouldCheckpoint = false;
   if(now >= lastCheckpointTime + periodMSec) {
 shouldCheckpoint = true;
   } else {
 long size = getJournalSize();
 if(size >= checkpointSize)
   shouldCheckpoint = true;
   }
 {code}
 {{dfs.namenode.checkpoint.period}} in configuration determines the period of 
 checkpoint. However, with the above code, the Checkpointer triggers a checkpoint 
 every 5 minutes (periodMSec=5*60*1000). According to SecondaryNameNode.java, 
 the first *if* statement should be:
  {code}
 if(now >= lastCheckpointTime + 1000 * checkpointPeriod) {
  {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.