Re: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
Given people have had several days to vote, and there have been no -1s, this should be good to go in, right? We have two HDFS committer +1s (Stack and Nicholas) and nonbinding +1s from several others.

Thanks
-Todd

On Thu, Feb 4, 2010 at 1:30 PM, Tsz Wo (Nicholas), Sze s29752-hadoop...@yahoo.com wrote:
> This is a friendly reminder for voting on committing HDFS-927 to 0.20 and 0.21. Committers, please vote!
>
> Nicholas
>
> ----- Original Message -----
> From: Stack st...@duboce.net
> To: hdfs-dev@hadoop.apache.org
> Sent: Tue, February 2, 2010 10:22:50 PM
> Subject: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
>
> I'd like to open a vote on committing HDFS-927 to both hadoop branch 0.20 and to 0.21.
>
> HDFS-927 "DFSInputStream retries too many times for new block location" has an odd summary but, in short, it's a better HDFS-127 "DFSClient block read failures cause open DFSInputStream to become unusable". HDFS-127 is an old, popular issue that refuses to die. We voted on having it committed to the 0.20 branch not too long ago, see http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html, only it broke TestFsck (see http://su.pr/1nylUn) so it was reverted.
>
> High-level, HDFS-127/HDFS-927 is about fixing DFSClient so that a good read cleans out the failures count (previously, failures 'stuck' though there may have been hours of successful reads in betwixt). When rolling hadoop 0.20.2 was proposed, a few fellas including myself raised a lack of HDFS-127 as an obstacle.
>
> HDFS-927 has been committed to TRUNK.
>
> I'm +1 on committing to 0.20 and to 0.21 branches.
>
> Thanks for taking the time to take a look into this issue.
> St.Ack
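For readers following the vote, the behaviour described above (a good read clears the failure count, so old failures do not 'stick') can be pictured with a small Java sketch. This is only an illustration under stated assumptions: it is not the DFSClient code, and the class name, method names, and the retry limit are all hypothetical.

```java
/**
 * Hypothetical sketch, NOT DFSClient code: a per-stream failure counter
 * that every successful read clears, so only consecutive failures count.
 */
final class ReadFailureCounterSketch {
    private final int maxFailures; // assumed limit, not a real HDFS setting
    private int failures = 0;

    ReadFailureCounterSketch(int maxFailures) {
        if (maxFailures < 1) {
            throw new IllegalArgumentException("maxFailures must be >= 1");
        }
        this.maxFailures = maxFailures;
    }

    /** Records a failed read; returns true while the caller may still retry. */
    boolean recordFailure() {
        failures++;
        return failures < maxFailures;
    }

    /** Records a good read: earlier failures no longer count against the stream. */
    void recordSuccess() {
        failures = 0;
    }

    int failures() {
        return failures;
    }

    public static void main(String[] args) {
        ReadFailureCounterSketch counter = new ReadFailureCounterSketch(3);
        counter.recordFailure();
        counter.recordFailure();
        counter.recordSuccess(); // hours of good reads would land here
        System.out.println("failures after a good read: " + counter.failures());
        System.out.println("may retry after next failure: " + counter.recordFailure());
    }
}
```

Without the reset in `recordSuccess()`, two old failures followed by one new failure would exhaust the limit even if many successful reads happened in between, which is the 'stuck' behaviour the message describes.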
[jira] Resolved: (HDFS-830) change build.xml to look at lib's jars before ivy, to allow overwriting ivy's libraries.
[ https://issues.apache.org/jira/browse/HDFS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik resolved HDFS-830.
    Resolution: Won't Fix

Looks like an alternative solution is to use resolvers=internal with maven. Closing this one.

> change build.xml to look at lib's jars before ivy, to allow overwriting ivy's libraries.
>
>   Key: HDFS-830
>   URL: https://issues.apache.org/jira/browse/HDFS-830
>   Project: Hadoop HDFS
>   Issue Type: Bug
>   Reporter: Boris Shkolnik
>   Attachments: HDFS-830.patch
>
> Currently build.xml looks first into ivy's locations before picking up jars from the lib directory. We need to change that to allow overwriting ivy's libs with local ones, by putting them into lib.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
Re: Name Node Corruption When Shutdown Too Soon
Hi Jonathan,

Thank you for raising the issue. We will need more information about your configuration files. It sounds like a problem noted by Todd in HDFS-909: if the edits directory precedes the image directory in the configuration, the edits will be emptied before the image is saved.

Anyway, it is worth filing a jira on that and attaching logs, the config file, and whatever else you may find helpful for reproducing the problem.

Thanks,
--Konstantin Shvachko

On 2/7/2010 8:45 AM, Allen, Jonathan wrote:
> I've come across a name node bug and just wanted to check if it's a known issue before I formally raise it (I've had a quick look through the database but couldn't see anything obvious).
>
> If the name node is shut down before it has completed reading through the edit log then the edit log gets removed without the image file being updated. This results in the name node reverting to its previously saved state (out of sync with the data nodes) and the most recent edits getting lost.
>
> Does anybody recognise this as a known issue or should I raise it?
>
> Thanks,
> Jonathan Allen
> UKGP, NSR, Defence and Security
> HP Enterprise Services
> Telephone +44 1682 292101
> Email jonathan.allen...@hp.com
> Street address, Unit 29, Alexandra Way, Ashchurch Business Park, Tewkesbury, Gloucestershire. GL20 8NB
>
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
> Registered No: 690597 England
>
> The contents of this message and any attachments to it are confidential and may be legally privileged. If you have received this message in error, you should delete it from your system immediately and advise the sender. To any recipient of this message within HP, unless otherwise stated you should consider this message and attachments as HP CONFIDENTIAL.
[jira] Created: (HDFS-955) FSImage.saveFSImage can lose edits
FSImage.saveFSImage can lose edits

  Key: HDFS-955
  URL: https://issues.apache.org/jira/browse/HDFS-955
  Project: Hadoop HDFS
  Issue Type: Bug
  Affects Versions: 0.21.0, 0.22.0
  Reporter: Todd Lipcon
  Assignee: Todd Lipcon
  Priority: Blocker

This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage function (implementing dfsadmin -saveNamespace) can corrupt the NN storage such that all current edits are lost.
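The hazard raised in the HDFS-909 discussion is an ordering one: if the edit log is emptied before the new image has been saved, a failure in between loses the edits. The Java sketch below shows the generic safe ordering (write the new image, move it into place, and only then empty the edit log). It is an assumption-laden illustration, not how FSImage.saveFSImage is implemented or was fixed; the class, method, and file names are hypothetical, and a real implementation would also need fsync and an atomic rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch, NOT FSImage code: save the image before emptying the edit log. */
final class SaveOrderSketch {

    /** Writes the new image to a temp file, moves it into place, then empties the edits. */
    static void saveImageThenResetEdits(Path image, Path edits, byte[] newImage)
            throws IOException {
        Path tmp = image.resolveSibling(image.getFileName() + ".tmp");
        Files.write(tmp, newImage);                                   // 1. new image on disk
        Files.move(tmp, image, StandardCopyOption.REPLACE_EXISTING);  // 2. replace old image
        Files.write(edits, new byte[0]);                              // 3. only now empty edits
    }

    /** Runs the sequence in a temp directory and returns a one-line summary. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("save-order-sketch");
            Path image = dir.resolve("fsimage");
            Path edits = dir.resolve("edits");
            Files.write(image, "old-image".getBytes(StandardCharsets.UTF_8));
            Files.write(edits, "pending-edits".getBytes(StandardCharsets.UTF_8));

            byte[] merged = "old-image+pending-edits".getBytes(StandardCharsets.UTF_8);
            saveImageThenResetEdits(image, edits, merged);

            String saved = new String(Files.readAllBytes(image), StandardCharsets.UTF_8);
            return "image=" + saved + " edits=" + Files.size(edits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this ordering, a crash between steps 1 and 3 leaves either the old image plus the full edit log or the new image plus an edit log that can still be replayed or discarded, rather than an old image with no edits.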
[jira] Created: (HDFS-956) Improper synchronization in some FSNamesystem methods
Improper synchronization in some FSNamesystem methods

  Key: HDFS-956
  URL: https://issues.apache.org/jira/browse/HDFS-956
  Project: Hadoop HDFS
  Issue Type: Bug
  Affects Versions: 0.21.0, 0.22.0
  Reporter: Todd Lipcon

There are some methods in FSNamesystem that check isInSafeMode while not synchronized, and then proceed to perform operations. Thus the actual operations can occur after the NN has entered safemode, which is no good.
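The pattern described in HDFS-956 is a check-then-act race. A minimal Java sketch of the racy shape and a guarded alternative follows; it is an illustration only, not FSNamesystem code, and the class, fields, and method names (other than the idea of a safe-mode check) are hypothetical.

```java
/** Hypothetical sketch, NOT FSNamesystem code: check-then-act on a safe-mode flag. */
final class SafeModeGuardSketch {
    private boolean inSafeMode = false;
    private int appliedOps = 0;

    synchronized void enterSafeMode() { inSafeMode = true; }
    synchronized void leaveSafeMode() { inSafeMode = false; }
    synchronized boolean isInSafeMode() { return inSafeMode; }
    synchronized int appliedOps() { return appliedOps; }

    /** Racy shape: the lock is released between the check and the operation. */
    boolean applyRacy() {
        if (isInSafeMode()) {
            return false;
        }
        // Another thread may call enterSafeMode() right here, so the
        // operation below can run after safe mode has been entered.
        synchronized (this) {
            appliedOps++;
        }
        return true;
    }

    /** Guarded shape: the check and the operation hold the same lock throughout. */
    synchronized boolean applyIfNotInSafeMode() {
        if (inSafeMode) {
            return false;
        }
        appliedOps++;
        return true;
    }

    public static void main(String[] args) {
        SafeModeGuardSketch ns = new SafeModeGuardSketch();
        System.out.println("applied outside safe mode: " + ns.applyIfNotInSafeMode());
        ns.enterSafeMode();
        System.out.println("applied inside safe mode: " + ns.applyIfNotInSafeMode());
        System.out.println("operations applied: " + ns.appliedOps());
    }
}
```

In `applyIfNotInSafeMode()` no other thread can flip the flag between the check and the mutation, because `enterSafeMode()` needs the same monitor.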
Re: Name Node Corruption When Shutdown Too Soon
Hey Jonathan,

As Konstantin mentioned, I've been looking into a couple issues that could be related. At first glance it doesn't sound like you've run into quite the same thing.

What version did you see this on? The steps to reproduce are something like:

1) Start a NN
2) Perform a bunch of edits so there is a large edit log
3) kill -9 the NN
4) start the NN again
5) while it is in the middle of replaying edits, kill -9 it again
6) start the NN, and lose all the previous edits?

Or did I misunderstand what happened? If that sounds right, I'll give it a go and see if I can reproduce.

Thanks
-Todd

On Sun, Feb 7, 2010 at 8:45 AM, Allen, Jonathan jonathan.all...@hp.com wrote:
> I've come across a name node bug and just wanted to check if it's a known issue before I formally raise it (I've had a quick look through the database but couldn't see anything obvious).
>
> If the name node is shut down before it has completed reading through the edit log then the edit log gets removed without the image file being updated. This results in name node reverting to its previously saved state (out of sync with the data nodes) and the most recent edits getting lost.
>
> Does anybody recognise this as a known issue or should I raise it?
>
> Thanks,
> Jonathan Allen
[jira] Reopened: (HDFS-830) change build.xml to look at lib's jars before ivy, to allow overwriting ivy's libraries.
[ https://issues.apache.org/jira/browse/HDFS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan reopened HDFS-830:

I'm going to go ahead and re-open this: we've been using the resolvers:internal method for a while and, like I had feared, it's a pain to keep straight which version is installed and when it is getting called. Also, as noted above, there was no public discussion on this approach before it was added to the wiki.

My preference would be a new option, something like -Dadditional.jars=foo.jar, which would add those jars to the classpath before the other entries. This would make it easy to automate upstream testing: building a patched common jar and then passing it to hdfs to be tested against (and so on for MR). In any case, with so many patches flying around, locally installing temporary jars is not a good solution.

> change build.xml to look at lib's jars before ivy, to allow overwriting ivy's libraries.
>
>   Key: HDFS-830
>   URL: https://issues.apache.org/jira/browse/HDFS-830
>   Project: Hadoop HDFS
>   Issue Type: Bug
>   Reporter: Boris Shkolnik
>   Attachments: HDFS-830.patch
>
> Currently build.xml looks first into ivy's locations before picking up jars from the lib directory. We need to change that to allow overwriting ivy's libs with local ones, by putting them into lib.
Re: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
Vote is closed (unless there is objection). I'll commit the below in the next day or so.

Thanks to all who participated.
St.Ack

On Mon, Feb 8, 2010 at 11:26 AM, Todd Lipcon t...@cloudera.com wrote:
> Given people have had several days to vote, and there have been no -1s, this should be good to go in, right? We have two HDFS committer +1s (Stack and Nicholas) and nonbinding +1s from several others.
>
> Thanks
> -Todd
>
> On Thu, Feb 4, 2010 at 1:30 PM, Tsz Wo (Nicholas), Sze s29752-hadoop...@yahoo.com wrote:
> > This is a friendly reminder for voting on committing HDFS-927 to 0.20 and 0.21. Committers, please vote!
> >
> > Nicholas
> >
> > ----- Original Message -----
> > From: Stack st...@duboce.net
> > To: hdfs-dev@hadoop.apache.org
> > Sent: Tue, February 2, 2010 10:22:50 PM
> > Subject: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
> >
> > I'd like to open a vote on committing HDFS-927 to both hadoop branch 0.20 and to 0.21.
> >
> > HDFS-927 "DFSInputStream retries too many times for new block location" has an odd summary but, in short, it's a better HDFS-127 "DFSClient block read failures cause open DFSInputStream to become unusable". HDFS-127 is an old, popular issue that refuses to die. We voted on having it committed to the 0.20 branch not too long ago, see http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html, only it broke TestFsck (see http://su.pr/1nylUn) so it was reverted.
> >
> > High-level, HDFS-127/HDFS-927 is about fixing DFSClient so that a good read cleans out the failures count (previously, failures 'stuck' though there may have been hours of successful reads in betwixt). When rolling hadoop 0.20.2 was proposed, a few fellas including myself raised a lack of HDFS-127 as an obstacle.
> >
> > HDFS-927 has been committed to TRUNK.
> >
> > I'm +1 on committing to 0.20 and to 0.21 branches.
> >
> > Thanks for taking the time to take a look into this issue.
> > St.Ack
Re: Name Node Corruption When Shutdown Too Soon
Hi Jonathan,

Another question: how have you configured dfs.name.dir? Do you have several directories configured?

Thanks
-Todd

On Mon, Feb 8, 2010 at 4:45 PM, Todd Lipcon t...@cloudera.com wrote:
> Hey Jonathan,
>
> As Konstantin mentioned, I've been looking into a couple issues that could be related. At first glance it doesn't sound like you've run into quite the same thing.
>
> What version did you see this on? The steps to reproduce are something like:
>
> 1) Start a NN
> 2) Perform a bunch of edits so there is a large edit log
> 3) kill -9 the NN
> 4) start the NN again
> 5) while it is in the middle of replaying edits, kill -9 it again
> 6) start the NN, and lose all the previous edits?
>
> Or did I misunderstand what happened? If that sounds right, I'll give it a go and see if I can reproduce.
>
> Thanks
> -Todd
>
> On Sun, Feb 7, 2010 at 8:45 AM, Allen, Jonathan jonathan.all...@hp.com wrote:
> > I've come across a name node bug and just wanted to check if it's a known issue before I formally raise it (I've had a quick look through the database but couldn't see anything obvious).
> >
> > If the name node is shut down before it has completed reading through the edit log then the edit log gets removed without the image file being updated. This results in name node reverting to its previously saved state (out of sync with the data nodes) and the most recent edits getting lost.
> >
> > Does anybody recognise this as a known issue or should I raise it?
> >
> > Thanks,
> > Jonathan Allen