Re: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?

2010-02-08 Thread Todd Lipcon
Given people have had several days to vote, and there have been no
-1s, this should be good to go in, right? We have two HDFS committer
+1s (Stack and Nicholas) and nonbinding +1s from several others.

Thanks
-Todd

On Thu, Feb 4, 2010 at 1:30 PM, Tsz Wo (Nicholas), Sze
s29752-hadoop...@yahoo.com wrote:

 This is a friendly reminder to vote on committing HDFS-927 to 0.20 and
 0.21.

 Committers, please vote!

 Nicholas




 ----- Original Message -----
  From: Stack st...@duboce.net
  To: hdfs-dev@hadoop.apache.org
  Sent: Tue, February 2, 2010 10:22:50 PM
  Subject: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?
 
  I'd like to open a vote on committing HDFS-927 to both the Hadoop 0.20
  and 0.21 branches.
 
  HDFS-927, "DFSInputStream retries too many times for new block
  location", has an odd summary, but in short it's a better HDFS-127,
  "DFSClient block read failures cause open DFSInputStream to become
  unusable".  HDFS-127 is an old, popular issue that refuses to die.  We
  voted on having it committed to the 0.20 branch not too long ago, see
  http://www.mail-archive.com/hdfs-dev@hadoop.apache.org/msg00401.html,
  only it broke TestFsck (See http://su.pr/1nylUn) so it was reverted.
 
  At a high level, HDFS-127/HDFS-927 is about fixing DFSClient so that a good
  read clears the failure count (previously, failures 'stuck' even though
  there may have been hours of successful reads in betwixt).  When
  rolling Hadoop 0.20.2 was proposed, a few fellas including myself
  raised the lack of HDFS-127 as an obstacle.
 
  HDFS-927 has been committed to TRUNK.
 
  I'm +1 on committing to 0.20 and to 0.21 branches.
 
  Thanks for taking the time to look into this issue.
  St.Ack
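
The failure-count behaviour Stack describes (a good read clears earlier failures instead of letting them accumulate for the life of the stream) can be sketched roughly as follows. This is a minimal, hypothetical model, not the actual DFSInputStream/DFSClient code: the class name, the maxFailures limit, and the use of IllegalStateException are assumptions made for illustration.

```java
public class FailureCountDemo {

    /**
     * Hypothetical model of the HDFS-127/HDFS-927 idea: count failures to
     * obtain a block, give up past a limit, but reset the count after a good read.
     */
    static final class ResettingFailureCounter {
        private final int maxFailures; // assumed limit, analogous to a "max block acquire failures" setting
        private int failures = 0;

        ResettingFailureCounter(int maxFailures) {
            this.maxFailures = maxFailures;
        }

        /** Records a failed attempt; throws once more than maxFailures have accumulated. */
        void recordFailure() {
            failures++;
            if (failures > maxFailures) {
                // The sketch uses an unchecked exception for brevity.
                throw new IllegalStateException("Could not obtain block after " + failures + " failures");
            }
        }

        /** A successful read wipes out earlier failures so they no longer "stick". */
        void recordSuccess() {
            failures = 0;
        }

        int failures() {
            return failures;
        }
    }

    public static void main(String[] args) {
        ResettingFailureCounter counter = new ResettingFailureCounter(3);
        counter.recordFailure();
        counter.recordFailure();
        counter.recordSuccess(); // e.g. hours of good reads in between
        counter.recordFailure();
        counter.recordFailure();
        counter.recordFailure(); // 5 failures in total, but only 3 since the last good read
        System.out.println("failures since last good read: " + counter.failures());
    }
}
```

Without the recordSuccess() reset, the fourth failure in that sequence would already exceed the limit and the stream would give up, which is the symptom HDFS-127 complains about.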




[jira] Resolved: (HDFS-830) change build.xml to look at lib's jars before ivy, to allow overwriting ivy's libraries.

2010-02-08 Thread Boris Shkolnik (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Boris Shkolnik resolved HDFS-830.
-

Resolution: Won't Fix

It looks like an alternative solution is to use resolvers=internal with Maven.
Closing this one.

 change build.xml to look at lib's jars before ivy, to allow overwriting ivy's 
 libraries.
 

 Key: HDFS-830
 URL: https://issues.apache.org/jira/browse/HDFS-830
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Boris Shkolnik
 Attachments: HDFS-830.patch


 Currently build.xml looks in Ivy's locations first, before picking up jars
 from the lib directory.
 We need to change that to allow overwriting Ivy's libs with local ones by
 putting them into lib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Name Node Corruption When Shutdown Too Soon

2010-02-08 Thread Konstantin Shvachko

Hi Jonathan,

Thank you for raising the issue.
We will need more information about your configuration files.

It sounds like a problem noted by Todd in HDFS-909.
If the edits directory precedes the image directory in the configuration,
then the edits will be emptied before the image is saved.

Anyway, it is worth filing a JIRA for this and attaching logs, config files,
and whatever else you find helpful for reproducing the problem.

Thanks,
--Konstantin Shvachko

On 2/7/2010 8:45 AM, Allen, Jonathan wrote:
 I've come across a name node bug and just wanted to check if it's a known
 issue before I formally raise it (I've had a quick look through the database
 but couldn't see anything obvious).

 If the name node is shut down before it has finished reading through the
 edit log, the edit log gets removed without the image file being updated.
 This results in the name node reverting to its previously saved state (out of
 sync with the data nodes) and the most recent edits being lost.


 Does anybody recognise this as a known issue or should I raise it?

 Thanks,
 Jonathan Allen
 UKGP, NSR, Defence and Security
 HP Enterprise Services
 Telephone +44 1682 292101
 Email jonathan.allen...@hp.com
 Street address, Unit 29, Alexandra Way, Ashchurch Business Park, Tewkesbury, 
Gloucestershire. GL20 8NB

 Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 
1HN
 Registered No: 690597 England
 The contents of this message and any attachments to it are confidential and 
may be legally privileged. If you have received this message in error, you should 
delete it from your system immediately and advise the sender.
 To any recipient of this message within HP, unless otherwise stated you should consider 
this message and attachments as HP CONFIDENTIAL.







[jira] Created: (HDFS-955) FSImage.saveFSImage can lose edits

2010-02-08 Thread Todd Lipcon (JIRA)
FSImage.saveFSImage can lose edits
--

 Key: HDFS-955
 URL: https://issues.apache.org/jira/browse/HDFS-955
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Blocker


This is a continuation of a discussion from HDFS-909. The FSImage.saveFSImage 
function (implementing dfsadmin -saveNamespace) can corrupt the NN storage such 
that all current edits are lost.
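
Konstantin's explanation earlier in this digest (if the edits directory precedes the image directory in the configuration, the edits are emptied before the image is saved) can be illustrated with a toy model. This is a hypothetical sketch, not FSImage.saveFSImage: each storage directory is reduced to the last transaction it can restore, the save loop simply visits directories in configuration order, a kill -9 is simulated by stopping after a chosen number of directories, and the transaction numbers are made up.

```java
import java.util.List;

public class SaveOrderDemo {

    enum Kind { IMAGE, EDITS }

    /** A configured storage directory, reduced to the last transaction it can restore. */
    static final class StorageDir {
        final Kind kind;
        long lastTxn; // IMAGE: last txn folded into the image; EDITS: last txn in the log (0 = emptied)

        StorageDir(Kind kind, long lastTxn) {
            this.kind = kind;
            this.lastTxn = lastTxn;
        }
    }

    /**
     * Visits directories in configuration order, writing a fresh image into IMAGE
     * dirs and emptying EDITS dirs, but "crashes" after stepsBeforeCrash directories.
     * Returns the last transaction a restart could restore.
     */
    static long saveThenCrash(List<StorageDir> dirs, long currentTxn, int stepsBeforeCrash) {
        int steps = 0;
        for (StorageDir d : dirs) {
            if (steps == stepsBeforeCrash) {
                break; // simulated kill -9 in the middle of the save
            }
            d.lastTxn = (d.kind == Kind.IMAGE) ? currentTxn : 0;
            steps++;
        }
        // On restart the image is loaded and any surviving edits are replayed;
        // in this model that is simply the furthest point any directory reaches.
        long best = 0;
        for (StorageDir d : dirs) {
            best = Math.max(best, d.lastTxn);
        }
        return best;
    }

    public static void main(String[] args) {
        // Old checkpoint at txn 60; the edit log holds everything up to txn 100.
        List<StorageDir> editsFirst = List.of(new StorageDir(Kind.EDITS, 100), new StorageDir(Kind.IMAGE, 60));
        List<StorageDir> imageFirst = List.of(new StorageDir(Kind.IMAGE, 60), new StorageDir(Kind.EDITS, 100));
        System.out.println("edits dir listed first, crash after 1 dir: " + saveThenCrash(editsFirst, 100, 1));
        System.out.println("image dir listed first, crash after 1 dir: " + saveThenCrash(imageFirst, 100, 1));
    }
}
```

In this model, writing every image before touching any edit log closes the window in which a crash loses the edits between the old checkpoint and the current transaction.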




[jira] Created: (HDFS-956) Improper synchronization in some FSNamesystem methods

2010-02-08 Thread Todd Lipcon (JIRA)
Improper synchronization in some FSNamesystem methods
-

 Key: HDFS-956
 URL: https://issues.apache.org/jira/browse/HDFS-956
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.21.0, 0.22.0
Reporter: Todd Lipcon


There are some methods in FSNamesystem that check isInSafeMode while not 
synchronized, and then proceed to perform operations. Thus the actual 
operations can occur after the NN has entered safemode, which is no good.
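
The check-then-act problem described in HDFS-956 can be shown with a small hypothetical class (not the real FSNamesystem; all names here are assumptions). The racy method checks the safe-mode flag and only afterwards takes the lock to mutate, so safe mode can be entered in between. A Runnable hook stands in for another thread being scheduled at that moment, which makes the bad interleaving deterministic. The fixed method performs the check and the mutation under the same monitor.

```java
public class SafeModeCheckDemo {

    /** Hypothetical stand-in for a namesystem guarded by its own monitor. */
    static final class Namesystem {
        private boolean safeMode = false;
        private int mutations = 0;

        synchronized boolean isInSafeMode() {
            return safeMode;
        }

        synchronized void enterSafeMode() {
            safeMode = true;
        }

        synchronized int mutations() {
            return mutations;
        }

        /** Racy: the safe-mode check and the mutation are not covered by one lock. */
        void mutateRacy(Runnable betweenCheckAndAct) {
            if (isInSafeMode()) {
                throw new IllegalStateException("Name node is in safe mode");
            }
            betweenCheckAndAct.run(); // simulates another thread getting scheduled here
            synchronized (this) {
                mutations++; // runs even if safe mode was entered just above
            }
        }

        /** Fixed: check and act atomically while holding the same monitor. */
        synchronized void mutateSafe() {
            if (safeMode) {
                throw new IllegalStateException("Name node is in safe mode");
            }
            mutations++;
        }
    }

    public static void main(String[] args) {
        Namesystem ns = new Namesystem();
        ns.mutateRacy(ns::enterSafeMode);
        System.out.println("in safe mode: " + ns.isInSafeMode() + ", mutations: " + ns.mutations());
        try {
            ns.mutateSafe();
        } catch (IllegalStateException e) {
            System.out.println("fixed path rejected: " + e.getMessage());
        }
    }
}
```

In a real name node the hook would be a concurrent thread rather than a callback, so the racy path only misbehaves occasionally, which is what makes this kind of bug hard to spot in testing.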




Re: Name Node Corruption When Shutdown Too Soon

2010-02-08 Thread Todd Lipcon
Hey Jonathan,

As Konstantin mentioned, I've been looking into a couple issues that
could be related. At first glance it doesn't sound like you've run
into quite the same thing.

What version did you see this on? The steps to reproduce are something like:

1) Start a NN
2) Perform a bunch of edits so there is a large edit log
3) kill -9 the NN
4) start the NN again
5) while it is in the middle of replaying edits, kill -9 it again
6) start the NN, and lose all the previous edits?

Or did I misunderstand what happened? If that sounds right, I'll give
it a go and see if I can reproduce.

Thanks
-Todd

On Sun, Feb 7, 2010 at 8:45 AM, Allen, Jonathan jonathan.all...@hp.com wrote:
 I've come across a name node bug and just wanted to check if it's a known 
 issue before I formally raise it (I've had a quick look through the database 
 but couldn't see anything obvious).

 If the name node is shut down before it has finished reading through the
 edit log, the edit log gets removed without the image file being updated.
 This results in the name node reverting to its previously saved state (out of
 sync with the data nodes) and the most recent edits being lost.

 Does anybody recognise this as a known issue or should I raise it?

 Thanks,
 Jonathan Allen






[jira] Reopened: (HDFS-830) change build.xml to look at lib's jars before ivy, to allow overwriting ivy's libraries.

2010-02-08 Thread Jakob Homan (JIRA)

 [ https://issues.apache.org/jira/browse/HDFS-830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan reopened HDFS-830:
--


I'm going to go ahead and re-open this: we've been using the resolvers:internal
method for a while and, as I had feared, it's a pain to keep straight which
version is installed and when it is getting called.  Also, as noted above,
there was no public discussion on this approach before it was added to the
wiki.

My preference would be a new option, something like
-Dadditional.jars=foo.jar, which would add those jars to the classpath before
the other entries.  This would make it easy to automate upstream testing:
build a patched common jar and then pass it to hdfs to be tested against
(and so on for MR).  In any case, with so many patches flying around, locally
installing temporary jars is not a good solution.




Re: [VOTE] Commit HDFS-927 to both 0.20 and 0.21 branch?

2010-02-08 Thread Stack
Vote is closed (unless there is an objection).  I'll commit it in the next
day or so.
Thanks to all who participated.
St.Ack

On Mon, Feb 8, 2010 at 11:26 AM, Todd Lipcon t...@cloudera.com wrote:
 Given people have had several days to vote, and there have been no
 -1s, this should be good to go in, right? We have two HDFS committer
 +1s (Stack and Nicholas) and nonbinding +1s from several others.

 Thanks
 -Todd






Re: Name Node Corruption When Shutdown Too Soon

2010-02-08 Thread Todd Lipcon
Hi Jonathan,

Another question: how have you configured dfs.name.dir? Do you have
several directories configured?

Thanks
-Todd

On Mon, Feb 8, 2010 at 4:45 PM, Todd Lipcon t...@cloudera.com wrote:
 Hey Jonathan,

 As Konstantin mentioned, I've been looking into a couple issues that
 could be related. At first glance it doesn't sound like you've run
 into quite the same thing.

 What version did you see this on? The steps to reproduce are something like:

 1) Start a NN
 2) Perform a bunch of edits so there is a large edit log
 3) kill -9 the NN
 4) start the NN again
 5) while it is in the middle of replaying edits, kill -9 it again
 6) start the NN, and lose all the previous edits?

 Or did I misunderstand what happened? If that sounds right, I'll give
 it a go and see if I can reproduce.

 Thanks
 -Todd
