[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Status: Patch Available (was: Open)

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
>
> I had been automating tests to verify the removal and restoration of storage directories. I was testing by setting up a loopback file system, using that as one of the storage directories, and filling it up to make the writes from the Hadoop namenode to the checkpoint fail.
> Mostly I would see the functionality work. However, very often I would see this exception in the logs:
> 2011-05-29 23:34:30,241 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:297)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:224)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:101)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:98)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:97)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:74)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:74)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:324)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> In this case the storage directory wasn't taken offline. It would not be removed from the list. John George figured out this was because the IOException was happening in a code path from which the function to remove the corresponding storage directory wasn't being called.
> Also, very rarely, I would see this exception:
> 2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.namenode.EditLogFile
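The first failure mode described above (an IOException raised on a code path that never reaches the directory-removal function) can be illustrated with a minimal Java sketch. Everything in it is hypothetical: `StorageDirs`, `DirWriter`, `reportFailure`, and `writeAll` are illustrative names chosen for this example, not the HDFS classes and not the contents of the attached patch. The only point it makes is that every write path should route its IOException into the function that takes the directory offline.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the namenode's list of storage directories. */
class StorageDirs {
    /** A write action that may fail for a given directory. */
    interface DirWriter {
        void write(String dir) throws IOException;
    }

    private final List<String> active = new ArrayList<>();
    private final List<String> removed = new ArrayList<>();

    StorageDirs(List<String> dirs) {
        active.addAll(dirs);
    }

    /** Take a failed directory offline and remember it for later restoration. */
    void reportFailure(String dir) {
        if (active.remove(dir)) {
            removed.add(dir);
        }
    }

    /**
     * Run the write against every active directory. An IOException is
     * routed to reportFailure() instead of escaping past it.
     */
    void writeAll(DirWriter writer) {
        // Iterate over a copy so reportFailure() can modify the active list.
        for (String dir : new ArrayList<>(active)) {
            try {
                writer.write(dir);
            } catch (IOException e) {
                reportFailure(dir);
            }
        }
    }

    List<String> getActive() { return new ArrayList<>(active); }
    List<String> getRemoved() { return new ArrayList<>(removed); }
}

class StorageFailureDemo {
    public static void main(String[] args) {
        // Directory names are made up; the second one plays the full loopback mount.
        StorageDirs dirs = new StorageDirs(
                Arrays.asList("/data/name1", "/mnt/loopback/name2"));
        dirs.writeAll(dir -> {
            if (dir.startsWith("/mnt/loopback")) {
                throw new IOException("No space left on device");
            }
        });
        System.out.println("active=" + dirs.getActive()
                + " removed=" + dirs.getRemoved());
    }
}
```

Keeping the catch inside the per-directory loop is a design choice of this sketch: one full directory should not stop the write to the healthy ones.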
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.patch

This applies to commit a8cacc60847be89b5769741f0eb5f560cdb64691

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch
>
> I had been automating tests to verify the removal and restoration of storage directories. I was testing by setting up a loopback file system, using that as one of the storage directories, and filling it up to make the writes from the Hadoop namenode to the checkpoint fail.
> Mostly I would see the functionality work. However, very often I would see this exception in the logs:
> 2011-05-29 23:34:30,241 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: No space left on device
>     at java.io.FileOutputStream.writeBytes(Native Method)
>     at java.io.FileOutputStream.write(FileOutputStream.java:297)
>     at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:224)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:101)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:98)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:97)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:74)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>     at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:74)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
>     at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:324)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> In this case the storage directory wasn't taken offline. It would not be removed from the list. John George figured out this was because the IOException was happening in a code path from which the function to remove the corresponding storage directory wasn't being called.
> Also, very rarely, I would see this exception:
> 2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
> java.io.IOExceptio
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Description: Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.

(was: I had been automating tests to verify the removal and restoration of storage directories. I was testing by setting up a loopback file system, using that as one of the storage directories, and filling it up to make the writes from the Hadoop namenode to the checkpoint fail.
Mostly I would see the functionality work. However, very often I would see this exception in the logs:
2011-05-29 23:34:30,241 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:297)
    at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:224)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:101)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:98)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:97)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:74)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
    at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:74)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
In this case the storage directory wasn't taken offline. It would not be removed from the list. John George figured out this was because the IOException was happening in a code path from which the function to remove the corresponding storage directory wasn't being called.
Also, very rarely, I would see this exception:
2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:109)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.processIOError(FSEditLog.java:299)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getEditLogSize(FSEditLog.java:849)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEditLogSize(FSNamesystem.java:4270)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.getEditLogSize(NameNode.java:1095)
    at sun.reflect.G
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.patch

HDFS-2011.patch

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.patch

Granting license to ASF.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.3.patch

Updated patch. Fixed some things I had overlooked.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.4.patch

Incorporated Matt's and Konstantin's comments.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.5.patch

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.6.patch

Incorporated Matt's latest comments.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.7.patch

Included fc.close() in EditLogFileOutputStream.close().

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
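The note about adding fc.close() to EditLogFileOutputStream.close() can be pictured with a small Java sketch of a close() that releases both the file stream and its channel and guards against a second call, so that a repeated close surfaces as an IOException rather than a NullPointerException. This is an assumption-laden illustration, not the patch: `SketchEditLogStream`, `CloseDemo`, the field layout, and the error message are all invented for the example, and whether the real class rejects or tolerates a second close is not stated in this thread.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

/** Hypothetical, simplified stand-in for an edit-log output stream. */
class SketchEditLogStream {
    private FileOutputStream fp; // underlying file stream
    private FileChannel fc;      // channel obtained from fp

    SketchEditLogStream(File file) throws IOException {
        fp = new FileOutputStream(file);
        fc = fp.getChannel();
    }

    void write(byte[] data) throws IOException {
        if (fp == null) {
            throw new IOException("stream already closed");
        }
        fp.write(data);
    }

    /** Close the channel and the stream; a second call fails with IOException. */
    void close() throws IOException {
        if (fp == null) {
            // Explicit check instead of dereferencing a null field.
            throw new IOException("stream already closed");
        }
        try {
            if (fc != null && fc.isOpen()) {
                fc.close();
            }
            fp.close();
        } finally {
            // Clear both references even if one of the closes threw.
            fc = null;
            fp = null;
        }
    }
}

class CloseDemo {
    /** Returns a short description of what happened; never throws. */
    static String run() {
        try {
            File f = File.createTempFile("edits", ".tmp");
            f.deleteOnExit();
            SketchEditLogStream s = new SketchEditLogStream(f);
            s.write(new byte[] {1, 2, 3});
            s.close();
            try {
                s.close(); // second close on the same stream
                return "second close succeeded";
            } catch (IOException expected) {
                return "second close rejected: " + expected.getMessage();
            }
        } catch (IOException e) {
            return "unexpected: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Clearing both fields in a finally block is one way to keep the object in a consistent state when the first close throws, which is the situation a full disk can produce.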
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-2011:
---
Attachment: HDFS-2011.8.patch

Hi Matt,
I've incorporated both changes. Thanks for your insightful reviews :)
Cheers
Ravi

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated HDFS-2011:
-
Resolution: Fixed
Fix Version/s: 0.23.0
Status: Resolved (was: Patch Available)

Committed to trunk. Thanks Ravi! And thanks to Todd and Cos for reviews.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Fix For: 0.23.0
>
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2011:
--
Attachment: elfos-close-patch-on-1073.txt

Here's the patch I'm planning to commit to the 1073 branch. Look good? I will also do some stress testing on the branch, similar to what Ravi described, to see if I can reproduce the issue he saw.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Fix For: 0.23.0
>
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch, elfos-close-patch-on-1073.txt
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2011:
--
Attachment: elfos-close-patch-on-1073-2.txt

@John. I agree. Here's an updated (1073) patch that tests the double close and asserts that we get an IOE for using an aborted stream.
@Ravi - want to incorporate this new test into the patch for trunk?

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Fix For: 0.23.0
>
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch, elfos-close-patch-on-1073-2.txt, elfos-close-patch-on-1073.txt
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.
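The kind of check mentioned above (a repeated call plus an assertion that using an aborted stream yields an IOException) might look roughly like the following Java sketch. It is not the test from the attached patch: `AbortableStream`, `AbortedStreamCheck`, `IoAction`, the in-memory backing stream, and the error message are assumptions made so the example is self-contained, and it uses plain Java rather than a test framework.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stream that can be aborted; backed by memory for the demo. */
class AbortableStream {
    private OutputStream out = new ByteArrayOutputStream();

    void write(int b) throws IOException {
        ensureOpen();
        out.write(b);
    }

    /** Drop the stream without reporting errors; safe to call repeatedly. */
    void abort() {
        if (out != null) {
            try {
                out.close();
            } catch (IOException ignored) {
                // Aborting is best effort in this sketch.
            }
            out = null;
        }
    }

    void close() throws IOException {
        ensureOpen();
        out.close();
        out = null;
    }

    private void ensureOpen() throws IOException {
        if (out == null) {
            throw new IOException("Trying to use aborted output stream");
        }
    }
}

class AbortedStreamCheck {
    /** An action that may throw IOException. */
    interface IoAction {
        void run() throws IOException;
    }

    /** Returns true if the action throws IOException. */
    static boolean throwsIoe(IoAction action) {
        try {
            action.run();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    /** Returns true if the stream behaves as the sketch intends. */
    static boolean check() {
        AbortableStream s = new AbortableStream();
        boolean writeBeforeAbortOk = !throwsIoe(() -> s.write(1));
        s.abort();
        s.abort(); // a second abort must not throw
        boolean writeAfterAbortFails = throwsIoe(() -> s.write(2));
        boolean closeAfterAbortFails = throwsIoe(s::close);
        return writeBeforeAbortOk && writeAfterAbortFails && closeAfterAbortFails;
    }

    public static void main(String[] args) {
        System.out.println("aborted stream check passed: " + check());
    }
}
```

Funnelling every operation through one `ensureOpen()` guard is the design choice being illustrated: misuse of a dead stream then fails with a descriptive IOException instead of a NullPointerException deep inside an error handler.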
[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly
[ https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-2011:
--
Attachment: elfos-close-patch-on-1073-3.txt

Attaching a patch with a minor fix to the comment in the double close test.

> Removal and restoration of storage directories on checkpointing failure doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Fix For: 0.23.0
>
> Attachments: HDFS-2011.3.patch, HDFS-2011.4.patch, HDFS-2011.5.patch, HDFS-2011.6.patch, HDFS-2011.7.patch, HDFS-2011.8.patch, HDFS-2011.patch, HDFS-2011.patch, HDFS-2011.patch, elfos-close-patch-on-1073-2.txt, elfos-close-patch-on-1073-3.txt, elfos-close-patch-on-1073.txt
>
> Removal and restoration of storage directories on checkpointing failure doesn't work properly. Sometimes it throws a NullPointerException and sometimes it doesn't take a failed storage directory offline.