[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-05-30 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Status: Patch Available  (was: Open)

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
>
> I had been automating tests to verify the removal and restoration of storage 
> directories. I was testing by setting up a loopback file system, using that 
> as one of the storage directories, and filling it up to make the writes from 
> Hadoop namenode to the checkpoint fail. 
> Mostly I would see the functionality work. However, very often I would see 
> this exception in the logs: 
> 2011-05-29 23:34:30,241 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:297)
> at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:224)
> at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:101)
> at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:98)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
> at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:97)
> at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:74)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
> at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:74)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
> at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
> at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
> at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
> at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> In this case the storage directory wasn't taken offline and was not removed 
> from the list. John George figured out that this was because the IOException 
> was happening in a code path from which the function to remove the 
> corresponding storage directory wasn't being called. 
> Also, very rarely, I would see this exception
> 2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
> java.io.IOException: java.lang.NullPointerException
> at org.apache.hadoop.hdfs.server.namenode.EditLogFile

[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-05-30 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.patch

This applies to commit a8cacc60847be89b5769741f0eb5f560cdb64691 




[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-05-31 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Description: Removal and restoration of storage directories on 
checkpointing failure doesn't work properly. Sometimes it throws a 
NullPointerException and sometimes it doesn't take a failed storage 
directory offline  (was: I had been automating tests to verify the removal and 
restoration of storage directories. I was testing by setting up a loopback file 
system, using that as one of the storage directories, and filling it up to make 
the writes from Hadoop namenode to the checkpoint fail. 
Mostly I would see the functionality work. However, very often I would see this 
exception in the logs: 

2011-05-29 23:34:30,241 WARN org.mortbay.log: /getimage: java.io.IOException: GetImage failed. java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:297)
at org.apache.hadoop.hdfs.server.namenode.TransferFsImage.getFileClient(TransferFsImage.java:224)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:101)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1$1.run(GetImageServlet.java:98)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:97)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet$1.run(GetImageServlet.java:74)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
at org.apache.hadoop.hdfs.server.namenode.GetImageServlet.doGet(GetImageServlet.java:74)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:871)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1115)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:361)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

In this case the storage directory wasn't taken offline and was not removed 
from the list. John George figured out that this was because the IOException 
was happening in a code path from which the function to remove the 
corresponding storage directory wasn't being called. 
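
The failure mode described above can be illustrated with a minimal Java sketch. It is not the HDFS code or the attached patch: StorageDirs, reportError, and writeToAll are hypothetical names, and the sketch only assumes that every write path that catches an IOException should hand the failing directory to the same removal routine.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the NameNode's list of storage directories. */
class StorageDirs {
    private final List<File> active = new ArrayList<>();
    private final List<File> removed = new ArrayList<>();

    StorageDirs(List<File> dirs) {
        active.addAll(dirs);
    }

    /** Takes a directory offline so that later writes skip it. */
    synchronized void reportError(File dir) {
        if (active.remove(dir)) {
            removed.add(dir);
        }
    }

    synchronized List<File> getActive() {
        return new ArrayList<>(active);
    }

    synchronized List<File> getRemoved() {
        return new ArrayList<>(removed);
    }

    /**
     * Writes data to the named file in every active directory. A failure in
     * one directory is reported and does not abort writes to the others.
     */
    void writeToAll(String name, byte[] data) {
        for (File dir : getActive()) {
            try (FileOutputStream out = new FileOutputStream(new File(dir, name))) {
                out.write(data);
            } catch (IOException e) {
                // Assumption: this is the step that was missing on the
                // failing code path, so the directory stayed in the list.
                reportError(dir);
            }
        }
    }
}
```

In this sketch a directory that cannot be written (for example because the device is full or the path is missing) ends up in the removed list, while the remaining directories still receive the file.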

Also, very rarely, I would see this exception

2011-04-05 17:36:56,187 INFO org.apache.hadoop.ipc.Server: IPC Server handler 87 on 8020, call getEditLogSize() from 98.137.97.99:35862: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:109)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.processIOError(FSEditLog.java:299)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getEditLogSize(FSEditLog.java:849)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getEditLogSize(FSNamesystem.java:4270)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getEditLogSize(NameNode.java:1095)
at sun.reflect.G

[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.patch

HDFS-2011.patch

> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly
> -
>
> Key: HDFS-2011
> URL: https://issues.apache.org/jira/browse/HDFS-2011
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.23.0
> Reporter: Ravi Prakash
> Assignee: Ravi Prakash
> Attachments: HDFS-2011.patch, HDFS-2011.patch
>
>
> Removal and restoration of storage directories on checkpointing failure 
> doesn't work properly. Sometimes it throws a NullPointerException and 
> sometimes it doesn't take a failed storage directory offline

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.patch

Granting license to ASF.




[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-01 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.3.patch

Updated patch. Fixed some things I had overlooked.




[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-03 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.4.patch

Incorporated Matt's and Konstantin's comments




[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-03 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.5.patch




[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-27 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.6.patch

Incorporated Matt's latest comments






[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-27 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.7.patch

Included fc.close() in EditLogFileOutputStream.close()
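
As a rough illustration of what closing the channel inside close() can look like, here is a minimal Java sketch. SketchEditLogStream and its fields are hypothetical stand-ins, not the real EditLogFileOutputStream, and the choice to throw an IOException (rather than hit a null reference) when the stream is used after close is an assumption of this sketch.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/** Hypothetical simplified edit-log stream; not the real HDFS class. */
class SketchEditLogStream {
    private FileOutputStream fp;
    private FileChannel fc;

    SketchEditLogStream(File file) throws IOException {
        fp = new FileOutputStream(file, true);
        fc = fp.getChannel();
    }

    void write(byte[] data) throws IOException {
        if (fc == null) {
            throw new IOException("stream is closed");
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            fc.write(buf);
        }
    }

    /** Flushes and closes the channel, then the stream, then drops both references. */
    void close() throws IOException {
        if (fp == null) {
            // Assumption: report misuse explicitly instead of dereferencing null.
            throw new IOException("stream already closed");
        }
        try {
            fc.force(true);
            fc.close();
        } finally {
            try {
                fp.close();
            } finally {
                fc = null;
                fp = null;
            }
        }
    }

    boolean isClosed() {
        return fp == null;
    }
}
```

The nested finally blocks mean the references are cleared even if flushing or closing fails, so a later call sees a closed stream rather than a half-closed one.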






[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-28 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated HDFS-2011:
---

Attachment: HDFS-2011.8.patch

Hi Matt,

I've incorporated both changes. Thanks for your insightful reviews :)

Cheers
Ravi






[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-06-30 Thread Matt Foley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated HDFS-2011:
-

Resolution: Fixed
Fix Version/s: 0.23.0
Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Ravi!  And thanks to Todd and Cos for reviews.






[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-06 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2011:
--

Attachment: elfos-close-patch-on-1073.txt

Here's the patch I'm planning to commit to 1073 branch. Look good?

I will also do some stress testing similar to what Ravi described on the branch 
to see if I can reproduce the issue he saw.






[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2011:
--

Attachment: elfos-close-patch-on-1073-2.txt

@John. I agree. Here's an updated (1073) patch that tests the double close and 
asserts that we get an IOE for using an aborted stream. 

@Ravi - want to incorporate this new test into the patch for trunk? 
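
For readers without the attachment, the kind of check described above might look roughly like the following Java sketch. It does not reproduce the attached patch: AbortableLogStream, DoubleCloseCheck, and throwsIOException are hypothetical names, and the sketch assumes that both a second close and a write after abort should surface as an IOException.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical stream with close() and abort(); stands in for the class under test. */
class AbortableLogStream {
    private FileOutputStream out;

    AbortableLogStream(File file) throws IOException {
        out = new FileOutputStream(file);
    }

    private void checkUsable() throws IOException {
        if (out == null) {
            throw new IOException("stream was closed or aborted");
        }
    }

    void write(int b) throws IOException {
        checkUsable();
        out.write(b);
    }

    /** Normal shutdown; a second call is reported as an IOException. */
    void close() throws IOException {
        checkUsable();
        try {
            out.close();
        } finally {
            out = null;
        }
    }

    /** Best-effort shutdown after a failure; never throws. */
    void abort() {
        if (out != null) {
            try {
                out.close();
            } catch (IOException ignored) {
                // Nothing more can be done for a stream that is being abandoned.
            }
            out = null;
        }
    }
}

class DoubleCloseCheck {
    interface Action {
        void run() throws IOException;
    }

    /** Returns true if the action throws an IOException, false if it completes. */
    static boolean throwsIOException(Action action) {
        try {
            action.run();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("edits", ".log");
        f.deleteOnExit();

        AbortableLogStream closed = new AbortableLogStream(f);
        closed.write(1);
        closed.close();
        System.out.println("second close throws: " + throwsIOException(closed::close));

        AbortableLogStream aborted = new AbortableLogStream(f);
        aborted.abort();
        System.out.println("write after abort throws: "
                + throwsIOException(() -> aborted.write(2)));
    }
}
```

In a real test suite the two checks would normally be JUnit assertions; the plain helper is used here only to keep the sketch free of external dependencies.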






[jira] [Updated] (HDFS-2011) Removal and restoration of storage directories on checkpointing failure doesn't work properly

2011-07-13 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-2011:
--

Attachment: elfos-close-patch-on-1073-3.txt

Attaching patch with minor fix to comment in the double close test.

