[
https://issues.apache.org/jira/browse/STORM-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15129184#comment-15129184
]
ASF GitHub Bot commented on STORM-1515:
---------------------------------------
GitHub user torbiak opened a pull request:
https://github.com/apache/storm/pull/1067
STORM-1515: Reset LocalState if corrupted after a hard reboot on Windows
On Windows, LocalState I/O requests interrupted by a hard reboot can
result in a file full of NULs, similar to the empty-file corruption seen in
STORM-307.
I've fixed this for 0.9.x first since I haven't upgraded to 0.10 yet. The
fix for 0.10 will be slightly different due to the move to Thrift for
LocalState serialization.
It might be desirable to catch EOFException instead of checking
serialized.length, since that could cover more cases of corruption, such as a
partially written serialization stream.
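To illustrate the trade-off, here is a minimal sketch of an exception-based fallback, assuming plain Java serialization as used by 0.9.x. It is not the code from this pull request: the class and method names (LocalStateRecoverySketch, serialize, deserializeOrReset) are hypothetical, and it catches IOException broadly because an empty input raises EOFException while a NUL-filled input raises StreamCorruptedException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Storm's LocalState implementation.
class LocalStateRecoverySketch {

    // Java-serialize a state map to bytes (stand-in for a LocalState version file).
    static byte[] serialize(Map<String, String> state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new HashMap<>(state));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize the state, or fall back to an empty map if the bytes are unreadable.
    @SuppressWarnings("unchecked")
    static Map<String, String> deserializeOrReset(byte[] serialized) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialized))) {
            return (Map<String, String>) in.readObject();
        } catch (IOException e) {
            // Covers EOFException (empty or truncated data) and
            // StreamCorruptedException (for example, a NUL-filled file).
            return new HashMap<>();
        } catch (ClassNotFoundException e) {
            // Not a corruption symptom, so surface it to the caller.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> state = new HashMap<>();
        state.put("example-key", "example-value");
        byte[] valid = serialize(state);

        System.out.println("valid:     " + deserializeOrReset(valid));
        System.out.println("empty:     " + deserializeOrReset(new byte[0]));
        System.out.println("NULs:      " + deserializeOrReset(new byte[300]));
        System.out.println("truncated: "
            + deserializeOrReset(Arrays.copyOf(valid, valid.length / 2)));
    }
}
```

A length check alone would only catch the empty-file case. Catching the exception also covers NUL-filled and truncated data, at the cost of silently discarding state that might have been recoverable by other means.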
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/torbiak/storm 0.9.x-branch
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1067.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1067
----
commit 69c90bdda052e68c18a2ecfe43c34999869aa144
Author: Jordan Torbiak <[email protected]>
Date: 2016-02-02T01:48:53Z
STORM-1515: Reset LocalState if corrupted after a hard reboot on Windows
On Windows, LocalState I/O requests interrupted by a hard reboot can
result in a file full of NULs, similar to the empty-file corruption seen
in STORM-307.
----
> LocalState corruption after hard reboot on Windows
> --------------------------------------------------
>
> Key: STORM-1515
> URL: https://issues.apache.org/jira/browse/STORM-1515
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-core
> Affects Versions: 0.10.0, 0.9.4
> Environment: Windows Server 2012
> Reporter: Jordan Torbiak
>
> After a hard reboot on Windows, I'm seeing {{LocalState}} files for the
> supervisor that contain a few hundred NULs, resulting in a
> {{StreamCorruptedException}} on deserialization and the supervisor failing to
> start.
> {noformat}
> 2016-01-27T17:04:10.848-0700 b.s.d.supervisor [INFO] Starting supervisor with id 45b27917-4ca0-4d96-8727-914909e3ac47 at host jtorbiak-ws.nj.invidi.com
> 2016-01-27T17:04:11.673-0700 b.s.event [ERROR] Error when processing event
> java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 00000000
> at backtype.storm.serialization.DefaultSerializationDelegate.deserialize(DefaultSerializationDelegate.java:56) ~[storm-core-0.9.4.jar:0.9.4]
> at backtype.storm.utils.Utils.deserialize(Utils.java:89) ~[storm-core-0.9.4.jar:0.9.4]
> at backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:65) ~[storm-core-0.9.4.jar:0.9.4]
> at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) ~[storm-core-0.9.4.jar:0.9.4]
> at backtype.storm.utils.LocalState.get(LocalState.java:72) ~[storm-core-0.9.4.jar:0.9.4]
> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) ~[storm-core-0.9.4.jar:0.9.4]
> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
> at clojure.core$partial$fn__4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
> at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:40) ~[storm-core-0.9.4.jar:0.9.4]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.io.StreamCorruptedException: invalid stream header: 00000000
> at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806) ~[na:1.8.0_45]
> at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) ~[na:1.8.0_45]
> at backtype.storm.serialization.DefaultSerializationDelegate.deserialize(DefaultSerializationDelegate.java:51) ~[storm-core-0.9.4.jar:0.9.4]
> ... 13 common frames omitted
> 2016-01-27T17:04:11.674-0700 b.s.util [ERROR] Halting process: ("Error when processing an event")
> java.lang.RuntimeException: ("Error when processing an event")
> at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
> at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
> at backtype.storm.event$event_manager$fn__2809.invoke(event.clj:48) [storm-core-0.9.4.jar:0.9.4]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> 2016-01-27T17:04:11.695-0700 b.s.d.supervisor [INFO] Shutting down supervisor 45b27917-4ca0-4d96-8727-914909e3ac47
> {noformat}
> This is very similar to STORM-307, except the {{LocalState}} files contain
> NULs instead of being empty. I'm guessing this corruption type is specific to
> Windows.
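The failure in the log above can be reproduced without Storm or a Windows host. The following is a minimal sketch that only assumes standard Java serialization; the class and method names (NulHeaderRepro, headerError) are made up for illustration. It opens an ObjectInputStream over a buffer of NUL bytes, which fails while reading the stream header, before any object is read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.StreamCorruptedException;

// Hypothetical reproduction, independent of Storm.
class NulHeaderRepro {

    // Returns the message of the StreamCorruptedException raised for `count`
    // NUL bytes, the simple class name of any other IOException, or null if
    // the stream header was accepted.
    static String headerError(int count) {
        byte[] nuls = new byte[count]; // Java zero-initializes arrays
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(nuls))) {
            return null;
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // A few hundred NULs, as in the corrupted LocalState files.
        System.out.println(headerError(300)); // prints "invalid stream header: 00000000"
        // An empty buffer, as in STORM-307, fails differently.
        System.out.println(headerError(0));   // prints "EOFException"
    }
}
```

The two cases fail with different exception types, which is why a check aimed only at empty files does not catch the NUL-filled variant.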