I'm not sure if this has been discussed before, but I think it'd be great to have a way to keep track of failed workers/bolts. Occasionally a bolt will restart for various reasons (generally either a timeout or an NPE inside the bolt), and it's rather difficult to find out what happened and why. Part of the problem is that the bolt restarts as soon as it fails, so you no longer know where the failure occurred (which worker/port).
I think the UI should be augmented to show failed bolts, along with the reason each one failed (e.g. timed out) and a link to the log-viewer tied to the point in time when the failure occurred. Does anyone else have similar issues? I may be able to work on this a bit, but I'm not sure how difficult it would be to implement. Much of the information would have to be drawn from Nimbus, which may or may not even have it.
