[ https://issues.apache.org/jira/browse/STORM-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693521#comment-14693521 ]
ASF GitHub Bot commented on STORM-837: -------------------------------------- Github user revans2 commented on the pull request: https://github.com/apache/storm/pull/644#issuecomment-130315702 For me internally the tests pass and everything builds. Although the rest of storm seems a bit unstable with tests. I think the changes look fine +1 for merging this in. I don't think we have any split brain problems because of how we route messages, so the begin command and commit messages should prevent this. There are some things that I think we can do better, but are not wrong. 1) I am scared that the transaction file is going to put more of a load on the Name Node than I like. I am mostly concerned that if we have thousands of instances of hdfs state creating/deleting transaction files multiple times a second the load is going to noticeable on a large cluster. But this is a premature optimization so I would say we don't address it unless we see it as an actual problem. 2) The RotationActions don't have a recovery interface, and things like the MoveFile action could potentially fail after successfully rotating the file which might leave the file in the wrong directory. 3) I don't really like the special case for the TimedRotationAction.start. I would rather see start a part of the RotationAction interface, that can be ignored by those that don't want/need it. The last two could perhaps be a follow on JIRA. I am fine with the code as is, but I would like another pair of eyes looking at this too before merging it in. > HdfsState ignores commits > ------------------------- > > Key: STORM-837 > URL: https://issues.apache.org/jira/browse/STORM-837 > Project: Apache Storm > Issue Type: Bug > Reporter: Robert Joseph Evans > Assignee: Arun Mahadevan > Priority: Critical > > HdfsState works with trident which is supposed to provide exactly once > processing. It does this two ways, first by informing the state about > commits so it can be sure the data is written out, and second by having a > commit id, so that double commits can be handled. > HdfsState ignores the beginCommit and commit calls, and with that ignores the > ids. This means that if you use HdfsState and your worker crashes you may > both lose data and get some data twice. > At a minimum the flush and file rotation should be tied to the commit in some > way. The commit ID should at a minimum be written out with the data so > someone reading the data can have a hope of deduping it themselves. > Also with the rotationActions it is possible for a file that was partially > written is leaked, and never moved to the final location, because it is not > rotated. I personally think the actions are too generic for this case and > need to be deprecated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)