-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6683/#review10513
-----------------------------------------------------------
Nice patch! This looks to remove many of the probabilistic test failures of
testRestartLogReplayV{1,2} which is exactly what I was hoping for! A couple of
items to work on below but overall I think the approach is sound.
== Review items ==
1) OK, this fixes the big TestFileChannel.testRestartLogReplayV{1,2} failure
mode, that is lost puts and takes. With the fix it still fails eventually to
replay the logs. The reason I believe this is true is that we can have this
scenario:
put
checkpoint (put is written to in flights)
commit
replay
the put is written out in the inflight puts file and then on replay it's added
to the transaction map and put back into the queue, but it was also in the
queue at checkpoint time. I was able to get the test to pass 170 times in a row
by adding a queue remove in the replayLog method:
transactionMap.put(txnID, FlumeEventPointer.fromLong(eventPointer));
queue.remove(FlumeEventPointer.fromLong(eventPointer));
That is, if it's truly inflight, then a commit has not occurred and the record
will be added to the queue when the commit has is replayed.
2) The failure I got after 170 runs was caused by this scenerio:
put
checkpoint (put is written to inflights)
commit
checkpoint (no in flights and as such inflight files are not updated, thus have
old data)
replay
After commenting out:
if(values.isEmpty()){
return;
}
in the serializeAndWrite method, the test ran 306 times in a row without
failing.
3) I a little unsure of the inflight take logic.
take
checkpoint
commit
On replay, put the take back in the queue and then skip ahead to the
checkpoint. At that point we replay the commit the but the commit has no seen
the takes so it will not remove them the queue?
== Wishlist ==
Since we are planning on making the rest of the file format more extensible,
would you be opposed to using protocol buffers for these two files? That way
we wouldn't have to upgrade when we integrate this with FLUME-1487. Basically
you could copy the protocol buffers generation code from FLUME-1487. In that
change we stop doing random writes to files so we'd have two files:
inflighttakes and inflighttakes.meta where the meta file would have the checksum
This might be a .proto file which would work.
message InFlightTransactions {
repeated InFlightTransaction transactions = 1;
}
message InFlightTransaction {
required sfixed64 transactionID = 1;
repeated sfixed64 pointers = 3;
}
message InFlightTransactionsMetaData {
required bytes checksum = 1;
}
with changes
v1 9.5%
fail 20
success 211
v2 7.5%
fail 7
success 93
without
v1 3.9%
fail 5
success 127
v2 5.2%
fail 5
success 95
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
<https://reviews.apache.org/r/6683/#comment22502>
Not a bad check, but I am not entirely sure this is needed since we use the
semaphore to control capacity. Do you see a whole in that logic?
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22501>
Do we have to expose this inner class since the only time this is used the
deserialize method is immediately called
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22492>
Can we move the main to the bottom?
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22493>
whats the FIXME?
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22503>
can you just print the contents of the inflight takes/puts as well?
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22494>
Does this class have to be public and have public methods?
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22495>
future gives generic warning
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22496>
future.get is never called so any exception will not be propagated
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22497>
Exception catching is too wide
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22498>
If we have no inflights this time, we will *not* overwrite the previous
files which means they will get replayed later down the line.
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22500>
we never close, why are we rechecking to see if it's open? Also does
fileChannel need to be volatile?
flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
<https://reviews.apache.org/r/6683/#comment22499>
loooong line
- Brock Noland
On Aug. 18, 2012, 8:40 a.m., Hari Shreedharan wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6683/
> -----------------------------------------------------------
>
> (Updated Aug. 18, 2012, 8:40 a.m.)
>
>
> Review request for Flume and Brock Noland.
>
>
> Description
> -------
>
> Flume Event Queue now keeps a copy of the event pointers to uncommitted puts
> and takes. It serializes these on every checkpoint and deserializes these on
> replay and reinserts these into either the event queue(for takes) or to the
> replay queue(for puts).
>
> I could have used the PutList and TakeList of the transaction for this, but I
> didn't really like the approach. I don't want to be sharing this kind of data
> between multiple layers, since that makes it complex to change the
> FlumeEventQueue implementation without causing major changes in
> FileBackedTransaction. Also it would lead to a number of cross layer calls to
> read data - which makes the approach less clean.
> With my current approach, by localizing most changes to the FlumeEventQueue
> class, only a couple of function calls would need to be removed/modified.
> Agreed that this is going to be some memory overhead, but this is
> insignificant compared to the event queue size itself. This would be hardly a
> few MB extra in memory - but if that gives me cleaner implementation, I would
> prefer that.
>
>
> This addresses bug FLUME-1437.
> https://issues.apache.org/jira/browse/FLUME-1437
>
>
> Diffs
> -----
>
>
> flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FileChannel.java
> e7735e8
>
> flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/FlumeEventQueue.java
> 9bfee2d
>
> flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/Log.java
> 11f1e1f
>
> flume-ng-channels/flume-file-channel/src/main/java/org/apache/flume/channel/file/ReplayHandler.java
> bbca62c
>
> flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestCheckpoint.java
> 7ec5916
>
> flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFileChannel.java
> 1d5a0f9
>
> flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestFlumeEventQueue.java
> 569b7c7
>
> flume-ng-channels/flume-file-channel/src/test/java/org/apache/flume/channel/file/TestLog.java
> e0b5e3f
>
> Diff: https://reviews.apache.org/r/6683/diff/
>
>
> Testing
> -------
>
> Added 4 new unit tests (2 to TestFileChannel.java to test the actual use
> case, and 2 to TestFlumeEventQueue.java to test the actual functionality of
> serialization/deserialization).
>
>
> Thanks,
>
> Hari Shreedharan
>
>