[ 
https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226755#comment-14226755
 ] 

Colin Patrick McCabe commented on HDFS-7446:
--------------------------------------------

bq. It feels like we have a mismatch between the underlying data and our 
objects. The need for the VHS-rewind in getTxidBatchSize is one example, what 
we really want there is an iterator of EditEvents, with one EditEvents per txid 
(name is just a suggestion).

Basically, the code in {{getTxidBatchSize}} is converting between what we get 
back from the NameNode RPC (a bunch of Events, not grouped by txid) to what we 
want to return (a batch of events that all have the same txid).

I guess we could do the conversion earlier, in 
{{PBHelper#convert(GetEditsFromTxidResponseProto resp)}}.  This would mean 
pushing the batch concept into {{EventsList}}.  I'm not sure there's much 
benefit in this, though.  {{EventsList}} is used on the {{NameNode}} as well, 
and the NN doesn't care about grouping Events of the same txid into batches.  
But we'd have to implement all that logic anyway if we went down this route.  
(The NN has its own concept of batching... how many Events it sends back in a 
single RPC... and of course it will not split a txid across RPCs.  But that's 
the only batching we get out of it.)

I admit that rewinding the iterator feels awkward.  I was looking for a utility 
function in {{Iterators}} or something that would help with this, but I didn't 
find anything.  Java doesn't have the concept of duplicating iterators that C++ 
does, so you can't just make a copy of the iterator and then push only that 
copy forward.  I wanted to avoid allocating an array until I knew what size I 
wanted.  I suppose I could create a {{LinkedList}} rather than an array, and 
avoid the rewinding that way.  That's not very efficient, though.  One slightly 
awkward function feels like an acceptable tradeoff for memory efficiency.

bq. The txid could also be moved into EditEvents which would also save some 
bytes.

Most batches are small-- I think the median size will be a batch of size 1, 
with maybe some outliers at 2, 3, or 4.  I don't think there's a lot of memory 
savings to be had by moving the txid into {{EditEvents}}.

A bigger issue is that we need (well maybe not need, but the code will be 
really, really awkward if we don't) to add txid to the protobuf {{EventProto}} 
structure in {{inotify.proto}}.  This is because there can be "gaps" in the 
txids returned by {{EventsListProto}}.  Currently, all {{EventsListProto}} has 
is firstTxid, lastTxid, and a list of Events.  But you don't know what the txid 
of any of those Events actually is... if firstTxid is 100 and lastTxid is 200, 
and you have 50 events, where are they in that sequence?  Who knows.  And if 
you add this to the PB, it's extremely awkward not to add it to the Java 
{{Event}} class as well.

> HDFS inotify should have the ability to determine what txid it has read up to
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-7446
>                 URL: https://issues.apache.org/jira/browse/HDFS-7446
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7446.001.patch
>
>
> HDFS inotify should have the ability to determine what txid it has read up 
> to.  This will allow users who want to avoid missing any events to record 
> this txid and use it to resume reading events at the spot they left off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to