[ 
https://issues.apache.org/jira/browse/AVRO-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566860#comment-13566860
 ] 

Daniel Russel commented on AVRO-1182:
-------------------------------------

I had been thinking of an API more like this:
- void seekBytes(size_t offset); // seek to the start of the first block that
  does not start before offset, by seeking there and then scanning for a sync
  marker
- size_t offsetBytes() const; // get the current offset in the file
- size_t sizeBytes() const; // get the size of the file
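A minimal C++ sketch of the seekBytes semantics described above, run against an in-memory byte buffer rather than a real file. SyncScanningReader is a hypothetical class, not part of the Avro C++ API. It relies on the container-file layout in which the header and every data block end with the file's 16-byte sync marker, so each block begins right after a marker. Starting the scan one marker length before the requested offset is what lets a block that begins exactly at that offset count as "not starting before" it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical in-memory stand-in for an Avro object container file. In the
// real format the header and every data block end with the file's 16-byte
// sync marker, so each block begins immediately after a marker.
class SyncScanningReader {
public:
    SyncScanningReader(std::string bytes, std::string syncMarker)
        : bytes_(std::move(bytes)), sync_(std::move(syncMarker)) {}

    // Move to the first block that does not start before `offset`, or to the
    // end of the data if there is no such block.
    void seekBytes(std::size_t offset) {
        // A block starting exactly at `offset` is preceded by its marker, so
        // begin the scan one marker length earlier.
        std::size_t from = offset >= sync_.size() ? offset - sync_.size() : 0;
        std::size_t hit = bytes_.find(sync_, from);
        pos_ = (hit == std::string::npos) ? bytes_.size() : hit + sync_.size();
    }

    std::size_t offsetBytes() const { return pos_; }
    std::size_t sizeBytes() const { return bytes_.size(); }

private:
    std::string bytes_;
    std::string sync_;
    std::size_t pos_ = 0;
};
```

Like any sync-marker scan, the sketch assumes the 16 marker bytes do not occur by chance inside block data.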

That would (I think) provide:
- constant-time access to objects deep in the file
- the ability to build indexes for the data file, for example by seeking to
  each i/1000 of the file and saving the resulting offset (and an identifier
  extracted from the object found there)
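The indexing idea can be sketched as a generic function over any reader that offers the three proposed calls. FakeBlockReader, buildIndex, atEnd() and currentKey() are hypothetical names: the fake reader is handed its block offsets and per-block identifiers directly instead of scanning for sync markers, and currentKey() stands in for whatever application-specific identifier extraction the caller would do. Because several probes can land in the same block, each block is recorded only once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reader offering the proposed seekBytes/offsetBytes/sizeBytes.
// Block start offsets are supplied up front instead of being discovered by
// scanning for sync markers, which keeps the example short.
class FakeBlockReader {
public:
    FakeBlockReader(std::vector<std::size_t> blockStarts,
                    std::vector<std::string> firstKeys, std::size_t size)
        : starts_(std::move(blockStarts)), keys_(std::move(firstKeys)), size_(size) {}

    // Move to the first block that does not start before `offset`.
    void seekBytes(std::size_t offset) {
        cur_ = static_cast<std::size_t>(
            std::lower_bound(starts_.begin(), starts_.end(), offset) - starts_.begin());
    }
    std::size_t offsetBytes() const { return atEnd() ? size_ : starts_[cur_]; }
    std::size_t sizeBytes() const { return size_; }
    bool atEnd() const { return cur_ >= starts_.size(); }
    // Hypothetical: identifier extracted from the first record of the block.
    const std::string& currentKey() const { return keys_[cur_]; }

private:
    std::vector<std::size_t> starts_;
    std::vector<std::string> keys_;
    std::size_t size_;
    std::size_t cur_ = 0;
};

// Probe the file at i/parts of its size for i = 0..parts-1 and record
// (block offset, identifier) once per distinct block reached.
template <typename Reader>
std::vector<std::pair<std::size_t, std::string>> buildIndex(Reader& reader, std::size_t parts) {
    std::vector<std::pair<std::size_t, std::string>> index;
    const std::size_t size = reader.sizeBytes();
    for (std::size_t i = 0; i < parts; ++i) {
        reader.seekBytes(i * size / parts);
        if (reader.atEnd()) break;  // later probes would also land at the end
        if (!index.empty() && index.back().first == reader.offsetBytes()) continue;
        index.emplace_back(reader.offsetBytes(), reader.currentKey());
    }
    return index;
}
```

With parts set to 1000 this matches the i/1000 probing above; the probe count only bounds the index size, since a file with fewer blocks yields fewer entries.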

The cost would be lower precision: finding the nth record requires that you
be able to identify it and, possibly, do a search. You would also need to be
able to identify objects based solely on their content, since determining an
object's index in the file would still require a linear scan. Finally, it
requires that the reader be able to compute the size of the stream, which
cannot currently be done.
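One way to pay the "possibly do a search" cost, assuming the records are sorted by identifier (an assumption not stated above), is to binary-search an index of (offset, identifier) pairs like the one described earlier and then read forward from the chosen offset. IndexEntry and seekTargetFor are hypothetical names; the function only picks the offset to pass to seekBytes, and the record itself still has to be located by scanning from there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index entry: (block start offset, identifier of the block's
// first record).
using IndexEntry = std::pair<std::size_t, std::string>;

// Assumes records, and therefore index entries, are sorted by identifier.
// Returns the offset of the last indexed block whose first identifier is
// <= key; if the record exists it lies at or after that offset.
std::optional<std::size_t> seekTargetFor(const std::vector<IndexEntry>& index,
                                         const std::string& key) {
    auto it = std::upper_bound(index.begin(), index.end(), key,
                               [](const std::string& k, const IndexEntry& e) {
                                   return k < e.second;
                               });
    if (it == index.begin()) return std::nullopt;  // key sorts before the first entry
    return std::prev(it)->first;
}
```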


> DataFileReader missing seek, sync methods
> -----------------------------------------
>
>                 Key: AVRO-1182
>                 URL: https://issues.apache.org/jira/browse/AVRO-1182
>             Project: Avro
>          Issue Type: Improvement
>          Components: c++
>    Affects Versions: 1.7.2
>            Reporter: Daniel Russel
>
> The DataFileReader is missing the seek and sync methods that are found in the
> Java version, making it hard to navigate a file except in a linear fashion.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
