[ https://issues.apache.org/jira/browse/AVRO-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566860#comment-13566860 ]
Daniel Russel commented on AVRO-1182:
-------------------------------------

I had been thinking of an API more like
- void seekBytes(size_t offset); // seek to the start of the first block that does not start before offset, by seeking there and then scanning for a sync mark
- size_t offsetBytes() const; // get the current offset in the file
- size_t sizeBytes() const; // get the size of the file

That would (I think)
- provide constant-time access to objects deep in the file
- allow the construction of indexes for the data file by, for example, seeking to each i/1000 of the file and saving the resulting offset (and an identifier extracted from the object)

The cost would be lower precision: finding the nth record requires that you be able to identify it and, possibly, do a search. You would also have to identify objects based solely on the context, since determining an object's index in the file would still require a linear scan. It also requires that the reader be able to compute the size of the stream, something that cannot currently be done.

> DataFileReader missing seek, sync methods
> -----------------------------------------
>
>                 Key: AVRO-1182
>                 URL: https://issues.apache.org/jira/browse/AVRO-1182
>             Project: Avro
>          Issue Type: Improvement
>          Components: c++
>    Affects Versions: 1.7.2
>            Reporter: Daniel Russel
>
> The DataFileReader is missing the seek and sync methods that are found in the
> Java version, making it hard to navigate a file except in a linear fashion.
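
For illustration only, the "seek there and then scan for a sync mark" step described for seekBytes in the comment could look roughly like the sketch below. It assumes C++17 and a file already loaded into memory, and relies on the Avro container format ending each block with a 16-byte sync marker. The names findBlockStart, SyncMarker and kSyncSize are hypothetical and are not part of the Avro C++ API; a real reader would work on a stream rather than a buffer.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Avro data files terminate every block with a 16-byte sync marker.
constexpr std::size_t kSyncSize = 16;
using SyncMarker = std::array<std::uint8_t, kSyncSize>;

// Hypothetical helper (not an Avro API): returns the offset just past the
// first sync marker that ends at or after `offset`, i.e. the start of the
// first block that does not start before `offset`. Returns std::nullopt if
// no marker is found. The result may equal file.size() when the marker
// found is the last one in the file.
std::optional<std::size_t> findBlockStart(const std::vector<std::uint8_t>& file,
                                          std::size_t offset,
                                          const SyncMarker& sync) {
    // A block that starts exactly at `offset` is preceded by a marker that
    // begins kSyncSize bytes earlier, so back up before scanning.
    const std::size_t scanFrom = offset >= kSyncSize ? offset - kSyncSize : 0;
    if (scanFrom >= file.size()) {
        return std::nullopt;
    }

    const auto first = file.begin() + static_cast<std::ptrdiff_t>(scanFrom);
    const auto hit = std::search(first, file.end(), sync.begin(), sync.end());
    if (hit == file.end()) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(hit - file.begin()) + kSyncSize;
}
```

Under the same assumptions, an index like the one suggested above could be built by calling findBlockStart(file, i * file.size() / 1000, sync) for each i and recording the returned offsets.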