[ https://issues.apache.org/jira/browse/AVRO-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569222#comment-13569222 ]
Yin Huai commented on AVRO-1208:
--------------------------------

Also, in the test mentioned above, using startBlockWithPrefetch to read only 1 block (<num of prefetched blocks>=0) makes the throughput drop from ~70 MiB/s (original Trevni) to ~66 MiB/s.

> Improve Trevni's performance on row-oriented data access
> --------------------------------------------------------
>
>                 Key: AVRO-1208
>                 URL: https://issues.apache.org/jira/browse/AVRO-1208
>             Project: Avro
>          Issue Type: Improvement
>    Affects Versions: 1.7.3
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: AVRO-1208.1.patch
>
>
> Trevni uses a 64KB internal buffer to store the values of a column. When
> accessing a column, it reads 64KB of data (ignoring compression and
> checksums) from the storage layer. However, when the table is accessed in
> a row-oriented fashion (an entire row needs to be handed over to the upper
> layer), in the worst case (a full table scan where all values in the table
> are the same size), every 64KB read can cause a seek.
> This issue is for discussing whether we should take the data access pattern
> described above into account and, if so, how to improve the performance of
> Trevni. Row-oriented data processing engines, e.g. Hive, can benefit from
> this work.
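
To make the worst case in the description concrete, here is a rough back-of-the-envelope sketch. It is not Trevni code and does not use the patch's API: the class name, the method, and the `blocksPerRead` parameter are hypothetical, and the column count and column size are made-up figures. It assumes every buffer refill of every column lands at a different file position, so each read costs one seek, and that reading several consecutive 64KB buffers in one request costs a single seek.

```java
public class SeekEstimate {
    /** Buffer size named in the issue description (64KB). */
    static final long BUFFER_BYTES = 64L * 1024;

    /**
     * Upper bound on seeks for a full row-oriented scan, assuming each read
     * request of each column needs its own seek.
     * blocksPerRead = 1 models one 64KB buffer per read; larger values model
     * a hypothetical reader that fetches several consecutive buffers at once.
     */
    static long worstCaseSeeks(int columns, long bytesPerColumn, int blocksPerRead) {
        // Number of 64KB buffers needed to cover one column (rounded up).
        long buffersPerColumn = (bytesPerColumn + BUFFER_BYTES - 1) / BUFFER_BYTES;
        // Number of read requests per column when buffers are fetched in groups (rounded up).
        long readsPerColumn = (buffersPerColumn + blocksPerRead - 1) / blocksPerRead;
        return columns * readsPerColumn;
    }

    public static void main(String[] args) {
        long bytesPerColumn = 64L * 1024 * 1024; // 64 MiB per column, illustrative only
        System.out.println("1 buffer per read:  " + worstCaseSeeks(10, bytesPerColumn, 1));
        System.out.println("4 buffers per read: " + worstCaseSeeks(10, bytesPerColumn, 4));
    }
}
```

Under these assumptions the seek count falls in proportion to the number of buffers fetched per read. The sketch says nothing about throughput, which also depends on the extra cost of the larger reads.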