[ 
https://issues.apache.org/jira/browse/AVRO-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603413#comment-13603413
 ] 

Yin Huai commented on AVRO-1208:
--------------------------------

Here are some comparisons between RCFile and Trevni. I mainly focus on data 
reading.

Based on current implementations, RCFile and Trevni have three major 
differences.
# The row group size in RCFile is configurable. Trevni uses a single row group 
per Trevni file (so applications need to horizontally partition a table into 
multiple Trevni files, and each Trevni file needs to be smaller than the 
HDFS block size. Am I right?).
# When reading the needed columns of a row group, RCFile loads all of them at 
once, so applications must wait for the I/O on all needed columns of the row 
group to finish before accessing any row in it. In a Trevni file, a column is 
stored as many small blocks, which are the units of compression. When Trevni 
needs to read data from disk, applications only wait for Trevni to read a few 
blocks before accessing a row.
# When reading the needed columns of a row group, RCFile loads them column by 
column. For Trevni, applications decide how to read the needed columns: either 
row by row or column by column.
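To make the second difference concrete, here is a rough sketch (not code from RCFile or Trevni) that counts how many bytes must be read before the first row of a row group can be returned. The function name and parameters are hypothetical; the 64KB block size follows the internal buffer size mentioned in this issue, and compression and checksums are ignored.

```python
def bytes_before_first_row(needed_column_bytes, block_bytes=None):
    """Bytes that must be read before the first row of a row group is usable.

    needed_column_bytes: size in bytes of each needed column in the row group.
    block_bytes: None models RCFile, which loads all needed columns at once;
                 an int models Trevni, which only needs the first block of
                 each needed column (compression and checksums ignored).
    """
    if block_bytes is None:
        return sum(needed_column_bytes)
    return sum(min(size, block_bytes) for size in needed_column_bytes)


KB, MB = 1024, 1024 * 1024
columns = [4 * MB, 4 * MB, 4 * MB]  # three needed columns, 4MB each
print(bytes_before_first_row(columns))           # RCFile-style model
print(bytes_before_first_row(columns, 64 * KB))  # Trevni-style model
```

In this model the RCFile-style reader waits for 12MB of I/O, while the Trevni-style reader can hand out the first row after 192KB (three 64KB blocks).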

For a given table, applications need to choose a suitable row group size for 
RCFile. With a small row group size, each column in a row group is small (a 
column in a row group is stored contiguously), so reading the needed columns 
requires many seeks, which degrade read performance (this is described in the 
Trevni specification). A small row group size can also cause a read buffer to 
contain data from unneeded columns and make OS readahead less effective (it 
cannot asynchronously prefetch data from the needed columns).
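A back-of-the-envelope sketch of how the seek count scales with the row group size. The model is a simplifying assumption, not taken from RCFile's code: each needed column chunk in a row group is read with one seek, and each needed column occupies a fixed fraction of a row group. The helper name and its parameters are hypothetical.

```python
import math


def rcfile_scan_cost(table_bytes, row_group_bytes, needed_fractions):
    """Return (seeks, mean contiguous run in bytes) for reading needed columns.

    Simplified model: each needed column chunk in a row group is contiguous and
    costs one seek; needed_fractions[i] is the share of a row group occupied by
    needed column i.
    """
    row_groups = math.ceil(table_bytes / row_group_bytes)
    seeks = row_groups * len(needed_fractions)
    mean_run = row_group_bytes * sum(needed_fractions) / len(needed_fractions)
    return seeks, mean_run


MB, GB = 1024 * 1024, 1024 * 1024 * 1024
for rg in (4 * MB, 64 * MB):
    seeks, run = rcfile_scan_cost(1 * GB, rg, [0.1, 0.1])
    print(f"row group {rg // MB}MB: {seeks} seeks, ~{run / MB:.1f}MB per run")
```

For a 1GB table with two needed columns that each take 10% of a row, going from 4MB to 64MB row groups cuts the seeks from 512 to 32, while each contiguous read grows from about 0.4MB to 6.4MB.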

To overcome the low I/O efficiency of RCFile, a large row group size can be 
used. However, RCFile needs to read all needed columns of a row group at once, 
so CPU and I/O may not be effectively overlapped (there is less benefit from 
OS asynchronous readahead). Suppose that an application explicitly stores a 
table in multiple RCFile files and every file has a single row group. When the 
application processes data in a file, it is blocked until all needed data is 
loaded from disk. In this example, we first wait on I/O and then wait on CPU.
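The cost of not overlapping CPU and I/O can be illustrated with a toy two-stage pipeline. This is a sketch under idealized assumptions (a reader that fetches chunks back to back with unlimited buffering, and fixed per-chunk I/O and CPU costs); it does not model real OS readahead, and the function names are hypothetical.

```python
def blocking_time(io_times, cpu_times):
    """All needed data is loaded first, then processed: I/O and CPU never overlap."""
    return sum(io_times) + sum(cpu_times)


def pipelined_time(io_times, cpu_times):
    """Chunk i is processed as soon as it is loaded and chunk i-1 is done.

    Assumes the reader fetches chunks back to back (idealized readahead with
    unlimited buffering), so I/O for later chunks overlaps CPU on earlier ones.
    """
    io_done = 0.0   # time at which the current chunk has been loaded
    cpu_done = 0.0  # time at which the current chunk has been processed
    for io, cpu in zip(io_times, cpu_times):
        io_done += io
        cpu_done = max(cpu_done, io_done) + cpu
    return cpu_done


io = [1.0] * 10   # ten chunks, 1 time unit of I/O each
cpu = [1.0] * 10  # and 1 time unit of CPU each
print(blocking_time(io, cpu), pipelined_time(io, cpu))
```

With equal I/O and CPU cost per chunk, the blocking case takes 20 time units and the pipelined case 11, because only the first chunk's I/O and the last chunk's CPU are not hidden.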

For the third difference, a large row group size in RCFile implies higher I/O 
performance, since all needed columns in a row group are read column by 
column. But Trevni applications usually read the needed columns row by row 
(AvroColumnReader seems to read data row by row, and the Hive and Pig 
integrations of Trevni rely on this reader), so the throughput of reading data 
stored in Trevni can be significantly degraded by unnecessary disk seeks.
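A simplified seek-counting model contrasting the two access patterns on a single Trevni file. This is a sketch, not Trevni's reader logic: it assumes the blocks of one column are contiguous on disk and that a seek happens whenever the next block belongs to a different column than the previous one. The row-by-row branch models the worst case described in this issue (all values the same size). The function name and parameters are hypothetical.

```python
import math


def trevni_seeks(column_bytes, block_bytes=64 * 1024, row_by_row=True):
    """Count seeks when scanning the needed columns of one Trevni file.

    Simplified model: blocks of one column are contiguous on disk, and a seek
    is needed whenever the next block belongs to a different column than the
    previous one. Compression and checksums are ignored.
    """
    if not column_bytes:
        return 0
    blocks = [math.ceil(size / block_bytes) for size in column_bytes]
    if row_by_row:
        # Worst case from AVRO-1208: all values have the same size, so every
        # column finishes its current block at the same time and the reader
        # requests one block from each column in turn (round robin).
        order = [col for r in range(max(blocks))
                 for col, n in enumerate(blocks) if r < n]
    else:
        # Column by column: all blocks of column 0, then column 1, and so on.
        order = [col for col, n in enumerate(blocks) for _ in range(n)]
    seeks, prev = 0, None
    for col in order:
        if col != prev:
            seeks += 1
            prev = col
    return seeks


MB = 1024 * 1024
needed = [64 * MB] * 3  # three needed columns of 64MB each
print(trevni_seeks(needed), trevni_seeks(needed, row_by_row=False))
```

Under these assumptions, reading three 64MB columns row by row with 64KB blocks costs one seek per block (3072 seeks), compared with 3 seeks when reading column by column.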
                
> Improve Trevni's performance on row-oriented data access
> --------------------------------------------------------
>
>                 Key: AVRO-1208
>                 URL: https://issues.apache.org/jira/browse/AVRO-1208
>             Project: Avro
>          Issue Type: Improvement
>    Affects Versions: 1.7.3
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>         Attachments: AVRO-1208.1.patch, AVRO-1208.2.patch
>
>
> Trevni uses a 64KB internal buffer to store values of a column. When 
> accessing a column, it reads 64KB (if we do not consider compression and 
> checksum) data from the storage layer. However, when the table is accessed in 
> a row-oriented fashion (an entire row needs to be handed over to the upper 
> layer), in the worst case (a full table scan and values of this table are all 
> the same size), every 64KB data read can cause a seek.
> This jira is used to discuss whether we should consider the data access pattern 
> mentioned above and if so, how to improve the performance of Trevni. 
> Row-oriented data processing engines, e.g. Hive, can benefit from this work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
