[ https://issues.apache.org/jira/browse/HIVE-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13642745#comment-13642745 ]
Gopal V commented on HIVE-4423:
-------------------------------
|| split location || before || after ||
| store_sales/000000_0:67108864+67108864 | 748 ms | 81 ms |
| store_sales/000002_0:67108864+67108864 | 966 ms | 54 ms |
| store_sales/000004_0:67108864+67108864 | 948 ms | 51 ms |
| store_sales/000006_0:67108864+67108864 | 922 ms | 42 ms |
| store_sales/000008_0:67108864+67108864 | 842 ms | 40 ms |
| store_sales/000010_0:67108864+67108864 | 1302 ms | 82 ms |
| store_sales/000012_0:67108864+67108864 | 989 ms | 50 ms |
| store_sales/000014_0:67108864+67108864 | 970 ms | 43 ms |
| store_sales/000001_0:67108864+67108864 | 829 ms | 47 ms |
| store_sales/000003_0:67108864+67108864 | 811 ms | 43 ms |
| store_sales/000007_0:67108864+67108864 | 865 ms | 51 ms |
| store_sales/000005_0:67108864+67108864 | 1042 ms | 59 ms |
| store_sales/000009_0:67108864+67108864 | 902 ms | 39 ms |
| store_sales/000011_0:67108864+67108864 | 1046 ms | 42 ms |
| store_sales/000013_0:67108864+67108864 | 1048 ms | 44 ms |
As expected, the function is faster by an order of magnitude, and it is now
fast enough that the inner sync.length for loop needs no further optimization.
Overall, the query was faster by 2+ seconds on a 28-second query. That is
expected: with 8 slots and 15 mappers, the map tasks run in two waves, so the
per-split saving is realized roughly twice.
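
The arithmetic behind that estimate can be checked with a small sketch. It
uses only the timings from the table above; the class and method names are
made up for illustration, and it assumes each mapper calls sync() once and
that the 15 mappers run in ceil(15 / 8) sequential waves.

```java
// Hypothetical helper for checking the numbers in the table; not part of Hive.
final class SyncTimings {
    // Per-split sync() timings in milliseconds, copied from the table.
    static final long[] BEFORE_MS =
        {748, 966, 948, 922, 842, 1302, 989, 970, 829, 811, 865, 1042, 902, 1046, 1048};
    static final long[] AFTER_MS =
        {81, 54, 51, 42, 40, 82, 50, 43, 47, 43, 51, 59, 39, 42, 44};

    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    static double mean(long[] values) {
        return (double) sum(values) / values.length;
    }

    // Number of sequential waves needed to run `tasks` tasks on `slots` slots.
    static int waves(int tasks, int slots) {
        return (tasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        double savedPerCallMs = mean(BEFORE_MS) - mean(AFTER_MS);
        int w = waves(BEFORE_MS.length, 8); // 15 mappers on 8 slots
        System.out.printf("mean before: %.1f ms, mean after: %.1f ms%n",
            mean(BEFORE_MS), mean(AFTER_MS));
        System.out.printf("speedup: %.1fx%n",
            (double) sum(BEFORE_MS) / sum(AFTER_MS));
        System.out.printf("estimated saving over %d waves: %.0f ms%n",
            w, w * savedPerCallMs);
    }
}
```

Under those assumptions the estimate comes out at a little under 2 seconds,
which is in the same range as the improvement measured on the whole query.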
> Improve RCFile::sync(long) 10x
> ------------------------------
>
> Key: HIVE-4423
> URL: https://issues.apache.org/jira/browse/HIVE-4423
> Project: Hive
> Issue Type: Improvement
> Environment: Ubuntu LXC (1 SSD, 1 disk, 32 gigs of RAM)
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Minor
> Fix For: 0.11.0
>
> Attachments: HIVE-4423.patch
>
>
> RCFile::sync(long) takes approximately 1 second every time it is called
> because of the inner loops in the function.
> From what was observed with HDFS-4710, single-byte reads are an order of
> magnitude slower than larger 512-byte buffer reads.
> Even when disk I/O is buffered to this size, there is overhead due to the
> synchronized read() methods in the BlockReaderLocal & RemoteBlockReader classes.
> Replacing the readByte() calls in RCFile.sync(long) with a readFully(512 byte)
> call will speed up this function by more than 10x.
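
The idea in the description can be illustrated with a stand-alone sketch. It
is not the code from HIVE-4423.patch: the class, the method names, and the
plain java.io stream are assumptions made for the example. It scans for a
marker by reading 512-byte chunks and carrying the last marker.length - 1
bytes into the next chunk, so a marker that straddles a chunk boundary is
still found. A read loop is used in place of readFully() so that a short
final chunk does not raise EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-alone sketch; not the RCFile implementation.
final class BufferedSyncScan {
    static final int CHUNK = 512; // bulk read size mentioned in the issue

    // Reads up to len bytes into buf starting at off; stops early only at end of stream.
    static int readUpTo(InputStream in, byte[] buf, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int r = in.read(buf, off + total, len - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }

    // True if marker occurs in buf starting at index start.
    static boolean matchesAt(byte[] buf, int start, byte[] marker) {
        for (int j = 0; j < marker.length; j++) {
            if (buf[start + j] != marker[j]) {
                return false;
            }
        }
        return true;
    }

    // Returns the offset of the first marker, relative to where the stream
    // started, or -1 if the stream ends first. Assumes marker.length >= 1.
    static long findMarker(InputStream in, byte[] marker) throws IOException {
        int m = marker.length;
        byte[] buf = new byte[(m - 1) + CHUNK];
        int carried = 0; // bytes kept from the previous chunk
        long base = 0;   // stream offset of buf[0]
        while (true) {
            int n = readUpTo(in, buf, carried, CHUNK);
            int filled = carried + n;
            for (int i = 0; i + m <= filled; i++) {
                if (matchesAt(buf, i, marker)) {
                    return base + i;
                }
            }
            if (n < CHUNK) {
                return -1; // end of stream reached without a match
            }
            // Keep the tail so a marker split across two chunks is not missed.
            int keep = Math.min(m - 1, filled);
            System.arraycopy(buf, filled - keep, buf, 0, keep);
            base += filled - keep;
            carried = keep;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] marker = new byte[16]; // marker size chosen for the demo
        for (int i = 0; i < marker.length; i++) {
            marker[i] = (byte) (i + 1);
        }
        byte[] data = new byte[2000];
        // Place the marker across the first 512-byte boundary.
        System.arraycopy(marker, 0, data, 510, marker.length);
        System.out.println(findMarker(new ByteArrayInputStream(data), marker)); // prints 510
    }
}
```

Each call to read() here fetches up to 512 bytes, so the per-call overhead of
a synchronized read() is paid once per chunk instead of once per byte.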