[ https://issues.apache.org/jira/browse/HADOOP-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560033#action_12560033 ]
Doug Cutting commented on HADOOP-2608:
--------------------------------------
We might also look to see whether
org.apache.hadoop.record.Utils.fromBinaryString could be made any faster. What
happens if this just does 'new String(bytes, "UTF-8")'? Is the problem our
homegrown UTF-8 decoder, or UTF-8 decoding in general? It'd be nice to return
org.apache.hadoop.io.Text instead, since that permits many string operations w/o
decoding UTF-8, but that'd be a bigger change.
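The experiment suggested above can be sketched as a drop-in replacement that delegates to the JDK's UTF-8 decoder instead of a hand-rolled loop. This is a minimal sketch, not the actual `Utils.fromBinaryString` implementation; the method name and `(byte[], start, length)` signature here are assumptions for illustration, and the real Hadoop method may read its bytes differently:

```java
import java.nio.charset.StandardCharsets;

public class Utf8DecodeSketch {
    // Hypothetical drop-in: decode a byte range with the JDK's built-in
    // UTF-8 decoder rather than a homegrown one. StandardCharsets.UTF_8
    // avoids the checked UnsupportedEncodingException of the
    // new String(bytes, "UTF-8") form quoted in the comment.
    static String fromBinaryString(byte[] bytes, int start, int length) {
        return new String(bytes, start, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "héllo, wörld".getBytes(StandardCharsets.UTF_8);
        // Round-trips multi-byte characters correctly.
        System.out.println(fromBinaryString(utf8, 0, utf8.length));
    }
}
```

Benchmarking this against the homegrown decoder would answer the question posed above: whether the hotspot is Hadoop's own decoding code or UTF-8 decoding in general.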
> Reading sequence file consumes 100% cpu with maximum throughput being about 5MB/sec per process
> -----------------------------------------------------------------------------------------------
>
> Key: HADOOP-2608
> URL: https://issues.apache.org/jira/browse/HADOOP-2608
> Project: Hadoop
> Issue Type: Improvement
> Components: io
> Reporter: Runping Qi
>
> I did some tests on the throughput of scanning block-compressed sequence
> files.
> The sustained throughput was bounded at 5MB/sec per process, with the CPU of
> each process maxed at 100%.
> It seems to me that the CPU consumption is too high and the throughput is too
> low for just scanning files.