[
https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697843#comment-17697843
]
ASF GitHub Bot commented on PARQUET-2202:
-----------------------------------------
jerolba commented on PR #1035:
URL: https://github.com/apache/parquet-mr/pull/1035#issuecomment-1460008854
> Is the change exhaustive?
Yes, I reviewed all calls to `checkArgument` to fix incorrect usage of %f
and %d
> Redundant String allocation on the hot path in
> CapacityByteArrayOutputStream.setByte
> ------------------------------------------------------------------------------------
>
> Key: PARQUET-2202
> URL: https://issues.apache.org/jira/browse/PARQUET-2202
> Project: Parquet
> Issue Type: Bug
> Components: parquet-mr
> Affects Versions: 1.12.3
> Reporter: Andrei Pangin
> Priority: Major
> Labels: performance
> Attachments: profile-alloc.png, profile-cpu.png
>
>
> Profiling of a Spark application revealed a performance issue in production:
> {{CapacityByteArrayOutputStream.setByte}} consumed 2.2% of total CPU time and
> made up 4.6% of total allocations. However, in normal case, this method
> should allocate nothing at all.
> Here is an excerpt from async-profiler report.
> CPU profile:
> !profile-cpu.png|width=560!
> Allocation profile:
> !profile-alloc.png|width=560!
> The reason is a {{checkArgument()}} call with an unconditionally constructed
> dynamic String:
> [https://github.com/apache/parquet-mr/blob/62b774cd0f0c60cfbe540bbfa60bee15929af5d4/parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java#L303]
> The suggested fix is to move String construction under the condition:
> {code:java}
> if (index >= bytesUsed) {
> throw new IllegalArgumentException("Index: " + index +
> " is >= the current size of: " + bytesUsed);
> }{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)