[ 
https://issues.apache.org/jira/browse/PARQUET-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17697840#comment-17697840
 ] 

ASF GitHub Bot commented on PARQUET-2202:
-----------------------------------------

jerolba commented on code in PR #1035:
URL: https://github.com/apache/parquet-mr/pull/1035#discussion_r1129276294


##########
parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java:
##########
@@ -477,15 +477,15 @@ public Builder withMaxBloomFilterBytes(int 
maxBloomFilterBytes) {
      * @return this builder for method chaining
      */
     public Builder withBloomFilterNDV(String columnPath, long ndv) {
-      Preconditions.checkArgument(ndv > 0, "Invalid NDV for column \"%s\": 
%d", columnPath, ndv);
+      Preconditions.checkArgument(ndv > 0, "Invalid NDV for column \"%s\": 
%s", columnPath, ndv);
       this.bloomFilterNDVs.withValue(columnPath, ndv);
       // Setting an NDV for a column implies writing a bloom filter
       this.bloomFilterEnabled.withValue(columnPath, true);
       return this;
     }
 
     public Builder withBloomFilterFPP(String columnPath, double fpp) {
-      Preconditions.checkArgument(fpp > 0.0 && fpp < 1.0, "Invalid FPP for 
column \"%s\": %d", columnPath, fpp);
+      Preconditions.checkArgument(fpp > 0.0 && fpp < 1.0, "Invalid FPP for 
column \"%s\": %s", columnPath, fpp);

Review Comment:
   The changed code was incorrect and in case of failure in the check, the code 
throws a `java.util.IllegalFormatConversionException`
   
   The existing [code behind checkArgument 
method](https://github.com/jerolba/parquet-mr/blob/5078cc629074046b5699544c33256ca980ab2761/parquet-common/src/main/java/org/apache/parquet/Preconditions.java#L243)
 converts all parameters to String before applying the String.format.
   
   If String.format finds an incorrect type, throws 
`IllegalFormatConversionException`. 
   
   For example try this:
   ```java
   String output = String.format("Integer value: %d", "10");
   ```
   
   Then, existing Preconditions implementation forces to use Strings messages 
with %s.
   
   I reviewed all calls to `checkArgument` to fix the incorrect format.





> Redundant String allocation on the hot path in 
> CapacityByteArrayOutputStream.setByte
> ------------------------------------------------------------------------------------
>
>                 Key: PARQUET-2202
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2202
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.12.3
>            Reporter: Andrei Pangin
>            Priority: Major
>              Labels: performance
>         Attachments: profile-alloc.png, profile-cpu.png
>
>
> Profiling of a Spark application revealed a performance issue in production:
> {{CapacityByteArrayOutputStream.setByte}} consumed 2.2% of total CPU time and 
> made up 4.6% of total allocations. However, in normal case, this method 
> should allocate nothing at all.
> Here is an excerpt from async-profiler report.
> CPU profile:
> !profile-cpu.png|width=560!
> Allocation profile:
> !profile-alloc.png|width=560!
> The reason is a {{checkArgument()}} call with an unconditionally constructed 
> dynamic String:
> [https://github.com/apache/parquet-mr/blob/62b774cd0f0c60cfbe540bbfa60bee15929af5d4/parquet-common/src/main/java/org/apache/parquet/bytes/CapacityByteArrayOutputStream.java#L303]
> The suggested fix is to move String construction under the condition:
> {code:java}
> if (index >= bytesUsed) {
>   throw new IllegalArgumentException("Index: " + index +
>       " is >= the current size of: " + bytesUsed);
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to