[jira] [Commented] (CSV-277) Review Lexer simpleToken for Performance

Martin (Jira) Mon, 26 Aug 2024 03:45:06 -0700


    [ https://issues.apache.org/jira/browse/CSV-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17876670#comment-17876670 ]


Martin commented on CSV-277:
----------------------------

I also have performance issues with commons-csv 1.5 and with the current 1.11 release.

Performance is especially bad with Java 21 compared with Java 11, because the 
deprecated JVM synchronization optimization (biased locking) was removed in 
Java 18. As a result, the synchronized blocks inside BufferedReader now take 
more time than before.
See [https://bugs.openjdk.org/browse/JDK-8235256]

One solution could be to stop deriving ExtendedBufferedReader from 
BufferedReader. Instead, ExtendedBufferedReader could create a BufferedReader 
instance internally and delegate its read operations to that instance. In 
this case the BufferedReader uses optimized lock operations instead of 
synchronized blocks, and these lock operations are faster than synchronized 
blocks.
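A minimal sketch of the delegation idea follows. It assumes a hypothetical class name, DelegatingBufferedReader, and simplified state: only the last character read and a character position, similar in spirit to what ExtendedBufferedReader tracks, but not its actual fields or API. The class wraps a plain BufferedReader instead of extending it. Whether that BufferedReader then uses a JDK-internal lock rather than a monitor is decided inside the JDK and may depend on the JDK version and on the type of the wrapped reader. This code does not control it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical sketch: wraps a BufferedReader (composition) instead of
 * extending it. Field and method names are illustrative only.
 */
final class DelegatingBufferedReader extends Reader {

    /** Illustrative sentinel meaning "nothing has been read yet". */
    static final int UNDEFINED = -2;

    private final BufferedReader delegate; // an exact BufferedReader, not a subclass
    private int lastChar = UNDEFINED;
    private long position;                 // number of chars consumed so far

    DelegatingBufferedReader(Reader in) {
        this.delegate = new BufferedReader(in);
    }

    @Override
    public int read() throws IOException {
        int current = delegate.read(); // no synchronization added at this level
        if (current != -1) {
            position++;
        }
        lastChar = current;
        return current;
    }

    @Override
    public int read(char[] buf, int offset, int length) throws IOException {
        int n = delegate.read(buf, offset, length);
        if (n > 0) {
            position += n;
            lastChar = buf[offset + n - 1];
        } else if (n == -1) {
            lastChar = -1; // end of input reached
        }
        return n;
    }

    int getLastChar() {
        return lastChar;
    }

    long getPosition() {
        return position;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }

    public static void main(String[] args) throws IOException {
        try (DelegatingBufferedReader r = new DelegatingBufferedReader(new StringReader("a,b"))) {
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            System.out.println(sb + " " + r.getPosition()); // prints "a,b 3"
        }
    }
}
```

This removes the subclass relationship and the extra call into the parent class. Each call still goes through BufferedReader's own per-call locking.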

But a better solution would indeed be a different reader without any synchronization.
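The unsynchronized alternative can be sketched as a small buffered reader that keeps its own char array and takes no lock at all. The class name UnsynchronizedBufferedReader and the 8192-char default buffer size are assumptions for illustration. The class is deliberately not thread-safe, which is acceptable only if each instance is read by a single thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Objects;

/**
 * Hypothetical sketch of a buffered reader with no locking.
 * NOT thread-safe: an instance must only be used by one thread.
 */
final class UnsynchronizedBufferedReader extends Reader {

    private final Reader in;
    private final char[] buf;
    private int pos;   // index of the next char to return from buf
    private int limit; // number of valid chars currently in buf

    UnsynchronizedBufferedReader(Reader in, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be > 0");
        }
        this.in = in;
        this.buf = new char[bufferSize];
    }

    UnsynchronizedBufferedReader(Reader in) {
        this(in, 8192);
    }

    /** Refills the buffer from the underlying reader; returns false at end of input. */
    private boolean fill() throws IOException {
        int n;
        do {
            // Defensive: Reader.read should not return 0 for a non-empty request.
            n = in.read(buf, 0, buf.length);
        } while (n == 0);
        if (n == -1) {
            return false;
        }
        pos = 0;
        limit = n;
        return true;
    }

    @Override
    public int read() throws IOException {
        // Hot path: a bounds check and an array access, no lock acquisition.
        if (pos >= limit && !fill()) {
            return -1;
        }
        return buf[pos++];
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, cbuf.length); // Java 9+
        if (len == 0) {
            return 0;
        }
        if (pos >= limit && !fill()) {
            return -1;
        }
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, cbuf, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        // A tiny buffer (2 chars) forces several refills.
        try (Reader r = new UnsynchronizedBufferedReader(new StringReader("a,b\nc"), 2)) {
            StringBuilder sb = new StringBuilder();
            for (int c = r.read(); c != -1; c = r.read()) {
                sb.append((char) c);
            }
            System.out.println(sb.toString().equals("a,b\nc")); // prints "true"
        }
    }
}
```

On the hot path, read() does no locking at all. That is the point of this design, and also why an instance must not be shared across threads.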


Could this issue be given a higher priority?

> Review Lexer simpleToken for Performance
> ----------------------------------------
>
>                 Key: CSV-277
>                 URL: https://issues.apache.org/jira/browse/CSV-277
>             Project: Commons CSV
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Priority: Major
>         Attachments: CSVCapture.PNG
>
>
> I was running the Apache ORC benchmarks, which have {{commons-csv}} as a 
> dependency, and noticed that the bulk of the running time is spent in {{commons-csv}}.
> I attached the VisualVM output and here is my test setup:
> {code:none}
> JVM: OpenJDK 64-Bit Server VM (25.292-b10, mixed mode)
> Java: version 1.8.0_292, vendor Private Build
> Java Home: /usr/lib/jvm/java-8-openjdk-amd64/jre
> JVM Flags: <none>
> {code}
> I suspect this is in part because {{ExtendedBufferedReader}} extends 
> {{BufferedReader}}. {{BufferedReader}}'s methods are synchronized, which 
> means that every call to {{read}} requires synchronization. Usually this is 
> not an issue, but for {{commons-csv}} it adds a lot of overhead because it 
> reads each character one at a time. So even though the input is buffered, 
> every character read goes through synchronization. It also has to perform a 
> "jump" into the parent class for each character.
> Nothing else stands out to me as being "slow."
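The harness below gives a rough way to observe the per-character overhead described in the issue. It reads the same input through a BufferedReader in two ways:

- one character at a time, which means one read() call, and so one lock acquisition, per character;
- in 8192-char chunks, which means one acquisition per chunk.

This is a hypothetical, non-JMH harness: ReadPatternDemo and its synthetic CSV data are made up. Timings vary by JVM and hardware, and the JIT may reduce locking costs, so a proper JMH benchmark would be needed for real numbers.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical, rough comparison of per-char vs chunked reads (not a JMH benchmark). */
final class ReadPatternDemo {

    /** One read() call, and therefore one lock acquisition, per character. */
    static long sumPerChar(Reader in) throws IOException {
        long sum = 0;
        for (int c = in.read(); c != -1; c = in.read()) {
            sum += c;
        }
        return sum;
    }

    /** One read(char[], int, int) call per chunk of up to 8192 characters. */
    static long sumChunked(Reader in) throws IOException {
        char[] chunk = new char[8192];
        long sum = 0;
        for (int n = in.read(chunk, 0, chunk.length); n != -1; n = in.read(chunk, 0, chunk.length)) {
            for (int i = 0; i < n; i++) {
                sum += chunk[i];
            }
        }
        return sum;
    }

    /** Builds synthetic CSV rows such as "1,name1,3\n". */
    static String sampleCsv(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append(i).append(",name").append(i).append(',').append(i * 3).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String csv = sampleCsv(200_000);
        for (int round = 0; round < 5; round++) { // crude warm-up rounds, not JMH
            long t0 = System.nanoTime();
            long a = sumPerChar(new BufferedReader(new StringReader(csv)));
            long t1 = System.nanoTime();
            long b = sumChunked(new BufferedReader(new StringReader(csv)));
            long t2 = System.nanoTime();
            System.out.printf("per-char %d ms, chunked %d ms, same=%b%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        }
    }
}
```

Both methods compute the same checksum, so only the read pattern differs between the two timings.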



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
