gianm opened a new pull request, #15681:
URL: https://github.com/apache/druid/pull/15681

   Three changes:
   
   1) Reworked FastLineIterator to optionally avoid generating Strings
      entirely, and reduce copying somewhat. Benefits the line-oriented
      JSON, CSV, delimited (TSV), and regex formats.
   
   2) In the delimited (TSV) format, when the delimiter is a single byte,
      split on UTF-8 bytes directly.
   
   3) In CSV and delimited (TSV) formats, use list-based input rows when
      the column list is provided upfront by the user.
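
   As a rough illustration of changes (2) and (3), here is a minimal sketch
   rather than the code in this patch. The names `DelimitedSketch`,
   `splitOnByte`, `indexColumns`, and `ListRow` are hypothetical. It assumes
   the line is already available as UTF-8 bytes and the delimiter is a single
   byte. A single-byte UTF-8 delimiter is always ASCII, and ASCII bytes never
   occur inside a multi-byte UTF-8 character, so scanning raw bytes is safe.

   ```java
   import java.nio.charset.StandardCharsets;
   import java.util.ArrayList;
   import java.util.HashMap;
   import java.util.List;
   import java.util.Map;

   // Illustrative sketch only: these names are hypothetical, not Druid classes.
   class DelimitedSketch
   {
     // Split a UTF-8 line on a single-byte delimiter by scanning raw bytes.
     // A single-byte UTF-8 delimiter is ASCII (< 0x80), and every byte of a
     // multi-byte UTF-8 character is >= 0x80, so a match is a real delimiter.
     static List<String> splitOnByte(byte[] line, byte delimiter)
     {
       List<String> fields = new ArrayList<>();
       int start = 0;
       for (int i = 0; i <= line.length; i++) {
         if (i == line.length || line[i] == delimiter) {
           // Decode only this field's bytes, not the whole line.
           fields.add(new String(line, start, i - start, StandardCharsets.UTF_8));
           start = i + 1;
         }
       }
       return fields;
     }

     // Build the column-name-to-position lookup once, when the column list
     // is known upfront.
     static Map<String, Integer> indexColumns(List<String> columns)
     {
       Map<String, Integer> index = new HashMap<>();
       for (int i = 0; i < columns.size(); i++) {
         index.put(columns.get(i), i);
       }
       return index;
     }

     // List-based row: values are positional and the lookup is shared across
     // rows, so no per-row map has to be populated.
     static final class ListRow
     {
       private final Map<String, Integer> columnIndex;
       private final List<String> values;

       ListRow(Map<String, Integer> columnIndex, List<String> values)
       {
         this.columnIndex = columnIndex;
         this.values = values;
       }

       // Returns null for unknown columns or rows with fewer fields than columns.
       String get(String column)
       {
         Integer i = columnIndex.get(column);
         return (i == null || i >= values.size()) ? null : values.get(i);
       }
     }
   }
   ```

   In this sketch the lookup from `indexColumns` is built once per input
   source, and each parsed line only wraps the list returned by `splitOnByte`.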
   
   Benchmarks below. Findings:
   
   - `JsonLineReaderBenchmark` only benefits from change (1), and got a 15% improvement.
   - `DelimitedInputFormatBenchmark` with `fromHeader: true` benefits from (1) and (2), and got a 22% improvement.
   - `DelimitedInputFormatBenchmark` with `fromHeader: false` benefits from all three changes, and got a 30% improvement.
   
   ```
   Benchmark                               (fromHeader)  Mode  Cnt     Score     Error  Units
   DelimitedInputFormatBenchmark.baseline         false  avgt    5  1912.257 ±  39.227  us/op [master]
   DelimitedInputFormatBenchmark.baseline          true  avgt    5  1953.915 ±  44.787  us/op [master]
   JsonLineReaderBenchmark.baseline                      avgt    5  2055.294 ±  28.688  us/op [master]

   DelimitedInputFormatBenchmark.baseline         false  avgt    5  1321.142 ±  10.115  us/op [patch]
   DelimitedInputFormatBenchmark.baseline          true  avgt    5  1506.412 ±  15.892  us/op [patch]
   JsonLineReaderBenchmark.baseline                      avgt    5  1734.426 ±  38.518  us/op [patch]
   ```
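
   The percentages in the findings follow from the average times in the table
   as `(master - patch) / master`. A small sketch of that arithmetic, using a
   hypothetical helper class `ImprovementSketch` and the scores copied from
   the table:

   ```java
   import java.util.Locale;

   // Hypothetical helper; only the scores come from the benchmark table.
   class ImprovementSketch
   {
     // Relative reduction in average time per operation, as a percentage.
     static double improvementPercent(double masterUs, double patchUs)
     {
       return (masterUs - patchUs) / masterUs * 100.0;
     }

     static String describe(String name, double masterUs, double patchUs)
     {
       return String.format(Locale.ROOT, "%s: %.1f%%", name, improvementPercent(masterUs, patchUs));
     }

     public static void main(String[] args)
     {
       System.out.println(describe("delimited, fromHeader=false", 1912.257, 1321.142)); // 30.9%
       System.out.println(describe("delimited, fromHeader=true", 1953.915, 1506.412));  // 22.9%
       System.out.println(describe("json lines", 2055.294, 1734.426));                  // 15.6%
     }
   }
   ```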

