eyalleshem opened a new pull request, #2136:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2136
This PR implements zero-copy tokenization by using borrowed strings
(`&str`) instead of owned strings (`String`) for identifiers, string literals,
and comments. This eliminates unnecessary string allocations during the
tokenization process.
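
To illustrate the idea, here is a minimal sketch of a tokenizer whose tokens borrow from the input instead of copying it. It is not the code from this PR: `Tok`, `take_while`, and `tokenize` are hypothetical names, the token set is heavily reduced, and escaped quotes inside string literals are ignored (a real tokenizer has to handle them, and an unescaped value cannot always be a plain slice of the input).

```rust
/// Hypothetical, reduced token type for illustration only.
/// Every string payload is a slice of the original SQL text.
#[derive(Debug, PartialEq)]
enum Tok<'a> {
    Word(&'a str),
    SingleQuotedString(&'a str),
    Whitespace(&'a str),
    Other(char),
}

/// Returns the longest prefix of `sql[start..]` whose chars satisfy `pred`.
/// The result borrows from `sql`; nothing is allocated.
fn take_while<'a>(sql: &'a str, start: usize, pred: impl Fn(char) -> bool) -> &'a str {
    let rest = &sql[start..];
    let len = rest.find(|c: char| !pred(c)).unwrap_or(rest.len());
    &rest[..len]
}

/// Splits `sql` into tokens that point back into `sql`.
fn tokenize(sql: &str) -> Vec<Tok<'_>> {
    let mut out = Vec::new();
    let mut pos = 0; // byte offset, always on a char boundary
    while let Some(c) = sql[pos..].chars().next() {
        let (tok, consumed) = if c.is_alphabetic() || c == '_' {
            let s = take_while(sql, pos, |ch| ch.is_alphanumeric() || ch == '_');
            (Tok::Word(s), s.len())
        } else if c.is_whitespace() {
            let s = take_while(sql, pos, char::is_whitespace);
            (Tok::Whitespace(s), s.len())
        } else if c == '\'' {
            // Body is everything up to the next quote (escapes are NOT handled here).
            let body = take_while(sql, pos + 1, |ch| ch != '\'');
            // Opening quote + body + closing quote, clamped for unterminated strings.
            let consumed = (body.len() + 2).min(sql.len() - pos);
            (Tok::SingleQuotedString(body), consumed)
        } else {
            (Tok::Other(c), c.len_utf8())
        };
        out.push(tok);
        pos += consumed;
    }
    out
}
```

Calling `tokenize("SELECT name FROM t")` yields `Word` and `Whitespace` tokens whose payloads are slices of the argument, so the only allocation is the `Vec` itself. The trade-off is that the tokens cannot outlive the input string, which is what the `'a` lifetime expresses.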
## Changes
- Modified `Token` variants to store `&'a str` instead of `String` for:
  - `Word` tokens (identifiers like table/column names)
  - `SingleQuotedString` literals
  - `Whitespace`
  - Comments (single-line and multi-line)
- Implemented case-insensitive keyword lookup that avoids the `String`
  allocation made by `to_uppercase()`
- Added `tokenize_bench` criterion benchmark for performance measurement
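
As a rough sketch of how an allocation-free, case-insensitive keyword lookup can work (not necessarily how this PR does it): compare the word against a sorted keyword table byte by byte, uppercasing each byte on the fly. The names `KEYWORDS`, `cmp_ignore_ascii_case`, and `lookup_keyword` are hypothetical, the table is a tiny stand-in, and the sketch assumes keywords are ASCII-only and stored uppercase in sorted order.

```rust
use std::cmp::Ordering;

/// Hypothetical keyword table: ASCII, uppercase, sorted for binary search.
const KEYWORDS: &[&str] = &["FROM", "JOIN", "SELECT", "WHERE", "WITH"];

/// Orders `keyword` relative to `word` as if `word` were ASCII-uppercased,
/// without building an uppercased copy of `word`.
fn cmp_ignore_ascii_case(keyword: &str, word: &str) -> Ordering {
    keyword
        .bytes()
        .cmp(word.bytes().map(|b| b.to_ascii_uppercase()))
}

/// Returns the index of `word` in `KEYWORDS`, ignoring ASCII case,
/// or `None` if it is not a keyword.
fn lookup_keyword(word: &str) -> Option<usize> {
    KEYWORDS
        .binary_search_by(|kw| cmp_ignore_ascii_case(kw, word))
        .ok()
}
```

With this table, `lookup_keyword("select")` and `lookup_keyword("SeLeCt")` both find the `SELECT` entry, while `lookup_keyword("selects")` returns `None`. Non-ASCII bytes are left unchanged by `to_ascii_uppercase`, so they can never match an ASCII keyword.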
## Performance Impact
Benchmark results using a complex 27KB SQL query with CTEs, joins, window
functions, and extensive comments:

    tokenization/tokenize_complex_sql
                            time:   [254.68 µs 254.81 µs 254.97 µs]
                            change: [−60.885% −60.682% −60.482%] (p = 0.00 < 0.05)
                            Performance has improved.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]