kpumuk opened a new pull request, #3398: URL: https://github.com/apache/thrift/pull/3398
<!-- Explain the changes in the pull request below: -->

While working on improvements to serialization depth limits, I noticed a quick performance win in a hot path: writing binary strings to a memory buffer or socket.

For strings that already have `ASCII-8BIT` encoding, the C extension calls into Ruby, which returns the original string. Moreover, the Ruby code always duplicated the string if it was frozen, even when the encoding was already binary, which is unnecessary.

Instead, this PR re-implements Ruby land's `Thrift::Bytes.force_binary_encoding` in C, and corrects the Ruby logic so that it does not duplicate strings unnecessarily.

## Benchmark

Benchmarked using `test/rb/benchmarks/protocol_benchmark.rb` with `--large-runs 5 --small-runs 50000`.

### Accelerated binary scenarios only

| Scenario | Before | After | Delta |
| --- | ---: | ---: | ---: |
| c binary write large (1MB) structure 5 times | 0.758s | 0.473s | -37.6% |
| c binary read large (1MB) structure 5 times | 0.546s | 0.544s | -0.2% |
| c binary write 50000 small structures | 0.372s | 0.236s | -36.4% |
| c binary read 50000 small structures | 0.228s | 0.224s | -1.4% |
| **c-only suite total** | **1.901s** | **1.484s** | **-21.9%** |
| **c-only write total** | **1.130s** | **0.709s** | **-37.2%** |
| **c-only read total** | **0.773s** | **0.769s** | **-0.5%** |

### Full suite summary

| Metric | Before | After | Delta |
| --- | ---: | ---: | ---: |
| Full suite total | 35.912s | 33.222s | -7.5% |
| Write scenarios total | 16.282s | 13.473s | -17.3% |
| Read scenarios total | 19.635s | 19.579s | -0.3% |

### Selected write-path highlights

| Scenario | Before | After | Delta |
| --- | ---: | ---: | ---: |
| ruby binary write large (1MB) structure 5 times | 1.286s | 0.999s | -22.3% |
| c binary write large (1MB) structure 5 times | 0.741s | 0.473s | -36.2% |
| ruby compact write large (1MB) structure 5 times | 0.760s | 0.505s | -33.7% |
| ruby json write large (1MB) structure 5 times | 6.387s | 5.366s | -16.0% |
| ruby binary write 50000 small structures | 0.611s | 0.486s | -20.5% |
| c binary write 50000 small structures | 0.365s | 0.237s | -35.0% |
| ruby compact write 50000 small structures | 0.377s | 0.258s | -31.5% |
| ruby json write 50000 small structures | 2.973s | 2.439s | -18.0% |

Read-side scenarios stayed essentially flat overall, while write-side allocation counts remained unchanged, which is consistent with reducing write-path overhead rather than reducing allocation volume.

### Serializer / Deserializer benchmark

This also brings a large win for `Thrift::Serializer`:

| Operation | Before | After | Delta | Allocations Before | Allocations After |
| --- | ---: | ---: | ---: | ---: | ---: |
| `Thrift::Serializer` with `CompactProtocolFactory` (`50000` serializations) | 0.407s | 0.259s | -36.3% | 1,800,031 | 1,800,031 |
| `Thrift::Deserializer` with `CompactProtocolFactory` (`50000` deserializations) | 0.251s | 0.252s | +0.8% | 850,007 | 850,007 |

<!-- We recommend you review the checklist/tips before submitting a pull request. -->

- [x] Did you create an [Apache Jira](https://issues.apache.org/jira/projects/THRIFT/issues/) ticket? [THRIFT-5948](https://issues.apache.org/jira/browse/THRIFT-5948)
- [x] If a ticket exists: Does your pull request title follow the pattern "THRIFT-NNNN: describe my issue"?
- [x] Did you squash your changes to a single commit? (not required, but preferred)
- [x] Did you do your best to avoid breaking changes? If one was needed, did you label the Jira ticket with "Breaking-Change"?
- [ ] If your change does not involve any code, include `[skip ci]` anywhere in the commit message to free up build resources.

<!-- The Contributing Guide at: https://github.com/apache/thrift/blob/master/CONTRIBUTING.md has more details and tips for committing properly. -->
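
For readers who want to see the behavior change in isolation, here is a minimal Ruby sketch of the corrected logic described in the PR summary. It is an illustration only, not the PR's actual code: the module name `BinaryEncodingSketch` is hypothetical, and the real change lives in `Thrift::Bytes.force_binary_encoding` and its C counterpart. The sketch assumes the intended rule is "return the string untouched when it is already binary, and copy only when a frozen string actually needs re-tagging".

```ruby
# Hypothetical stand-in for Thrift::Bytes.force_binary_encoding (assumption,
# not the code from this PR).
module BinaryEncodingSketch
  def self.force_binary_encoding(buffer)
    # Fast path: the string is already ASCII-8BIT (binary), so hand back the
    # same object without copying, even if it is frozen.
    return buffer if buffer.encoding == Encoding::BINARY

    # String#force_encoding mutates its receiver, so a frozen string has to
    # be duplicated before it can be re-tagged.
    buffer = buffer.dup if buffer.frozen?
    buffer.force_encoding(Encoding::BINARY)
  end
end

frozen_binary = "\x00\x01".b.freeze
result = BinaryEncodingSketch.force_binary_encoding(frozen_binary)
puts result.equal?(frozen_binary) # same object, no duplicate allocated
```

Under this rule a frozen UTF-8 string is still copied once, because re-tagging it in place would raise `FrozenError`; only the redundant copy of already-binary strings goes away.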
