jonbjo opened a new pull request, #97:
URL: https://github.com/apache/datasketches-rust/pull/97

   ## Summary
   
   Add binary serialization support for Theta sketches, compatible with the Apache DataSketches Java and C++ implementations.
   
   ## Changes
   
   - Add `CompactThetaSketch` type with `serialize()` and `deserialize()` methods
   - Add convenience methods on `ThetaSketch`: `compact()`, `serialize()`, `deserialize()`
   - Support all compact sketch formats: empty, single-item, exact mode, and estimation mode
   - Handle legacy `seed_hash=0` format for backward compatibility
   - Add cross-language compatibility tests using Java-generated test data
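
The four compact layouts listed above can be told apart from the first 8 bytes of a serialized sketch. The following is a minimal sketch, not the code in this PR. It assumes the preamble layout used by the Java and C++ libraries as I understand it: byte 0 holds the preamble size in 8-byte longs (low 6 bits) and byte 5 holds the flags, with bit 2 meaning "empty" and bit 5 meaning "single item". The names `Layout`, `classify`, and `serialized_size` are hypothetical and exist only for illustration.

```rust
/// Hypothetical classification of the four compact layouts named in this PR.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Layout {
    Empty,
    SingleItem,
    Exact,
    Estimation,
}

// Flag bits in byte 5 (assumed layout, see the lead-in).
const FLAG_EMPTY: u8 = 1 << 2;
const FLAG_SINGLE_ITEM: u8 = 1 << 5;

/// Classify a serialized compact sketch by inspecting its first 8 bytes.
/// Returns `None` if the buffer is too short or the preamble is unexpected.
fn classify(bytes: &[u8]) -> Option<Layout> {
    if bytes.len() < 8 {
        return None;
    }
    let pre_longs = bytes[0] & 0x3f; // low 6 bits: preamble size in longs
    let flags = bytes[5];
    match pre_longs {
        1 if flags & FLAG_SINGLE_ITEM != 0 => Some(Layout::SingleItem),
        1 if flags & FLAG_EMPTY != 0 => Some(Layout::Empty),
        2 => Some(Layout::Exact),
        3 => Some(Layout::Estimation),
        _ => None,
    }
}

/// Expected total size in bytes for a layout holding `entries` 64-bit hashes,
/// under the same assumed layout (1, 2, or 3 preamble longs plus the entries).
fn serialized_size(layout: Layout, entries: usize) -> usize {
    match layout {
        Layout::Empty => 8,
        Layout::SingleItem => 16,
        Layout::Exact => 16 + 8 * entries,
        Layout::Estimation => 24 + 8 * entries,
    }
}
```

Under these assumptions, `serialized_size(Layout::Exact, 1)` is 24 bytes while `serialized_size(Layout::SingleItem, 1)` is 16 bytes, which is the size difference the single-item limitation in this PR refers to.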
   
   ## Motivation
   
   This enables reading and writing Theta sketches in the formats used by Iceberg Puffin files.
   
   ## Limitations
   
   This PR covers only the standard compact format. The following features are not included:
   
   - **Compressed format** - Java's `toByteArrayCompressed()` uses bit-packing for a smaller size; this is not supported
   - **Non-compact format** - Not supported; only compact sketches can be deserialized
   - **Single-item serialization optimization** - The single-item format can be deserialized, but serialization always uses the standard exact-mode format (functionally correct, just slightly larger for single-item sketches)
   
   These could be added in follow-up PRs if needed.
   
   ## Testing
   
   - Unit tests for serialization round-trips
   - Cross-language compatibility tests that deserialize sketches generated by `datasketches-java`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

