andygrove opened a new pull request, #29: URL: https://github.com/apache/datafusion-java/pull/29
## Which issue does this PR close?

No tracking issue; follow-up to #28.

## Rationale for this change

PR #28 established protobuf-over-JNI as the transport for `SessionContext` configuration. This PR applies the same pattern to the CSV and Parquet read paths.

Before this change, registering or reading a CSV file passed 14–16 raw JNI arguments: booleans, byte values, nullable-encoded as `xxx_set` / `xxx_value` pairs, `-1L` sentinels for "unset" longs, and `FileCompressionType` shipped as its `name()` string. Parquet had the same shape with 7–9 args.

After: each call takes a single serialized `CsvReadOptionsProto` / `ParquetReadOptionsProto` byte array plus an optional Arrow-IPC schema byte array. Nullability, enums, and field evolution are now native to the wire format.

The contributor guide documents the proto-over-JNI convention so future structured JNI calls follow the same pattern.

## What changes are included in this PR?

- New `proto/csv_read_options.proto` and `proto/parquet_read_options.proto`, mirroring the structure of `session_options.proto`. `FileCompressionType` is a proto3 enum with prefixed values and a `_UNSPECIFIED = 0` sentinel.
- `CsvReadOptions.toBytes()` and `ParquetReadOptions.toBytes()` serialize the Java options through the generated builders.
- `with_csv_options` and `with_parquet_options` on the Rust side decode the proto via prost and fold the fields into DataFusion's option structs. The `Unspecified` compression arm returns an error rather than silently defaulting.
- Four JNI methods collapse to 4 or 5 arguments each: `(handle, [name,] path, byte[] optionsBytes, byte[] schemaIpcBytesOrNull)`.
- New `native/src/schema.rs::decode_optional_schema` replaces two copies of identical Arrow-IPC schema-decode logic.
- Renamed Rust module `session_options` → `proto_gen` since the single generated file now contains the types for all three protos (they share `package datafusion_java;`).
- New contributor-guide section `Passing structured options across the JNI boundary` documents the convention, including proto3 enum-prefix and `_UNSPECIFIED = 0` requirements.

The public Java API is unchanged: every public setter on `CsvReadOptions` / `ParquetReadOptions` and every `register*` / `read*` method on `SessionContext` keeps the same signature.

## Are these changes tested?

Yes:

- `CsvReadOptionsTest` (4 tests) and `ParquetReadOptionsTest` (3 tests) round-trip through `toBytes()` / `Proto.parseFrom(...)`, verifying every field, default presence/absence, and all five `FileCompressionType` values.
- The existing `SessionContextCsvTest` and `SessionContextParquetOptionsTest` continue to exercise the public API end-to-end through JNI without modification, which is strong evidence that the new wire format reaches the Rust side correctly.
- Full `./mvnw test` passes (49 run, 0 failed, 12 skipped; the skips are pre-existing tpch-data integration tests).
- `cd native && cargo build && cargo clippy --all-targets -- -D warnings && cargo fmt --check` clean.

## Are there any user-facing changes?

No. Public Java API is unchanged.
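
For readers unfamiliar with the `_UNSPECIFIED = 0` convention, the erroring `Unspecified` arm mentioned in the change list can be sketched in plain Rust. This is a minimal illustration, not the code in this PR: `FileCompressionTypeProto`, `Compression`, `from_tag`, and `decode_compression` are hypothetical names, the hand-written `from_tag` stands in for the conversion prost would generate, and the tag numbers other than `0` are assumed.

```rust
/// Hypothetical stand-in for the prost-generated enum. Tag 0 is the
/// `_UNSPECIFIED` sentinel; the other tag numbers are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileCompressionTypeProto {
    Unspecified = 0,
    Uncompressed = 1,
    Gzip = 2,
    Bzip2 = 3,
    Xz = 4,
    Zstd = 5,
}

impl FileCompressionTypeProto {
    /// Map a raw wire tag to a known variant, or `None` if the tag is unknown.
    pub fn from_tag(raw: i32) -> Option<Self> {
        match raw {
            0 => Some(FileCompressionTypeProto::Unspecified),
            1 => Some(FileCompressionTypeProto::Uncompressed),
            2 => Some(FileCompressionTypeProto::Gzip),
            3 => Some(FileCompressionTypeProto::Bzip2),
            4 => Some(FileCompressionTypeProto::Xz),
            5 => Some(FileCompressionTypeProto::Zstd),
            _ => None,
        }
    }
}

/// Hypothetical stand-in for the engine-side compression setting.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compression {
    Uncompressed,
    Gzip,
    Bzip2,
    Xz,
    Zstd,
}

/// Decode a raw tag, rejecting both unknown tags and the `Unspecified`
/// sentinel instead of silently falling back to a default.
pub fn decode_compression(raw: i32) -> Result<Compression, String> {
    let proto = FileCompressionTypeProto::from_tag(raw)
        .ok_or_else(|| format!("unknown file compression tag {}", raw))?;
    match proto {
        FileCompressionTypeProto::Unspecified => {
            Err("file compression type is unspecified".to_string())
        }
        FileCompressionTypeProto::Uncompressed => Ok(Compression::Uncompressed),
        FileCompressionTypeProto::Gzip => Ok(Compression::Gzip),
        FileCompressionTypeProto::Bzip2 => Ok(Compression::Bzip2),
        FileCompressionTypeProto::Xz => Ok(Compression::Xz),
        FileCompressionTypeProto::Zstd => Ok(Compression::Zstd),
    }
}
```

Because proto3 gives an unset enum field the value `0`, reserving that tag for a sentinel lets the decoder tell "the sender never set this field" apart from a real choice such as `Uncompressed`.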

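
The move from `xxx_set` / `xxx_value` pairs and `-1L` sentinels to native field presence can be illustrated in the same way. The sketch below assumes proto3 `optional` fields, which prost surfaces as `Option<T>`; the message layout, the field names, the `CsvOptions` struct with its defaults, and `apply_csv_proto` are all hypothetical stand-ins rather than the definitions in this PR or in DataFusion.

```rust
/// Hypothetical decoded message: every optional proto field is an `Option`,
/// so "unset" needs neither a companion `*_set` flag nor a `-1` sentinel.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CsvReadOptionsProto {
    pub has_header: Option<bool>,
    pub delimiter: Option<u32>,
    pub schema_infer_max_records: Option<u64>,
}

/// Hypothetical engine-side options; the defaults are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct CsvOptions {
    pub has_header: bool,
    pub delimiter: u8,
    pub schema_infer_max_records: u64,
}

impl Default for CsvOptions {
    fn default() -> Self {
        CsvOptions {
            has_header: true,
            delimiter: b',',
            schema_infer_max_records: 1000,
        }
    }
}

/// Fold only the fields that are present on the wire into the options,
/// leaving every unset field at its existing value.
pub fn apply_csv_proto(
    mut opts: CsvOptions,
    proto: &CsvReadOptionsProto,
) -> Result<CsvOptions, String> {
    if let Some(has_header) = proto.has_header {
        opts.has_header = has_header;
    }
    if let Some(delimiter) = proto.delimiter {
        // Proto has no single-byte scalar, so range-check before narrowing.
        if delimiter > 255 {
            return Err(format!("delimiter {} does not fit in one byte", delimiter));
        }
        opts.delimiter = delimiter as u8;
    }
    if let Some(max_records) = proto.schema_infer_max_records {
        opts.schema_infer_max_records = max_records;
    }
    Ok(opts)
}
```

With this shape, adding a new option means adding one field to the message and one `if let` arm to the fold, while the JNI method signature stays the same.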