LantaoJin opened a new pull request, #81:
URL: https://github.com/apache/datafusion-java/pull/81

   ## Which issue does this PR close?
   
   - Closes #79 .
   
   ## Rationale for this change
   
   Every error from the native side currently surfaces as a single 
`java.lang.RuntimeException`. `try_unwrap_or_throw` in `native/src/errors.rs` 
calls `env.throw_new("java/lang/RuntimeException", message)` for every 
`Err(...)` it sees, so a `DataFusionError::Plan` and a 
`DataFusionError::ResourcesExhausted` and a `DataFusionError::IoError` are 
indistinguishable on the Java side. Callers that want to retry on transient 
resource pressure but not on a malformed query plan have only the message 
string to discriminate on — fragile and brittle to upstream wording changes.
   
   This PR adds a typed Java exception hierarchy and routes each 
`DataFusionError` variant onto the matching subclass at the JNI layer, so 
callers can `catch (PlanException)` / `catch (ResourcesExhaustedException)` 
selectively without scraping messages.
   
   The hierarchy is shallow on purpose. Multiple Rust variants fold into one 
Java class when they're the same kind of problem from the caller's standpoint 
(e.g. `IoError` / `ObjectStore` / `ParquetError` / `AvroError` all surface as 
`IoException`). Variants with no clean caller-facing category — `Internal`, 
`Substrait`, `Collection`, plus any new variant a future DataFusion bump 
introduces — fall through to the parent `DataFusionException`. The mapping is 
the public contract; the underlying variant set is not.
   
   `ArrowError` gets per-variant treatment because it's a 21-variant grab bag 
that mixes IO with execution-time failures. Routing the whole `ArrowError` arm 
to a single class would put e.g. `SELECT 1/0` (`ArrowError::DivideByZero`) 
under `IoException`, which is wrong from a caller's standpoint. A small 
`classify_arrow` helper splits the variants: `IoError` / `IpcError` go to 
`IoException`, `SchemaError` / `ParseError` go to `PlanException`, and the 
arithmetic / cast / compute variants go to `ExecutionException`.
   
   Sequenced before upstream **#55** ("propagate Java stack traces from JVM 
upcalls into `DataFusionError`"), which is complementary: every subclass ships 
a `(String, Throwable)` constructor as the seam #55 plugs into for setting the 
original Java throwable as `cause`.
   
   ## What changes are included in this PR?
   
   - Java side: new public class `DataFusionException` (extends 
`RuntimeException`) plus 6 subclasses — `PlanException`, `ExecutionException`, 
`ResourcesExhaustedException`, `IoException`, `NotImplementedException`, 
`ConfigurationException`. Each ships `(String)` and `(String, Throwable)` 
constructors.
   - Native side: `errors.rs` downcasts the boxed error to `DataFusionError`, 
walks the cause chain via upstream's 
[`DataFusionError::find_root()`](https://docs.rs/datafusion/53.1.0/datafusion/error/enum.DataFusionError.html#method.find_root),
 and routes the root variant through a `classify()` helper to the right Java 
class. The thrown message preserves the *outer* error's full chain so wrapping 
context isn't lost; only the *class* is picked from the inner variant. 
`JniResult<T>` is unchanged (`Box<dyn Error + Send + Sync>`) — every `?` site 
keeps working as-is. Rust panics surface as the parent class with a `panic: ` 
prefix.
   - `find_root()` matters because DataFusion can wrap a `ResourcesExhausted` / 
`Plan` / etc. inside an `ArrowError::ExternalError(...)` or a `Context(...)` 
chain (upstream documents this exact shape). Without `find_root`, those would 
land on the outer arm and be misclassified.
   - `ArrowError` gets a dedicated `classify_arrow` helper because the enum 
mixes IO (`IoError`, `IpcError`) with execution-time variants (`DivideByZero`, 
`ArithmeticOverflow`, `ComputeError`, `CastError`, `InvalidArgumentError`, 
`MemoryError`, `External`, …) and schema-shaped variants (`SchemaError`, 
`ParseError`). Each lands on the matching Java class instead of all collapsing 
to `IoException`.
   - Variants without a clean category (`Internal`, `Substrait`, `Collection`, 
future variants) fall through to the parent class.
   - `native/Cargo.toml` adds `rlib` alongside `cdylib` to enable Rust unit 
tests that pin the `find_root` routing behavior on synthetic wrapper chains; 
the `cdylib` artifact the JVM loads is unchanged.
   
   ## Are these changes tested?
   
   Yes. 10 new Java tests + 7 new Rust unit tests.
   
   ## Are there any user-facing changes?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to