LantaoJin opened a new pull request, #81:
URL: https://github.com/apache/datafusion-java/pull/81
## Which issue does this PR close?
- Closes #79 .
## Rationale for this change
Every error from the native side currently surfaces as a single
`java.lang.RuntimeException`. `try_unwrap_or_throw` in `native/src/errors.rs`
calls `env.throw_new("java/lang/RuntimeException", message)` for every
`Err(...)` it sees, so a `DataFusionError::Plan` and a
`DataFusionError::ResourcesExhausted` and a `DataFusionError::IoError` are
indistinguishable on the Java side. Callers that want to retry on transient
resource pressure but not on a malformed query plan have only the message
string to discriminate on — fragile and brittle to upstream wording changes.
This PR adds a typed Java exception hierarchy and routes each
`DataFusionError` variant onto the matching subclass at the JNI layer, so
callers can `catch (PlanException)` / `catch (ResourcesExhaustedException)`
selectively without scraping messages.
The hierarchy is shallow on purpose. Multiple Rust variants fold into one
Java class when they're the same kind of problem from the caller's standpoint
(e.g. `IoError` / `ObjectStore` / `ParquetError` / `AvroError` all surface as
`IoException`). Variants with no clean caller-facing category — `Internal`,
`Substrait`, `Collection`, plus any new variant a future DataFusion bump
introduces — fall through to the parent `DataFusionException`. The mapping is
the public contract; the underlying variant set is not.
`ArrowError` gets per-variant treatment because it's a 21-variant grab bag
that mixes IO with execution-time failures. Routing the whole `ArrowError` arm
to a single class would put e.g. `SELECT 1/0` (`ArrowError::DivideByZero`)
under `IoException`, which is wrong from a caller's standpoint. A small
`classify_arrow` helper splits the variants: `IoError` / `IpcError` go to
`IoException`, `SchemaError` / `ParseError` go to `PlanException`, and the
arithmetic / cast / compute variants go to `ExecutionException`.
Sequenced before upstream **#55** ("propagate Java stack traces from JVM
upcalls into `DataFusionError`"), which is complementary: every subclass ships
a `(String, Throwable)` constructor as the seam #55 plugs into for setting the
original Java throwable as `cause`.
## What changes are included in this PR?
- Java side: new public class `DataFusionException` (extends
`RuntimeException`) plus 6 subclasses — `PlanException`, `ExecutionException`,
`ResourcesExhaustedException`, `IoException`, `NotImplementedException`,
`ConfigurationException`. Each ships `(String)` and `(String, Throwable)`
constructors.
- Native side: `errors.rs` downcasts the boxed error to `DataFusionError`,
walks the cause chain via upstream's
[`DataFusionError::find_root()`](https://docs.rs/datafusion/53.1.0/datafusion/error/enum.DataFusionError.html#method.find_root),
and routes the root variant through a `classify()` helper to the right Java
class. The thrown message preserves the *outer* error's full chain so wrapping
context isn't lost; only the *class* is picked from the inner variant.
`JniResult<T>` is unchanged (`Box<dyn Error + Send + Sync>`) — every `?` site
keeps working as-is. Rust panics surface as the parent class with a `panic: `
prefix.
- `find_root()` matters because DataFusion can wrap a `ResourcesExhausted` /
`Plan` / etc. inside an `ArrowError::ExternalError(...)` or a `Context(...)`
chain (upstream documents this exact shape). Without `find_root`, those would
land on the outer arm and be misclassified.
- `ArrowError` gets a dedicated `classify_arrow` helper because the enum
mixes IO (`IoError`, `IpcError`) with execution-time variants (`DivideByZero`,
`ArithmeticOverflow`, `ComputeError`, `CastError`, `InvalidArgumentError`,
`MemoryError`, `External`, …) and schema-shaped variants (`SchemaError`,
`ParseError`). Each lands on the matching Java class instead of all collapsing
to `IoException`.
- Variants without a clean category (`Internal`, `Substrait`, `Collection`,
future variants) fall through to the parent class.
- `native/Cargo.toml` adds `rlib` alongside `cdylib` to enable Rust unit
tests that pin the `find_root` routing behavior on synthetic wrapper chains;
the `cdylib` artifact the JVM loads is unchanged.
## Are these changes tested?
Yes. 10 new Java tests + 7 new Rust unit tests.
## Are there any user-facing changes?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]