parthchandra commented on PR #4019:
URL:
https://github.com/apache/datafusion-comet/pull/4019#issuecomment-4301141544
More Claude analysis on the schema mismatch. Claude recommends that we
explicitly check that the following tests fail with a different message
rather than succeed (a successful read would return wrong results):
| Description | Test Name(s) | Diffs |
|-------------|--------------|-------|
| `binary -> timestamp` | `SPARK-35640: read binary as timestamp should throw schema incompatible error` | 3.4.3, 3.5.8, 4.0.1 |
| `timestamp_ntz -> array<timestamp_ntz>` | `SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz>` | 3.4.3, 3.5.8, 4.0.1 |
| `string read as int` | `schema mismatch failure error message for parquet vectorized reader` | 3.4.3, 3.5.8, 4.0.1 |
| decimal precision/scale overflow | `SPARK-34212 Parquet should read decimals correctly` | 3.4.3, 3.5.8, 4.0.1 |
| int->bigint row group overflow | `row group skipping doesn't overflow when reading into larger type` | 3.4.3, 3.5.8, 4.0.1 |
| int read as long (schema evolution disabled) | `SPARK-35640: int as long should throw schema incompatible error` | 3.4.3, 3.5.8 |
| `TimestampLTZ -> TimestampNTZ` | `SPARK-47447: read TimestampLTZ as TimestampNTZ` (4.0.1), `SPARK-36182: can't read TimestampLTZ as TimestampNTZ` (3.4.3, 3.5.8) | 3.4.3, 3.5.8, 4.0.1 |
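
The check recommended above could be sketched roughly as follows. This is a minimal, framework-independent illustration in Python, not the Spark or Comet test code: the helper names (`classify_read`, `check_mismatch_is_rejected`), the stand-in read functions, and the message fragment `"schema incompatible"` are all hypothetical. The idea is that a failure with either message is acceptable, while a silent success is treated as the real bug.

```python
from typing import Callable


def classify_read(read: Callable[[], object], expected_fragment: str) -> str:
    """Run a schema-mismatched read and classify what happened.

    `read` is any zero-argument callable that performs the read
    (hypothetical stand-in for the real test body).
    """
    try:
        read()
    except Exception as exc:
        # The read failed; see whether the message matches the expected one.
        if expected_fragment in str(exc):
            return "failed_with_expected_message"
        return "failed_with_different_message"
    # No exception: the mismatched read returned data, which would be wrong.
    return "succeeded"


def check_mismatch_is_rejected(read: Callable[[], object], expected_fragment: str) -> str:
    """Accept any failure, but reject a silent success."""
    outcome = classify_read(read, expected_fragment)
    if outcome == "succeeded":
        raise AssertionError("mismatched read succeeded; results would be wrong")
    return outcome


# Hypothetical stand-ins for the three possible behaviours.
def read_fails_as_expected():
    raise ValueError("schema incompatible: binary cannot be read as timestamp")


def read_fails_differently():
    raise RuntimeError("native reader: unsupported conversion")


def read_succeeds_wrongly():
    return [0, 0, 0]


if __name__ == "__main__":
    print(check_mismatch_is_rejected(read_fails_as_expected, "schema incompatible"))
    print(check_mismatch_is_rejected(read_fails_differently, "schema incompatible"))
```

Passing `read_succeeds_wrongly` to `check_mismatch_is_rejected` raises `AssertionError`, which is the case the table above is meant to catch.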