parthchandra commented on PR #4019:
URL: https://github.com/apache/datafusion-comet/pull/4019#issuecomment-4301099715

My general recommendation would be that we enable the ignored tests before dropping `native_iceberg_compat`.

Also, the description of #3720 seems to indicate that the issue is more complex than just mismatched error messages (though I can be convinced that it is not a serious problem after all). We do have the framework to match Spark error messages, but perhaps it is not necessary for the error messages to match exactly.

Here's Claude's summary of the ignored tests:
### 1. `IgnoreCometNativeDataFusion`: skipped for `native_datafusion` and `auto`
#### AdaptiveQueryExecSuite ([#3321](https://github.com/apache/datafusion-comet/issues/3321))
| Test Name | Diffs |
|-----------|-------|
| `join key with multiple references on the filtering plan` | 4.0.1 |
| `SPARK-43402: FileSourceScanExec supports push down data filter with scalar subquery` | 4.0.1 |
| `alter temporary view should follow current storeAnalyzedPlanForView config` | 4.0.1 |
#### AdaptiveQueryExecSuite ([#3442](https://github.com/apache/datafusion-comet/issues/3442))
| Test Name | Diffs |
|-----------|-------|
| `static scan metrics` | 3.4.3, 3.5.8, 4.0.1 |
#### FileBasedDataSourceSuite ([#3321](https://github.com/apache/datafusion-comet/issues/3321))
| Test Name | Diffs |
|-----------|-------|
| `Enabling/disabling ignoreMissingFiles using parquet` (conditionally tagged only when `format == "parquet"`) | 4.0.1 |
| `Enabling/disabling ignoreCorruptFiles` | 4.0.1 |
#### ParquetFilterSuite (3.4.3 only)
| Test Name | Diffs |
|-----------|-------|
| `filter pushdown - StringPredicate` (tagged `IgnoreCometNativeDataFusion` in 3.4.3; `IgnoreCometNativeScan` in 3.5.8/4.0.1) | 3.4.3 |
#### ParquetSchemaSuite ([#3720](https://github.com/apache/datafusion-comet/issues/3720))
| Test Name | Diffs |
|-----------|-------|
| `SPARK-35640: read binary as timestamp should throw schema incompatible error` | 3.4.3, 3.5.8, 4.0.1 |
| `SPARK-35640: int as long should throw schema incompatible error` | 3.4.3, 3.5.8 |
| `SPARK-47447: read TimestampLTZ as TimestampNTZ` | 4.0.1 |
| `SPARK-36182: can't read TimestampLTZ as TimestampNTZ` | 3.4.3, 3.5.8 |
| `SPARK-34212 Parquet should read decimals correctly` | 3.4.3, 3.5.8, 4.0.1 |
| `row group skipping doesn't overflow when reading into larger type` | 3.4.3, 3.5.8, 4.0.1 |
#### ParquetSchemaEvolutionSuite ([#3720](https://github.com/apache/datafusion-comet/issues/3720))
| Test Name | Diffs |
|-----------|-------|
| `schema mismatch failure error message for parquet vectorized reader` | 3.4.3, 3.5.8, 4.0.1 |
| `SPARK-45604: schema mismatch failure error on timestamp_ntz to array<timestamp_ntz>` | 3.4.3, 3.5.8, 4.0.1 |
#### ParquetTypeWideningSuite ([#3321](https://github.com/apache/datafusion-comet/issues/3321))
| Test Name | Diffs |
|-----------|-------|
| `parquet widening conversion DateType -> TimestampNTZType` (conditionally tagged) | 4.0.1 |
| `unsupported parquet conversion $fromType -> $toType` (multiple type combos) | 4.0.1 |
| `unsupported parquet timestamp conversion $fromType ($outputTimestampType) -> $toType` | 4.0.1 |
| `parquet decimal precision change Decimal($fromPrecision, 2) -> Decimal($toPrecision, 2)` | 4.0.1 |
| `parquet decimal precision and scale change Decimal($fromPrecision, $fromScale) -> Decimal($toPrecision, $toScale)` | 4.0.1 |
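
As a rough illustration of what the `IgnoreCometNativeDataFusion` tag means for coverage, here is a minimal Python model of the rule stated in the heading above: a tagged test is ignored when the scan implementation is `native_datafusion` or `auto`, and runs otherwise (for example with `native_iceberg_compat`). This is a sketch under assumptions, not Comet's code: the real tag lives in Comet's Scala patches to the Spark test sources, and the names `SuiteTest`, `SKIP_FOR_TAG` and `is_ignored` are hypothetical.

```python
from dataclasses import dataclass, field

# Scan implementation names as used in this comment (string values assumed).
SCAN_NATIVE_DATAFUSION = "native_datafusion"
SCAN_NATIVE_ICEBERG_COMPAT = "native_iceberg_compat"
SCAN_AUTO = "auto"

# Hypothetical mapping: tag name -> scan implementations for which tagged tests are ignored.
SKIP_FOR_TAG = {
    "IgnoreCometNativeDataFusion": {SCAN_NATIVE_DATAFUSION, SCAN_AUTO},
}


@dataclass
class SuiteTest:
    """A test registered in a suite, with the tags attached to it (hypothetical)."""
    name: str
    tags: set = field(default_factory=set)


def is_ignored(test: SuiteTest, scan_impl: str) -> bool:
    """True if any tag on the test excludes the configured scan implementation."""
    return any(scan_impl in SKIP_FOR_TAG.get(tag, set()) for tag in test.tags)


if __name__ == "__main__":
    static_metrics = SuiteTest("static scan metrics", {"IgnoreCometNativeDataFusion"})
    for impl in (SCAN_NATIVE_DATAFUSION, SCAN_AUTO, SCAN_NATIVE_ICEBERG_COMPAT):
        print(impl, "ignored" if is_ignored(static_metrics, impl) else "runs")
```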
---
### 2. `assume()`: runtime skip
#### ParquetRowIndexSuite ([#3886](https://github.com/apache/datafusion-comet/issues/3886), 4.0.1 only)
| Test Name | Condition |
|-----------|-----------|
| `invalid row index column type - ${conf.desc}` | Skipped when `COMET_NATIVE_SCAN_IMPL` is `SCAN_NATIVE_DATAFUSION` or `SCAN_AUTO`. Comet throws `RuntimeException` instead of `SparkException`. |
#### CometExpressionSuite (Comet's own test suite)
| Test Name | Condition |
|-----------|-----------|
| `get_struct_field - select primitive fields` | Skipped when `scanImpl == SCAN_AUTO && Spark 4.0+` |
| `get_struct_field - select subset of struct` | Skipped when `scanImpl == SCAN_AUTO && Spark 4.0+` |
| `get_struct_field - read entire struct` | Skipped when `scanImpl == SCAN_AUTO && Spark 4.0+` |
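
Unlike a tag, `assume()` is evaluated inside the test body: ScalaTest's `assume` throws `TestCanceledException` when its condition is false, so the test is reported as canceled rather than failed. The Python sketch below models the two conditions from the tables above with the same cancel-not-fail semantics. It is an illustration under assumptions, not Comet's Scala code; the helper names (`CanceledTest`, `spark_at_least`, `row_index_guard`, `get_struct_field_guard`, `outcome`) are hypothetical, and the version check compares only major.minor.

```python
# Scan implementation names as used in this comment (string values assumed).
SCAN_NATIVE_DATAFUSION = "native_datafusion"
SCAN_AUTO = "auto"


class CanceledTest(Exception):
    """Stand-in for ScalaTest's TestCanceledException."""


def assume(condition: bool, clue: str) -> None:
    # Like ScalaTest's assume(): cancel the test (do not fail it) when the condition is false.
    if not condition:
        raise CanceledTest(clue)


def spark_at_least(version: str, major: int, minor: int) -> bool:
    # Compare only major.minor, e.g. "4.0.1" -> (4, 0).
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= (major, minor)


def row_index_guard(scan_impl: str) -> None:
    # ParquetRowIndexSuite: Comet throws RuntimeException instead of SparkException (#3886).
    assume(scan_impl not in (SCAN_NATIVE_DATAFUSION, SCAN_AUTO),
           "error type differs under native_datafusion/auto")


def get_struct_field_guard(scan_impl: str, spark_version: str) -> None:
    # CometExpressionSuite: skipped only for auto on Spark 4.0+.
    assume(not (scan_impl == SCAN_AUTO and spark_at_least(spark_version, 4, 0)),
           "get_struct_field skipped for auto on Spark 4.0+")


def outcome(guard, *args) -> str:
    """Run a guard and report whether the test would run or be canceled."""
    try:
        guard(*args)
    except CanceledTest:
        return "canceled"
    return "runs"
```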
---
### Summary by Tracking Issue
| Issue | Count | Description |
|-------|-------|-------------|
| [#3321](https://github.com/apache/datafusion-comet/issues/3321) | ~12 | Schema evolution, corrupt/missing files, AQE, type widening |
| [#3720](https://github.com/apache/datafusion-comet/issues/3720) | ~8 | Schema mismatch errors, decimal reads, row group skipping |
| [#3442](https://github.com/apache/datafusion-comet/issues/3442) | 1 | Static scan metrics with DPP |
| [#3886](https://github.com/apache/datafusion-comet/issues/3886) | 1 | Row index column type error type mismatch |
| (no issue) | 5 | Filter pushdown / accumulator tests (`IgnoreCometNativeScan`) |
| (no issue) | 3 | `get_struct_field` tests (auto + Spark 4.0+ only) |