[
https://issues.apache.org/jira/browse/SPARK-57164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085015#comment-18085015
]
Marcus Lin commented on SPARK-57164:
------------------------------------
I’d like to work on this issue.
> Add parser test coverage for nanosecond-capable timestamp types across all
> data-type string entry points
> --------------------------------------------------------------------------------------------------------
>
> Key: SPARK-57164
> URL: https://issues.apache.org/jira/browse/SPARK-57164
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 4.3.0
> Reporter: Max Gekk
> Priority: Minor
> Labels: starter
>
> h2. What
> Add focused test coverage asserting that the nanosecond-capable timestamp
> spellings ({{TIMESTAMP_NTZ(p)}}, {{TIMESTAMP_LTZ(p)}}, and the
> {{TIMESTAMP(p) WITH[OUT] [LOCAL] TIME ZONE}} aliases, p in [7, 9]) parse
> consistently across every public string-to-DataType entry point, and that
> out-of-range precisions are rejected identically everywhere.
> h2. Why
> This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond
> precision).
> Spark parses data-type strings through two independent parser families:
> * *Family A - ANTLR {{DataTypeAstBuilder}}*: the bare/zoned {{TIMESTAMP(p)}}
> handling lives in one place, but it is reached through many distinct public
> surfaces (see below). Each surface is a separate user-facing contract.
> * *Family B - JSON {{nameToType}}* in {{DataType.scala}}: a second,
> hand-maintained parser with its own {{TIMESTAMP_LTZ_NANOS_TYPE}} /
> {{TIMESTAMP_NTZ_NANOS_TYPE}} regex branches. This is where precision/error
> semantics can silently drift from Family A.
> Today the nanos parsing is exercised mainly via
> {{CatalystSqlParser.parseDataType}} in {{DataTypeParserSuite}}. The other public
> entry points have no explicit assertions, so a regression on any one of them
> (or drift between Family A and Family B) would go unnoticed.
> h2. Entry points to cover
> Family A (ANTLR {{DataTypeAstBuilder}}):
> * {{DataType.fromDDL}} and {{StructType.fromDDL}}
> * {{StructType.add(name, "TIMESTAMP_NTZ(9)")}}
> * {{Column.cast(String)}} and {{Column.try_cast(String)}}
> * {{DataFrameReader.schema(String)}} (and {{DataStreamReader.schema(String)}})
> * DDL/SQL schema strings passed to {{from_json}} / {{from_csv}}
> * SQL via the full {{AstBuilder}}: {{CAST(x AS TIMESTAMP_NTZ(9))}},
> {{CREATE TABLE ... c TIMESTAMP_LTZ(7)}}
> Family B (JSON):
> * {{DataType.fromJson}} / {{DataTypeJsonUtils}} round-trip
> ({{typeName}}/{{json}} <-> {{DataType}})
> h2. Acceptance criteria
> * For p in {7, 8, 9}, every entry point above resolves:
> ** {{TIMESTAMP_NTZ(p)}} -> {{TimestampNTZNanosType(p)}}
> ** {{TIMESTAMP_LTZ(p)}} -> {{TimestampLTZNanosType(p)}}
> ** {{TIMESTAMP(p) WITHOUT TIME ZONE}} -> {{TimestampNTZNanosType(p)}}
> ** {{TIMESTAMP(p) WITH LOCAL TIME ZONE}} -> {{TimestampLTZNanosType(p)}}
> * All entry points reject out-of-range precision (e.g. {{(6)}}, {{(10)}})
> with {{INVALID_TIMESTAMP_PRECISION}}, with identical parameters across
> Family A and Family B. (If the separate {{TIMESTAMP_*(6)}} mapping task has
> landed, update the {{(6)}} expectations to the microsecond types instead.)
> * All entry points reject the spellings with {{FEATURE_NOT_ENABLED}} when
> {{spark.sql.timestampNanosTypes.enabled = false}}.
> * A round-trip test confirms Family B agrees with Family A:
> {{DataType.fromJson(t.json)}} == {{t}} for the nanos types, and the
> {{typeName}} of a nanos type re-parses to the same type.
> h2. Where to add tests
> * {{sql/catalyst/.../parser/DataTypeParserSuite.scala}} - {{fromDDL}},
> {{StructType.fromDDL}}, {{StructType.add(String)}}.
> * {{sql/catalyst/.../types/DataTypeSuite.scala}} - {{fromJson}}/{{json}}
> round-trip (Family B).
> * {{Column.cast(String)}} / {{DataFrameReader.schema(String)}} /
> {{from_json}} DDL-schema cases in the appropriate {{sql/core}} suite
> (gated by the preview flag via {{withSQLConf}}).
> h2. Out of scope
> * Behavior changes. This task only adds assertions for the current contract
> (any intended behavior change for {{p}} = 6 is handled by its own task).
> * Spark Connect proto conversion (tracked separately under SPARK-57160 /
> SPARK-57161).
> h2. Notes for first-time contributors
> Good first issue - test-only, no production code changes. Enable the preview
> flag in tests with:
> {code}
> withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") { ... }
> {code}
> Run the affected catalyst suites with SBT:
> {code}
> build/sbt 'catalyst/testOnly *DataTypeParserSuite *DataTypeSuite'
> {code}
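The acceptance criteria quoted above amount to a small matrix of spellings and precisions. As a rough sketch only (this is not Spark code, and the names {{NanosTimestampCases}}, {{Case}}, {{positive}} and {{negative}} are hypothetical), the matrix could be enumerated as plain data before wiring it to each entry point. Expected types are kept as strings copied from the issue text, because the nanos type classes are taken on trust from the description; a real test would compare against the actual type objects and enable the preview flag.

```scala
// Hypothetical helper, not part of Spark: lists the acceptance-criteria
// matrix from the issue as plain data, with no Spark dependency.
object NanosTimestampCases {
  // A data-type string and the type the issue says it should resolve to.
  final case class Case(ddl: String, expectedType: String)

  // Precisions the issue lists as valid, and its examples of invalid ones.
  // Per the issue, the (6) expectation changes once the separate
  // TIMESTAMP_*(6) mapping task lands.
  val validPrecisions: Seq[Int] = Seq(7, 8, 9)
  val invalidPrecisions: Seq[Int] = Seq(6, 10)

  // (spelling, expected type name) pairs copied from the acceptance criteria.
  private def spellings(p: Int): Seq[(String, String)] = Seq(
    s"TIMESTAMP_NTZ($p)" -> "TimestampNTZNanosType",
    s"TIMESTAMP_LTZ($p)" -> "TimestampLTZNanosType",
    s"TIMESTAMP($p) WITHOUT TIME ZONE" -> "TimestampNTZNanosType",
    s"TIMESTAMP($p) WITH LOCAL TIME ZONE" -> "TimestampLTZNanosType"
  )

  // Strings that should parse, paired with the expected type.
  val positive: Seq[Case] = for {
    p <- validPrecisions
    (ddl, tpe) <- spellings(p)
  } yield Case(ddl, s"$tpe($p)")

  // Strings that should be rejected with INVALID_TIMESTAMP_PRECISION.
  val negative: Seq[String] = for {
    p <- invalidPrecisions
    (ddl, _) <- spellings(p)
  } yield ddl

  def main(args: Array[String]): Unit = {
    positive.foreach(c => println(s"${c.ddl} -> ${c.expectedType}"))
    negative.foreach(ddl => println(s"$ddl -> INVALID_TIMESTAMP_PRECISION"))
  }
}
```

Each suite could then loop over the same two lists for its own entry point, which keeps the Family A and Family B expectations from drifting apart in the tests themselves.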
--
This message was sent by Atlassian Jira
(v8.20.10#820010)