[jira] [Commented] (SPARK-57164) Add parser test coverage for nanosecond-capable timestamp types across all data-type string entry points

Marcus Lin (Jira) Sun, 31 May 2026 12:30:22 -0700


    [ 
https://issues.apache.org/jira/browse/SPARK-57164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18085015#comment-18085015
 ]


Marcus Lin commented on SPARK-57164:
------------------------------------

I’d like to work on this issue.

> Add parser test coverage for nanosecond-capable timestamp types across all data-type string entry points
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-57164
>                 URL: https://issues.apache.org/jira/browse/SPARK-57164
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Minor
>              Labels: starter
>
> h2. What
> Add focused test coverage asserting that the nanosecond-capable timestamp
> spellings ({{TIMESTAMP_NTZ(p)}}, {{TIMESTAMP_LTZ(p)}}, and the
> {{TIMESTAMP(p) WITH[OUT] [LOCAL] TIME ZONE}} aliases, p in [7, 9]) parse
> consistently across every public string-to-DataType entry point, and that
> out-of-range precisions are rejected identically everywhere.
> h2. Why
> This is a sub-task of SPARK-56822 (SPIP: Timestamps with nanosecond precision).
> Spark parses data-type strings through two independent parser families:
> * *Family A - ANTLR {{DataTypeAstBuilder}}*: the bare/zoned {{TIMESTAMP(p)}}
>   handling lives in one place, but it is reached through many distinct public
>   surfaces (see below). Each surface is a separate user-facing contract.
> * *Family B - JSON {{nameToType}}* in {{DataType.scala}}: a second,
>   hand-maintained parser with its own {{TIMESTAMP_LTZ_NANOS_TYPE}} /
>   {{TIMESTAMP_NTZ_NANOS_TYPE}} regex branches. This is where precision/error
>   semantics can silently drift from Family A.
> Today the nanos parsing is exercised mainly via
> {{CatalystSqlParser.parseDataType}} in {{DataTypeParserSuite}}. The other public
> entry points have no explicit assertions, so a regression on any one of them
> (or drift between Family A and Family B) would go unnoticed.
> h2. Entry points to cover
> Family A (ANTLR {{DataTypeAstBuilder}}):
> * {{DataType.fromDDL}} and {{StructType.fromDDL}}
> * {{StructType.add(name, "TIMESTAMP_NTZ(9)")}}
> * {{Column.cast(String)}} and {{Column.try_cast(String)}}
> * {{DataFrameReader.schema(String)}} (and {{DataStreamReader.schema(String)}})
> * DDL/SQL schema strings passed to {{from_json}} / {{from_csv}}
> * SQL via the full {{AstBuilder}}: {{CAST(x AS TIMESTAMP_NTZ(9))}},
>   {{CREATE TABLE ... c TIMESTAMP_LTZ(7)}}
> Family B (JSON):
> * {{DataType.fromJson}} / {{DataTypeJsonUtils}} round-trip
>   ({{typeName}}/{{json}} <-> {{DataType}})
> h2. Acceptance criteria
> * For p in {7, 8, 9}, every entry point above resolves:
> ** {{TIMESTAMP_NTZ(p)}} -> {{TimestampNTZNanosType(p)}}
> ** {{TIMESTAMP_LTZ(p)}} -> {{TimestampLTZNanosType(p)}}
> ** {{TIMESTAMP(p) WITHOUT TIME ZONE}} -> {{TimestampNTZNanosType(p)}}
> ** {{TIMESTAMP(p) WITH LOCAL TIME ZONE}} -> {{TimestampLTZNanosType(p)}}
> * All entry points reject out-of-range precision (e.g. {{(6)}}, {{(10)}})
>   with {{INVALID_TIMESTAMP_PRECISION}}, with identical parameters across
>   Family A and Family B. (If the separate {{TIMESTAMP_*(6)}} mapping task has
>   landed, update the {{(6)}} expectations to the microsecond types instead.)
> * All entry points reject the spellings with {{FEATURE_NOT_ENABLED}} when
>   {{spark.sql.timestampNanosTypes.enabled = false}}.
> * A round-trip test confirms Family B agrees with Family A:
>   {{DataType.fromJson(t.json)}} == {{t}} for the nanos types, and the
>   {{typeName}} of a nanos type re-parses to the same type.
> h2. Where to add tests
> * {{sql/catalyst/.../parser/DataTypeParserSuite.scala}} - {{fromDDL}},
>   {{StructType.fromDDL}}, {{StructType.add(String)}}.
> * {{sql/catalyst/.../types/DataTypeSuite.scala}} - {{fromJson}}/{{json}}
>   round-trip (Family B).
> * {{Column.cast(String)}} / {{DataFrameReader.schema(String)}} /
>   {{from_json}} DDL-schema cases in the appropriate {{sql/core}} suite
>   (gated by the preview flag via {{withSQLConf}}).
> h2. Out of scope
> * Behavior changes. This task only adds assertions for the current contract
>   (any intended behavior change for {{p}} = 6 is handled by its own task).
> * Spark Connect proto conversion (tracked separately under SPARK-57160 /
>   SPARK-57161).
> h2. Notes for first-time contributors
> Good first issue - test-only, no production code changes. Enable the preview
> flag in tests with:
> {code}
> withSQLConf(SQLConf.TIMESTAMP_NANOS_TYPES_ENABLED.key -> "true") { ... }
> {code}
> Run the affected catalyst suites with SBT:
> {code}
> build/sbt 'catalyst/testOnly *DataTypeParserSuite *DataTypeSuite'
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
