[ 
https://issues.apache.org/jira/browse/SPARK-57165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-57165:
-----------------------------
    Description: 
h2. Summary

Extend the test-only {{LiteralGenerator}} (in
{{sql/catalyst/src/test/.../expressions/LiteralGenerator.scala}}) to produce
random {{Literal}}s for the nanosecond-capable timestamp types
{{TimestampNTZNanosType(p)}} and {{TimestampLTZNanosType(p)}} (p in [7, 9]).

Test code only - no user-facing API change.

h2. Background

{{LiteralGenerator.randomGen(dt)}} is the literal source for ScalaCheck
property checks across expression suites (interpreted-vs-codegen consistency via
{{ExpressionEvalHelper}}, ordering/predicate/hash suites, etc.). Among the timestamp
types, it currently handles only the microsecond ones and throws for any unsupported type:

{code}
case TimestampType => timestampLiteralGen
case TimestampNTZType => timestampNTZLiteralGen
...
case dt => throw new IllegalArgumentException(s"not supported type $dt")
{code}

So {{randomGen(TimestampNTZNanosType(9))}} and {{randomGen(TimestampLTZNanosType(7))}}
currently throw {{IllegalArgumentException}}, and no property-based suite can
exercise the nanos types.

Two further limitations to address:
* No nanosecond literal generator exists at all.
* The existing micro generators derive from {{millisGen}} (millisecond-grained),
  so they never produce sub-millisecond fractional digits. The new generators
  must produce full sub-microsecond variation.
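
The millisecond-grain limitation can be seen with a little arithmetic. The sketch below assumes the micro value is obtained by scaling a millisecond count by 1000 (an assumption about the conversion, not a quote from {{LiteralGenerator}}); {{MillisGrainSketch}} and its methods are hypothetical names used only for illustration.

```scala
object MillisGrainSketch {
  // Widen a millisecond count to microseconds (assumed conversion: multiply by 1000).
  def millisToMicros(millis: Long): Long = Math.multiplyExact(millis, 1000L)

  // Microseconds past the last whole millisecond. floorMod keeps the result in
  // [0, 999] for pre-1970 (negative) values as well.
  def microsOfMilli(micros: Long): Long = Math.floorMod(micros, 1000L)
}
```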

The row/value-level counterpart ({{RandomDataGenerator}}) and the shared
{{TimestampNanosTestUtils}} helper / {{specialNanosTs}} corpus were already added
by SPARK-57034; this ticket is the expression-literal counterpart and should
reuse those helpers where practical.

h2. Scope

# Add {{timestampLTZNanosLiteralGen(precision: Int)}} and
  {{timestampNTZNanosLiteralGen(precision: Int)}} producing
  {{Literal}}s whose Catalyst value is
  {{org.apache.spark.unsafe.types.TimestampNanosVal(epochMicros, nanosOfMicro)}}
  with the matching data type. (Construct the literal with the internal
  {{TimestampNanosVal}}; do not rely on java.time external conversion, which is
  tracked separately under SPARK-57033.)
# Wire them into {{randomGen}}:
{code}
case t: TimestampNTZNanosType => timestampNTZNanosLiteralGen(t.precision)
case t: TimestampLTZNanosType => timestampLTZNanosLiteralGen(t.precision)
{code}
# Value distribution:
** {{epochMicros}}: reuse the existing valid-range bounds
   ([0001-01-01 .. 9999-12-31]) used by the micro generators.
** {{nanosOfMicro}}: random in [0, 999], biased to include the edge values
   {0, 1, 999} where the declared precision permits them.
** Respect the declared precision {{p}} so generated values are valid for the
   type: p=7 -> {{nanosOfMicro}} multiple of 100, p=8 -> multiple of 10,
   p=9 -> any value in [0, 999].
** Mix in entries from {{TimestampNanosTestUtils.specialNanosTs}} (SPARK-57034).
# Keep all values normalized ({{nanosOfMicro}} in [0, 999]).
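
The value distribution above could be sketched roughly as follows. This is a plain {{scala.util.Random}} illustration, not the ScalaCheck {{Gen}} the real generator would use. {{NanosVal}} is a hypothetical stand-in for {{TimestampNanosVal}}, the 1-in-4 edge bias is an arbitrary choice, and the UTC bounds are an assumed reading of the [0001-01-01 .. 9999-12-31] range. Mixing in {{specialNanosTs}} is left out because its shape is not given here.

```scala
import java.time.Instant
import scala.util.Random

object NanosGenSketch {
  // Hypothetical stand-in for org.apache.spark.unsafe.types.TimestampNanosVal.
  final case class NanosVal(epochMicros: Long, nanosOfMicro: Int)

  private def toMicros(i: Instant): Long =
    Math.addExact(Math.multiplyExact(i.getEpochSecond, 1000000L), (i.getNano / 1000).toLong)

  // Assumed bounds: the ticket's [0001-01-01 .. 9999-12-31] range, taken in UTC.
  val minMicros: Long = toMicros(Instant.parse("0001-01-01T00:00:00Z"))
  val maxMicros: Long = toMicros(Instant.parse("9999-12-31T23:59:59.999999Z"))

  // Granularity of nanosOfMicro for a declared precision: p=7 -> 100, p=8 -> 10, p=9 -> 1.
  def step(precision: Int): Int = {
    require(precision >= 7 && precision <= 9, s"unsupported precision $precision")
    precision match {
      case 7 => 100
      case 8 => 10
      case _ => 1
    }
  }

  // Edge values {0, 1, 999}, truncated to the precision's step and de-duplicated.
  def edgeNanos(precision: Int): Seq[Int] = {
    val s = step(precision)
    Seq(0, 1, 999).map(n => n - n % s).distinct
  }

  // Roughly one draw in four picks an edge value; the rest are uniform
  // multiples of the step in [0, 999].
  def nextNanosOfMicro(precision: Int, rnd: Random): Int = {
    val s = step(precision)
    val edges = edgeNanos(precision)
    if (rnd.nextInt(4) == 0) edges(rnd.nextInt(edges.length))
    else rnd.nextInt(1000 / s) * s
  }

  // One random value: epochMicros uniform within the bounds, nanosOfMicro as above.
  def nextValue(precision: Int, rnd: Random): NanosVal =
    NanosVal(rnd.between(minMicros, maxMicros + 1), nextNanosOfMicro(precision, rnd))
}
```

Truncating the edge values to the step is one way to reconcile the edge bias with the precision rule: at p=7 the value 999 becomes 900, so every draw stays valid for the type.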


h2. Acceptance criteria

* For p in {7, 8, 9}, {{randomGen(TimestampNTZNanosType(p))}} and
  {{randomGen(TimestampLTZNanosType(p))}} return generators that produce
  {{Literal}}s of the correct type carrying {{TimestampNanosVal}} values with
  visible nanosecond variation (and the edge values {0, 1, 999} appearing where the
  declared precision permits them).
* Generated values are valid for the declared precision and normalized.
* Existing {{randomGen}} cases for {{TimestampType}} / {{TimestampNTZType}} are
  unchanged.
* At least one property-based suite is extended (or a small targeted test added)
  to confirm a nanos type round-trips through interpreted vs codegen evaluation
  using the new generator.
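
The "valid for the declared precision and normalized" criterion can be expressed as a small predicate. The sketch below is one possible reading, with hypothetical names ({{NanosAcceptanceSketch}}, {{isValid}}, {{missingEdges}}). It works on bare {{nanosOfMicro}} integers so that it does not depend on Spark classes.

```scala
object NanosAcceptanceSketch {
  // Granularity of nanosOfMicro implied by the declared precision.
  def step(precision: Int): Int = precision match {
    case 7 => 100
    case 8 => 10
    case 9 => 1
    case p => throw new IllegalArgumentException(s"unsupported precision $p")
  }

  // Normalized (within [0, 999]) and aligned to the precision's step.
  def isValid(precision: Int, nanosOfMicro: Int): Boolean =
    nanosOfMicro >= 0 && nanosOfMicro <= 999 && nanosOfMicro % step(precision) == 0

  // The subset of {0, 1, 999} that is legal at this precision.
  def expectedEdges(precision: Int): Set[Int] =
    Set(0, 1, 999).filter(isValid(precision, _))

  // Edge values that a sample of generated nanosOfMicro values failed to cover.
  def missingEdges(precision: Int, samples: Seq[Int]): Set[Int] =
    expectedEdges(precision).diff(samples.toSet)
}
```

Under this reading only 0 is a legal edge value at p=7 and p=8, so a test that insists on seeing 1 and 999 would apply to p=9 alone.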

h2. Out of scope

* {{RandomDataGenerator}} and {{TimestampNanosTestUtils}} (already delivered by
  SPARK-57034).
* Any production code or behavior change.

h2. Notes for first-time contributors

Good first issue - test-only. Run an affected suite with SBT, e.g.:

{code}
build/sbt 'catalyst/testOnly *LiteralExpressionSuite'
{code}


> Add LiteralGenerator support for nanosecond-capable timestamp types
> -------------------------------------------------------------------
>
>                 Key: SPARK-57165
>                 URL: https://issues.apache.org/jira/browse/SPARK-57165
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL, Tests
>    Affects Versions: 4.3.0
>            Reporter: Max Gekk
>            Priority: Minor
>



