Chenyang Zhang created SPARK-50726:
--------------------------------------
Summary: Deserialize Arrow stream of timestamp other than
microseconds precision
Key: SPARK-50726
URL: https://issues.apache.org/jira/browse/SPARK-50726
Project: Spark
Issue Type: Question
Components: Spark Core
Affects Versions: 3.5.4
Reporter: Chenyang Zhang
I have a question regarding `ArrowUtils.scala`. When we deserialize an Arrow
stream of timestamps, we only recognize ArrowType.Timestamp with a unit of
MICROSECOND:
{code:java}
// in ArrowUtils.scala line 82
case ts: ArrowType.Timestamp
    if ts.getUnit == TimeUnit.MICROSECOND && ts.getTimezone == null =>
  TimestampNTZType
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND =>
  TimestampType
{code}
I know that Spark's internal representation of timestamps is microseconds, but I
want to understand why we can't accept second and millisecond precision. Such
values can be represented exactly in microseconds by padding the fractional part
with zeros, i.e. 1s == 1.000000s. The issue I encounter is that the above
handling prevents me from deserializing an Arrow stream of timestamps with
second or millisecond precision.
Are there any issues if we change the above code to the following?
{code:java}
case ts: ArrowType.Timestamp
    if ts.getUnit != TimeUnit.NANOSECOND && ts.getTimezone == null =>
  TimestampNTZType
case ts: ArrowType.Timestamp if ts.getUnit != TimeUnit.NANOSECOND =>
  TimestampType
{code}
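One thing I am unsure about: I assume the type mapping alone would not be
enough, because the raw values in the stream would still be counts of seconds or
milliseconds and would have to be rescaled to microseconds when they are read.
Below is a minimal sketch of that rescaling. It uses java.util.concurrent.TimeUnit
as a stand-in for Arrow's TimeUnit so that it runs without the Arrow dependency,
and `TimestampRescale.toMicros` is a hypothetical helper, not an existing Spark API.
{code:scala}
import java.util.concurrent.TimeUnit

// Hypothetical helper (not part of Spark): rescales a raw timestamp value
// expressed in `unit` to microseconds since the epoch.
object TimestampRescale {
  def toMicros(value: Long, unit: TimeUnit): Long = unit match {
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    case TimeUnit.SECONDS      => Math.multiplyExact(value, 1000000L)
    case TimeUnit.MILLISECONDS => Math.multiplyExact(value, 1000L)
    case TimeUnit.MICROSECONDS => value
    // Nanoseconds are rejected because the conversion would drop precision.
    case other =>
      throw new IllegalArgumentException(s"Unsupported timestamp unit: $other")
  }
}

object TimestampRescaleDemo {
  def main(args: Array[String]): Unit = {
    println(TimestampRescale.toMicros(1L, TimeUnit.SECONDS))         // 1000000
    println(TimestampRescale.toMicros(1500L, TimeUnit.MILLISECONDS)) // 1500000
  }
}
{code}
Second and millisecond values widen without loss, which is why I think only
NANOSECOND needs to stay excluded.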
Any help is appreciated.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)