Chenyang Zhang created SPARK-50726:
--------------------------------------

             Summary: Deserialize Arrow stream of timestamp other than microseconds precision
                 Key: SPARK-50726
                 URL: https://issues.apache.org/jira/browse/SPARK-50726
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 3.5.4
            Reporter: Chenyang Zhang


I have a question regarding `ArrowUtils.scala`. When we deserialize an Arrow
stream of timestamps, we only recognize `ArrowType.Timestamp` with a unit of
MICROSECOND:
{code:java}
// in ArrowUtils.scala line 82
case ts: ArrowType.Timestamp
  if ts.getUnit == TimeUnit.MICROSECOND && ts.getTimezone == null => TimestampNTZType
case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND => TimestampType
{code}
I know that Spark's internal representation of a timestamp is in microseconds,
but I want to understand why we can't accept second and millisecond precision.
Such values could be treated as microseconds with zero-valued fractional
digits, i.e. 1s == 1.000000s. The issue I encounter is that the above handling
prevents me from deserializing an Arrow stream of timestamps with second or
millisecond precision.

Would there be any issues if we changed the above code to the following?
{code:java}
case ts: ArrowType.Timestamp
  if ts.getUnit != TimeUnit.NANOSECOND && ts.getTimezone == null => TimestampNTZType
case ts: ArrowType.Timestamp if ts.getUnit != TimeUnit.NANOSECOND => TimestampType
{code}
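One thing I am not sure about is whether changing the type mapping alone is
enough. If the code that reads the values assumes microseconds (an assumption
on my part, I have not traced it), second and millisecond values would also
need to be rescaled, with an overflow check. Below is a minimal sketch of that
rescaling. `ArrowTimeUnit` is a hypothetical stand-in for Arrow's `TimeUnit`
and `toMicros` is a hypothetical helper; neither is existing Spark code.

```scala
// Hypothetical stand-in for org.apache.arrow.vector.types.TimeUnit,
// so that this sketch compiles without an Arrow dependency.
object ArrowTimeUnit extends Enumeration {
  val SECOND, MILLISECOND, MICROSECOND, NANOSECOND = Value
}

object TimestampRescale {
  /** Rescale a raw timestamp value in the given unit to microseconds.
    * Throws ArithmeticException if the result does not fit in a Long.
    */
  def toMicros(value: Long, unit: ArrowTimeUnit.Value): Long = unit match {
    case ArrowTimeUnit.SECOND      => java.lang.Math.multiplyExact(value, 1000000L)
    case ArrowTimeUnit.MILLISECOND => java.lang.Math.multiplyExact(value, 1000L)
    case ArrowTimeUnit.MICROSECOND => value
    // Nanoseconds are excluded here, as in the proposed change above,
    // because converting them to microseconds would lose precision.
    case other =>
      throw new IllegalArgumentException(s"Unsupported timestamp unit: $other")
  }
}
```

For example, `TimestampRescale.toMicros(1L, ArrowTimeUnit.SECOND)` gives
1000000, matching 1s == 1.000000s above.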
Any help is appreciated.


