[ https://issues.apache.org/jira/browse/SPARK-50726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954208#comment-17954208 ]
Yang Jie commented on SPARK-50726:
----------------------------------

[~czhang829] For questions, it's best to ask via email on the dev mailing list.

> Deserialize Arrow stream of timestamp other than microseconds precision
> -----------------------------------------------------------------------
>
>                 Key: SPARK-50726
>                 URL: https://issues.apache.org/jira/browse/SPARK-50726
>             Project: Spark
>          Issue Type: Question
>          Components: Spark Core
>    Affects Versions: 3.5.4
>            Reporter: Chenyang Zhang
>            Priority: Critical
>
> I have a question regarding `ArrowUtils.scala`. When we deserialize an Arrow
> stream of timestamps, we only recognize ArrowType.Timestamp with a unit of
> MICROSECOND:
> {code:java}
> // in ArrowUtils.scala line 82
> case ts: ArrowType.Timestamp
>     if ts.getUnit == TimeUnit.MICROSECOND && ts.getTimezone == null =>
>   TimestampNTZType
> case ts: ArrowType.Timestamp if ts.getUnit == TimeUnit.MICROSECOND =>
>   TimestampType {code}
> I know that Spark's internal representation of timestamps is microseconds, but I
> want to understand why we can't accept second and millisecond precision. They
> could be treated as microseconds with a zero fractional part, i.e. 1s == 1.000000s.
> The issue I encountered is that the handling above prevents me from deserializing
> an Arrow stream of timestamps with second or millisecond precision.
> Would there be any issues if we changed the above code to
> {code:java}
> case ts: ArrowType.Timestamp
>     if ts.getUnit != TimeUnit.NANOSECOND && ts.getTimezone == null =>
>   TimestampNTZType
> case ts: ArrowType.Timestamp if ts.getUnit != TimeUnit.NANOSECOND =>
>   TimestampType {code}
> Any help is appreciated.
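For illustration, here is a minimal Scala sketch of the unit-widening the reporter describes. This is not Spark code: the object and helper name (`TimestampWidening.toMicros`) are hypothetical, and only the Arrow `TimeUnit` enum from `org.apache.arrow.vector.types` is assumed. Second and millisecond values widen losslessly to Spark's microsecond representation, while nanoseconds would require lossy truncation, which is presumably why the proposed pattern still excludes `TimeUnit.NANOSECOND`.

{code:java}
import org.apache.arrow.vector.types.TimeUnit

object TimestampWidening {
  // Hypothetical helper: widen a raw Arrow timestamp value to Spark's
  // internal microsecond precision. SECOND and MILLISECOND scale exactly
  // (1s == 1.000000s); NANOSECOND is rejected because dividing by 1000
  // would silently drop sub-microsecond digits.
  def toMicros(value: Long, unit: TimeUnit): Long = unit match {
    case TimeUnit.SECOND      => Math.multiplyExact(value, 1000000L)
    case TimeUnit.MILLISECOND => Math.multiplyExact(value, 1000L)
    case TimeUnit.MICROSECOND => value
    case TimeUnit.NANOSECOND  =>
      throw new IllegalArgumentException(
        "nanosecond precision cannot be widened losslessly")
  }

  def main(args: Array[String]): Unit = {
    // 1 second and 1000 milliseconds widen to the same microsecond count.
    assert(toMicros(1L, TimeUnit.SECOND) == 1000000L)
    assert(toMicros(1000L, TimeUnit.MILLISECOND) == 1000000L)
  }
}
{code}

Note that the quoted snippet in `ArrowUtils.scala` only maps Arrow types to Spark SQL types; even with the proposed relaxation, the value-reading path would presumably also need a widening step like the one above, since Spark interprets the raw long values as microseconds.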