[ 
https://issues.apache.org/jira/browse/ARROW-9915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191646#comment-17191646
 ] 

Matt Jadczak commented on ARROW-9915:
-------------------------------------

It looks like the mailing list discussion was about a specific shortcoming from 
my list, however given the extent of the inconsistencies, it seems to me like 
perhaps more discussion would be warranted? I'm not super-familiar with how the 
project tends to work - is the best place for such discussion here or on the 
mailing list?

IMO if the goal is to leave these implementations alone for back-compat and 
encourage users to write their own "hydration" function based on the raw `get` 
functions, we should go as far as deprecating `getObject` entirely. When used 
in a dynamic context (i.e. when you just have some vector returning some 
`Object`, rather than a statically typed vector returning a statically typed 
object) I think it can be extremely confusing when two vector types like 
TimeNanoVector and TimeMilliVector return different types of objects, neither 
of them what you would expect (in fact, that is exactly how we stumbled upon 
this issue internally).

> [Java] getObject API for temporal types is inconsistent and in some cases 
> incorrect
> -----------------------------------------------------------------------------------
>
>                 Key: ARROW-9915
>                 URL: https://issues.apache.org/jira/browse/ARROW-9915
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: 0.13.0, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.16.0, 0.17.0, 
> 0.17.1, 1.0.0
>            Reporter: Matt Jadczak
>            Priority: Major
>
> It seems that the work which has been tracked in ARROW-2015 and merged in 
> [https://github.com/apache/arrow/pull/2966] to change the return types of the 
> various Time and Date vector types when using the getObject API missed some 
> of the vector types which are temporal and so should return a temporal type, 
> and provided an incorrect implementation for others (some of this was pointed 
> out in the initial PR review, but it seems that it slipped through the cracks 
> and was not addressed before merging).
> Here is a table of the various temporal vector types, what they currently 
> return from getObject, and what they should return, in my opinion (I have 
> included ones in which the implementation is correct for completeness, and 
> coloured them green).
>  
>  
> ||Vector class||Current return type||Proposed return type||Comments||
> |DateDayVector|Integer|LocalDate|Currently returns the raw value of days 
> since epoch, should return the actual date|
> |DateMilliVector|LocalDateTime|LocalDate|This type is supposed to encode a 
> date, not a datetime, so even though epoch millis are used, the result should 
> be a LocalDate|
> |{color:#00875a}DurationVector{color}|{color:#00875a}Duration{color}|{color:#00875a}Duration{color}|{color:#00875a}Correct.{color}|
> |IntervalDayVector|Duration|Period|As per 
> [https://github.com/apache/arrow/blob/master/format/Schema.fbs#L251] , 
> Interval should be a calendar-based datatype, not a time-based one. This is 
> represented in Java by a Period type. However, I note that the Java Period 
> class does not support milliseconds, unlike the Arrow type, which might be 
> why Duration is being returned. Some discussion may be needed on the best way 
> to deal with this.|
> |{color:#00875a}IntervalYearVector{color}|{color:#00875a}Period{color}|{color:#00875a}Period{color}|{color:#00875a}Correct.{color}|
> |TimeMicroVector|Long|LocalTime|Currently returns the raw number of micros, 
> should return the actual time|
> |TimeMilliVector|LocalDateTime|LocalTime|Currently returns a datetime on 
> 1970-01-01 with the correct time component, should just return the time|
> |TimeNanoVector|Long|LocalTime|Currently returns the raw number of nanos, 
> should return the actual time|
> |TimeSecVector|Integer|LocalTime|Currently returns the raw number of seconds, 
> should return the actual time|
> |{color:#00875a}TimeStampMicroVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampMilliVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampNanoVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |{color:#00875a}TimeStampSecVector{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}LocalDateTime{color}|{color:#00875a}Correct.{color}|
> |TimeStampMicroTZVector|Long|ZonedDateTime|Currently returns the underlying 
> micros, and TZ has to be obtained separately. Should return the actual 
> datetime with timezone|
> |TimeStampMilliTZVector|Long|ZonedDateTime|Currently returns the underlying 
> millis, and TZ has to be obtained separately. Should return the actual 
> datetime with timezone|
> |TimeStampNanoTZVector|Long|ZonedDateTime|Currently returns the underlying 
> nanos, and TZ has to be obtained separately. Should return the actual 
> datetime with timezone|
> |TimeStampSecTZVector|Long|ZonedDateTime|Currently returns the underlying 
> seconds, and TZ has to be obtained separately. Should return the actual 
> datetime with timezone|
> I am happy to submit a PR to fix this if there is no other reason for this to 
> persist other than these types being rarely used, but note that these would 
> all be breaking API changes.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to