That's a good point. I'm not sure whether DFDL describes how this should
be handled. For the most part, I think DFDL just says the correct
behavior for dates/times is whatever ICU does. It seems that if ICU is
given a timezone but no date, it just assumes the standard offset
rather than the daylight offset, so maybe that's the expected behavior.
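
To make the information loss concrete, here is a minimal Python sketch. It
is not Daffodil or ICU code, just the standard library, and the helper
name format_gmt_offset is made up for illustration. It takes the infoset
value from the example in this thread and shows that only a fixed offset
survives, so the best an unparser can emit is a GMT-offset form:

```python
from datetime import time, timedelta

# Infoset value from the example in this thread: a time plus a GMT offset,
# with no location ("Los Angeles") and no date.
infoset_value = "08:43:00-08:00"

parsed = time.fromisoformat(infoset_value)

# All that is left of the timezone is a fixed offset. Nothing here says
# whether the offset came from a location that observes daylight time.
offset = parsed.utcoffset()
zone_label = parsed.tzname()


def format_gmt_offset(t):
    """Hypothetical helper: render a time as hh:mm.GMT+/-hh:mm."""
    total_minutes = int(t.utcoffset().total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{t:%H:%M}.GMT{sign}{hours:02d}:{minutes:02d}"


canonical = format_gmt_offset(parsed)
print(offset == timedelta(hours=-8), zone_label, canonical)
```

Because the value has no date, there is also nothing to decide between
-07:00 and -08:00 for a location that observes daylight time; the offset
chosen at parse time is the only thing carried forward.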

DFDL does describe a "calendarObserveDST" property, which I notice
Daffodil doesn't implement. We need to either implement it or add it to
our unsupported features page. Based on the brief description, though,
I'm not sure that property applies in this case.

Mike, any insight on what the spec says/implies about time zones without
dates? I don't see anything obvious in a quick scan.

On 1/8/21 10:34 AM, Dave Fisher wrote:
> There is a deeper problem with this example. It is a dateTime without a date 
> and loses the nuance between when Los Angeles is GMT-7 and GMT-8.
> 
> Sent from my iPhone
> 
>> On Jan 8, 2021, at 6:48 AM, Beckerle, Mike <mbecke...@owlcyberdefense.com> 
>> wrote:
>>
>> I believe SL is correct. This is as expected. This is data 
>> canonicalization, which is typically what happens when a parser 
>> tolerates many diverse formats but the data format doesn't record which of 
>> those was actually used. The DFDL schema considers them 100% 
>> equivalent. The output when unparsing is then the canonical representation 
>> of that information.
>>
>> In general to deal with this we use roundTrip="twoPass" tests in TDML. You 
>> probably need to change some one-pass tests to two-pass.
>>
>> That way it parses, unparses (to the canonical representation) then parses 
>> again and compares infosets. At that second parse, it will get the same 
>> infoset from 8:43.Los Angeles Time as from 8:43.GMT-08:00 so the test will 
>> pass.
>>
>>
>> ________________________________
>> From: Larry Barber <larry.bar...@nteligen.com>
>> Sent: Friday, January 8, 2021 9:36 AM
>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>> Subject: RE: Timezones in DFDL
>>
>> This reminds me of the case where there are multiple possible delimiters - 
>> the one provided in the original file may not be the one that appears in the 
>> unparse output.
>>
>> -----Original Message-----
>> From: Steve Lawrence [mailto:slawre...@apache.org]
>> Sent: Friday, January 8, 2021 9:18 AM
>> To: dev@daffodil.apache.org
>> Subject: Timezones in DFDL
>>
>> I was confirming that DAFFODIL-1580 [1] is still an issue and was going to 
>> open a bug with ICU, but as I look more at this, I think this is just a 
>> limitation with timezones and DFDL, but wanted confirmation first.
>>
>> For example, we have a test schema that looks like this:
>>
>> <xs:element name="time" type="xs:dateTime"
>>   dfdl:calendarPattern="hh:mm.VVVV" ... />
>>
>> And matching data that looks like this:
>>
>>  8:43.Los Angeles Time
>>
>> This parses to an infoset that looks like this:
>>
>>  <time>08:43:00-08:00</time>
>>
>> And that infoset unparses to this:
>>
>>  08:43.GMT-08:00
>>
>> Note that the unparsed timezone does not match the original data.
>> DAFFODIL-1580 describes this behavior as a bug (either in Daffodil or
>> ICU) but I think this is actually expected behavior. A DFDL infoset does not 
>> contain any location-specific timezone information--it only contains a GMT 
>> offset (a restriction of XML Schema). So this data will always unparse to a 
>> non-location specific timezone, depending on the calendar pattern. For some 
>> patterns this will be an offset or a generic timezone like PST (which should 
>> both roundtrip fine), but others might result in "Unknown" or "unk". I think 
>> this only affects the "V" and "v" calendar patterns, but additional tests 
>> should be added to confirm this behavior.
>>
>> This is the expected behavior, correct?
>>
>>
>> [1] https://issues.apache.org/jira/browse/DAFFODIL-1580
> 
