I believe SL is correct; this is expected behavior. It is data 
canonicalization, which typically happens when a parser tolerates many 
different representations but nothing in the resulting infoset records which 
one was actually used. The DFDL schema treats them as 100% equivalent, so the 
output when unparsing is the canonical representation of that information.

In general, we deal with this by using roundTrip="twoPass" tests in TDML. You 
will probably need to change some one-pass tests to two-pass.

A two-pass test parses, unparses (to the canonical representation), then 
parses again and compares infosets. The second parse gets the same infoset 
from 08:43.GMT-08:00 as the first parse got from 8:43.Los Angeles Time, so the 
test will pass.
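
As a rough sketch, a two-pass test for this case might look something like the 
fragment below. The test name and model name are made up for illustration, the 
tdml prefix is assumed to be bound to the TDML namespace, and the element 
namespace is left out; the document and infoset are taken from the example in 
the quoted message.

```xml
<!-- Hypothetical test case: "time_VVVV_twoPass" and "timezoneSchema" are
     illustrative names, not taken from the Daffodil test suite. -->
<tdml:parserTestCase name="time_VVVV_twoPass" root="time"
    model="timezoneSchema" roundTrip="twoPass">
  <!-- Original data, using a location-specific timezone name -->
  <tdml:document>8:43.Los Angeles Time</tdml:document>
  <tdml:infoset>
    <tdml:dfdlInfoset>
      <!-- Expected from both the first parse and the reparse of the
           canonical unparsed output -->
      <time>08:43:00-08:00</time>
    </tdml:dfdlInfoset>
  </tdml:infoset>
</tdml:parserTestCase>
```

With roundTrip="twoPass" the unparsed bytes are not compared against the 
original document; only the infoset from the second parse has to match.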


________________________________
From: Larry Barber <larry.bar...@nteligen.com>
Sent: Friday, January 8, 2021 9:36 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: RE: Timezones in DFDL

This reminds me of the case where there are multiple possible delimiters - the 
one provided in the original file may not be the one that appears in the 
unparse output.

-----Original Message-----
From: Steve Lawrence [mailto:slawre...@apache.org]
Sent: Friday, January 8, 2021 9:18 AM
To: dev@daffodil.apache.org
Subject: Timezones in DFDL

I was confirming that DAFFODIL-1580 [1] is still an issue and was going to open 
a bug with ICU, but as I look more at this, I think this is just a limitation 
with timezones and DFDL, but wanted confirmation first.

For example, we have a test schema that looks like this:

 <xs:element name="time" type="xs:dateTime"
   dfdl:calendarPattern="hh:mm.VVVV" ... />

And matching data that looks like this:

  8:43.Los Angeles Time

This parses to an infoset that looks like this:

  <time>08:43:00-08:00</time>

And that infoset unparses to this:

  08:43.GMT-08:00

Note that the unparsed timezone does not match the original data.
DAFFODIL-1580 describes this behavior as a bug (either in Daffodil or
ICU) but I think this is actually expected behavior. A DFDL infoset does not 
contain any location-specific timezone information--it only contains a GMT 
offset (a restriction of XML Schema). So this data will always unparse to a 
non-location-specific timezone, depending on the calendar pattern. For some 
patterns this will be an offset or a generic timezone like PST (which should 
both roundtrip fine), but others might result in "Unknown" or "unk". I think 
this only affects the "V" and "v" calendar patterns, but additional tests 
should be added to confirm this behavior.

This is the expected behavior, correct?


[1] https://issues.apache.org/jira/browse/DAFFODIL-1580
