[ https://issues.apache.org/jira/browse/ARROW-8581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adam Szmigin updated ARROW-8581: -------------------------------- Description: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] of its object, not the time portion or offset. This means that the number of days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought they were specifying. If the user chooses to use NodaTime as a "better" date and time-handling library, they will still likely run into the bug if they do the obvious thing: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); var ld = new NodaTime.LocalDate(2020, 4, 24); builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} h1. Suggested Improvement * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). The conversion method for a {{Date32Array}} would then look a bit like this: {code:java} private static readonly DateTime Epoch = new DateTime(1970, 1, 1); protected override int ConvertTo(DateTime value) { return (int)(value - Epoch).TotalDays; } {code} was: h1. Summary Proposal The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values of type {{DateTimeOffset}}, but this makes it very easy for the user to introduce subtle bugs when they work with the {{DateTime}} type in their own code. This class of bugs could be avoided if these builders were instead typed on {{DateTime}} rather than {{DateTimeOffset}}. h1. Details The danger is introduced by the implicit widening conversion provided by the _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] The important part is this text: {quote}The offset of the resulting DateTimeOffset object depends on the value of the DateTime.Kind property of the dateTime parameter: * If the value of the DateTime.Kind property is DateTimeKind.Local or DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is set equal to dateTime, and its Offset property *is set equal to the offset of the local system's current time zone*.{quote} (Emphasis mine) If the user is operating in an environment with a positive GMT offset, it is very easy to write the wrong date to the builder: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} Assume that the user is in the UK (as I am), where the GMT offset on the above date is 1 hour ahead. This means that the conversion to {{DateTimeOffset}} will actually result in a value of {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] of its object, not the time portion or offset. This means that the number of days gets calculated based on 2020-04-23, not 2020-04-24 as the user thought they were specifying. If the user chooses to use NodaTime as a "better" date and time-handling library, they will still likely run into the bug if they do the obvious thing: {code:c#} Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug triggers if > 00:00:00 var builder = new Date32Array.Builder(); var ld = new NodaTime.LocalDate(2020, 4, 24); builder.Append(ld.ToDateTimeUnspecified()); // Kind == DateTimeKind.Unspecified var allocator = new NativeMemoryAllocator(); Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! {code} h1. Suggested Improvement * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). > [C#] Date32/64Array.Builder should accept DateTime, not DateTimeOffset > ---------------------------------------------------------------------- > > Key: ARROW-8581 > URL: https://issues.apache.org/jira/browse/ARROW-8581 > Project: Apache Arrow > Issue Type: Improvement > Components: C# > Affects Versions: 0.17.0 > Reporter: Adam Szmigin > Priority: Major > > h1. Summary Proposal > The {{Date32Array.Builder}} and {{Date64.Builder}} classes both accept values > of type {{DateTimeOffset}}, but this makes it very easy for the user to > introduce subtle bugs when they work with the {{DateTime}} type in their own > code. This class of bugs could be avoided if these builders were instead > typed on {{DateTime}} rather than {{DateTimeOffset}}. > h1. Details > The danger is introduced by the implicit widening conversion provided by the > _DateTimeOffset.Implicit(DateTime to DateTimeOffset)_ operator: > > [https://docs.microsoft.com/en-us/dotnet/api/system.datetimeoffset.op_implicit?view=netcore-3.1] > The important part is this text: > {quote}The offset of the resulting DateTimeOffset object depends on the value > of the DateTime.Kind property of the dateTime parameter: > * If the value of the DateTime.Kind property is DateTimeKind.Local or > DateTimeKind.Unspecified, the date and time of the DateTimeOffset object is > set equal to dateTime, and its Offset property *is set equal to the offset of > the local system's current time zone*.{quote} > (Emphasis mine) > If the user is operating in an environment with a positive GMT offset, it is > very easy to write the wrong date to the builder: > {code:c#} > Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug > triggers if > 00:00:00 > var builder = new Date32Array.Builder(); > builder.Append(new DateTime(2020, 4, 24)); // Kind == DateTimeKind.Unspecified > var allocator = new NativeMemoryAllocator(); > Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! > {code} > Assume that the user is in the UK (as I am), where the GMT offset on the > above date is 1 hour ahead. This means that the conversion to > {{DateTimeOffset}} will actually result in a value of > {{2020-04-23T23:00:00+01:00}} being passed to the {{Append()}} method. Arrow > then calls {{ToUnixTimeMilliseconds()}}, which [only considers the date > portion|https://referencesource.microsoft.com/#mscorlib/system/datetimeoffset.cs,8f33340c07c4787e] > of its object, not the time portion or offset. This means that the number > of days gets calculated based on 2020-04-23, not 2020-04-24 as the user > thought they were specifying. > If the user chooses to use NodaTime as a "better" date and time-handling > library, they will still likely run into the bug if they do the obvious thing: > {code:c#} > Console.WriteLine(TimeZoneInfo.Local.GetUtcOffset(DateTime.UtcNow)); // Bug > triggers if > 00:00:00 > var builder = new Date32Array.Builder(); > var ld = new NodaTime.LocalDate(2020, 4, 24); > builder.Append(ld.ToDateTimeUnspecified()); // Kind == > DateTimeKind.Unspecified > var allocator = new NativeMemoryAllocator(); > Console.WriteLine(builder.Build(allocator).GetDate(0)); // Prints 2020-04-23! > {code} > h1. Suggested Improvement > * Change {{Date32Array.Builder}} and {{Date64Array.Builder}} to specify a > {{TFrom}} parameter of {{DateTime}}, not {{DateTimeOffset}} (breaking change). > * Change {{Date32Array.GetDate()}} and {{Date64Array.GetDate()}} to return a > {{DateTime}}, not {{DateTimeOffset}} (also a breaking change). > > The conversion method for a {{Date32Array}} would then look a bit like this: > {code:java} > private static readonly DateTime Epoch = new DateTime(1970, 1, 1); > protected override int ConvertTo(DateTime value) > { > return (int)(value - Epoch).TotalDays; > } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)