https://bugs.documentfoundation.org/show_bug.cgi?id=127170

--- Comment #11 from Albrecht Müller <albrecht.muel...@astrail.de> ---
@Jaromir: As the de facto standard is quite old the floating point approach may
have some historical roots. Maybe there were a calendar first which used days
as basic unit. Later there was a need to add time but it was not possible to
change the basic unit. This might have broken too much of existing software. So
time was added as a fraction of days.

I recently discovered another bunch of bug reports related to date and time
arithmetic and found Winfried Donkers’ attachment #147433 (datetime parts
comparison) in Bug 118800 comment 14 (discussed also in Bug 121978 comment 10
and the following comment) which compares how different programs handle date
and time functions as well as formatting. From this attachment I learned an
additional variant of how LibreOffice might interpret date and time handling.
But I also saw that Excel 2016 seems to handle date and time values in the same
way as Excel 97 did about 20 years ago (see Bug 127476 comment 6). So why there
was no need to change Microsofts date and time handling? Why LibreOffice needs
so many changes that cause a lot of unnecessary work and irritation both at the
user and developer side?

My opinion is that Microsoft pretty early had a clear concept of how an
intuitive date and time arithmetic should work. In contrast, LibreOffice
development seems to work on a one-bug-at-a-time approach that is based on an
inconsistent specification that implies counter-intuitive behaviour. This
approach is doomed: You may have a clever idea how to fix one bug but this
creates another problem at some different location.

So I see a need to have a clear description that explicitly states the key
features of how LibreOffice’s date and time arithmetic should work. Therefore I
try to reverse engineer the key features of the de facto standard:

1) A value of 1 corresponds to one day.
2) One day is 24 hours of equal length
3) One hour is 60 minutes of equal length
4) One minute is 60 seconds of equal length
5) There are no exceptions to rules 1) … 4) as long as you stay in the defined
calendar range, which is something from ??? to 31.12.9999 (?) (this should be
specified)
6) Points in time and durations are represented by floating point values. (of
sufficient precision; it is necessary to specify the details to allow the user
to assess the size of the errors that will occur, I assume are double values
are used).
7) Date and time calculations are done using the features of the underlying
floating point arithmetic only. This floating point representation cannot in
general represent the date and time values exactly. Due to the properties of
floating point arithmetic and the inexact representation the user should expect
errors both in representation and in the calculations. (Note: This is a common
problem inherent to any floating point calculation.) The size of the errors
should be expected to vary by several orders of magnitude depending on the kind
of calculations that is done. 
8)The standard uses an error recovery mechanism that tries to recover the exact
values from their inexact representation. This mechanism is used in all date
and time functions as well as for formatting. It cannot guarantee to recover
correct values but it has to return consistent values. Example: The recovery
may consider a value of 59.5 seconds as a representation of 59 or of 60
seconds. The SECOND and MINUTE functions should return either 0 minutes and 59
second or 1 minute 0 seconds, but not 0 minutes and 0 seconds. See bug 127476
for an example where this kind of problems affects which millennium you get.
9) There is a reference point in time that is represented by the floating point
value zero. (I think there are several possible reference points that are
already specified in the help information)
10) The standard does not support a conceptual distinction between points in
time and durations. Points in time are defined by the time difference between
this point and the reference point, i.e. by a duration. As long as the
resulting values stay within the supported range durations always can be
represented by points in time and vice versa.
11) Formatting of time values uses a "round down" strategy, for example 10
minutes and 55 seconds rounded to minutes are 10 minutes. Open questions: What
kind of rounding is used in the fractional part of seconds, such as
3:10:55.954? Does the error recovery mechanism use different time units that
depend on the number of digits in the fractional part?
12) Realisation of the error recovery mechanism: In order to reconstruct an
exact date and time value from an inexact representation some basic unit of
time is used. This is an integer fraction of a second, e.g. a second or a
millisecond (the exact value has to be defined and documented, maybe the
existing industry standard leaves no choice what value to use). Using a
"rounding to the nearest" strategy the time value is mapped to some multiple of
this basic unit. All further calculations are based on the resulting integer
value. Based on the recovered exact value all date and time function can be
implemented in a way such that they return consistent values.

I think the "Recalculated Formula (OpenFormula) Format" specification of the
time and date mechanism is broken as it does not consider any kind of error
recovery as described in #8 and #12. Error recovery is necessary to avoid
issues caused by the combination of the inherent inexact floating point
representation and "round down" strategies required for formatting. These
issues are documented in a couple of bug reports. This specification also
disregards that the users expect to see consistent results, such as 1+1=2.

As a consequence of #10 and #11 no DURATION kind of rounding is possible. The
distinction between CLOCK and DURATION rounding seems not to solve problems of
incorrect math anyway. Bug 131151 seems to provide an example that produces
wrong results despite a correct use of CLOCK formatting.

Maybe I understand what motivates the introduction of the rounding to the
middle alias DURATION rounding. Points #7, #8 and #12 imply that you may have
to sacrifice several orders of magnitude of the precision the underlying
floating point representation provides. Using a "rounding to the next" strategy
should avoid the need of an error recovery mechanism and thus allow to use the
full precision of the underlying representation.

So there is a point to have this kind of rounding. Unfortunately it is
incompatible with the requirements of the established industry standard despite
the fact that in many cases it will make no or little difference which kind of
rounding you use. Therefore offering a DURATION kind of rounding may be useful,
but only if it is offered as an additional feature but not, if it is tied to
the use of [HH] or HH. A description is necessary that informs clearly about
the differences between the different rounding schemes. Thus some mechanism
should be offered that allows to select the DURATION kind of rounding no matter
if the [HH] or HH format is used. This has the further advantage that the
formatting mechanism allows more variations, e.g. combining HH formatting with
DURATION type rounding. In addition all date and time functions such as SECOND,
MINUTE etc. need additional parameters that provide them with the information
that is necessary to return values that are compatible with the formatting. If
you consider the common decimal notation for floating point number you will
find a similar situation: The FLOOR, CEILING and ROUND functions all have
additional parameters.

Maybe it is possible to combine error recovery and long fractional parts. Idea:
The calculation of the fractional part could be used to calculate the exact
second as recovered value. Thus this value is consistent with the rest of the
fractional part. The calculation of the other time units will not introduce
inconsistencies as they are based on an exact recovered value.

In general I am somewhat sceptical about trying to enhance this old standard. I
think it is a pretty simple approximation of date and time calculations which
is quite useful if you can live with its limitations. If you try to add
features you may run quickly into the monsters Mike Kaganski mentioned in bug
132083 comment 13 and into a lot of weird situations. Increasing time
resolution may suffice. An example: Assume a current version of LibreOffice
that has a slightly more sophisticated date and time arithmetic. You can ask it
to calculate the number of days or hours between the first day of January 2020
0:00 and the first day of January 2025 0:00. This arithmetic returns a correct
number of days but an error value for the number of hours. Why?

-- 
You are receiving this mail because:
You are the assignee for the bug.
_______________________________________________
Libreoffice-bugs mailing list
Libreoffice-bugs@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/libreoffice-bugs

Reply via email to