https://bugs.documentfoundation.org/show_bug.cgi?id=166195

            Bug ID: 166195
           Summary: Integer Overflow in Year Recognition Leads to Periodic
                    Date/Text Interpretation
           Product: LibreOffice
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Calc
          Assignee: libreoffice-bugs@lists.freedesktop.org
          Reporter: gplhust...@gmail.com

Description:
When parsing date-like strings, Calc exhibits inconsistent behavior for large
year values due to an integer overflow issue. Specifically, year recognition
alternates between valid (Date) and invalid (Text) periodically as the input
year number increases. This occurs because of how large year numbers are
handled during conversion from sal_Int32 to sal_uInt16 and the subsequent
interpretation of that sal_uInt16 as a sal_Int16.


Enter the following strings into Calc cells:
"32767-1-1" -> Correctly recognized as Date(32767, 1, 1)
"32768-1-1" -> Incorrectly recognized as Text("32768-1-1")
"80000-1-1" -> Incorrectly recognized as Date(14464, 1, 1) (due to 80000 %
65536 = 14464)
"100000-1-1" -> Incorrectly recognized as Text("100000-1-1") (due to 100000 %
65536 = 34464, which is > 32767)

The issue stems from the ImpSvNumberInputScan::ImplGetYear function in
svl/source/numbers/zforfind.cxx.

1. The input year (as digits) is parsed into a sal_Int32.
2. This sal_Int32 is then cast to a sal_uInt16 (nYear). When the input year
value exceeds 65535, this conversion results in uint16 overflow (modulo 65536
arithmetic). This explains the observed periodicity of 65536.
3. Downstream, this sal_uInt16 nYear value is effectively treated as a
sal_Int16 when determining validity or setting the cell value (e.g., via
Calc.setValue or similar mechanisms).
4. When the sal_uInt16 value falls within the range 32768 to 65535, its
interpretation as a sal_Int16 results in a negative value.
5. Calc apparently treats these negative years as invalid in this context,
causing the entire input string to be fallback-recognized as Text.

Therefore:
- Years resulting in an effective sal_uInt16 value between 0 and 32767
(inclusive) are considered valid Date years.
- Years resulting in an effective sal_uInt16 value between 32768 and 65535
(inclusive) are considered invalid, and the input becomes Text.

While this is unlikely to affect users entering typical, smaller year values,
it can lead to unexpected data interpretation if large numbers are entered,
potentially due to typos. This inconsistent behavior could compromise data
integrity in specific scenarios. 

A straightforward mitigation is to enforce a maximum valid year limit,
effectively treating any year input that would result in a value > 32767 as
invalid (Text). This can be achieved with a small modification to the sal_Int32
to sal_uInt16 conversion:

The following patch implements the proposed solution. It has been tested
locally and resolves the described inconsistent behavior.

diff --git a/svl/source/numbers/zforfind.cxx b/svl/source/numbers/zforfind.cxx
index 2b91fa662..648c1007f 100644
--- a/svl/source/numbers/zforfind.cxx
+++ b/svl/source/numbers/zforfind.cxx
@@ -1111,7 +1111,11 @@ sal_uInt16 ImpSvNumberInputScan::ImplGetYear( sal_uInt16
nIndex )
     // leading zero as convention.
     if (nLen <= 6)
     {
-        nYear = static_cast<sal_uInt16>(sStrArray[nNums[nIndex]].toInt32());
+        sal_Int32 temp = sStrArray[nNums[nIndex]].toInt32();
+        // safely convert int32 to uint16 by capping values at 65535
+        // downstream code will cast uint16 to int16(where values >32767
become negative and are considerd invalid for year)
+        nYear = temp < 65536 ? temp : 65535;
+
         // A year in another, not Gregorian CE era is never expanded.
         // A year < 100 entered with at least 3 digits with leading 0 is taken
         // as is without expansion.


I have implemented and tested this fix. However, I am currently unfamiliar with
the process for contributing code changes to the Calc/LibreOffice project.
Guidance on submitting this patch would be appreciated.

Steps to Reproduce:
Enter the following strings into Calc cells:
"32767-1-1" -> Correctly recognized as Date(32767, 1, 1)
"32768-1-1" -> Incorrectly recognized as Text("32768-1-1")
"80000-1-1" -> Incorrectly recognized as Date(14464, 1, 1) (due to 80000 %
65536 = 14464)
"100000-1-1" -> Incorrectly recognized as Text("100000-1-1") (due to 100000 %
65536 = 34464, which is > 32767)

Actual Results:
"32767-1-1" -> Date(32767, 1, 1)
"32768-1-1" -> Text("32768-1-1")
"80000-1-1" -> Date(14464, 1, 1)
"100000-1-1" -> Text("100000-1-1")

Expected Results:
"32767-1-1" -> Date(32767, 1, 1)
"32768-1-1" -> Text("32768-1-1")
"80000-1-1" -> Text("80000-1-1")
"100000-1-1" -> Text("100000-1-1")


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.8.3.2 (AARCH64) / LibreOffice Community
Build ID: 48a6bac9e7e268aeb4c3483fcf825c94556d9f92
CPU threads: 8; OS: macOS 15.0; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_CN.UTF-8); UI: en-US
Calc: threaded

-- 
You are receiving this mail because:
You are the assignee for the bug.

Reply via email to