https://bugs.documentfoundation.org/show_bug.cgi?id=171907

            Bug ID: 171907
           Summary: Calc writes XLSX larger than Excel and wrong
                    dimensions
           Product: LibreOffice
           Version: 25.8.5.2 release
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: medium
         Component: Calc
          Assignee: [email protected]
          Reporter: [email protected]

Created attachment 206871
  --> https://bugs.documentfoundation.org/attachment.cgi?id=206871&action=edit
Spreadsheets referenced in the initial report

Summary:

The attached Excel spreadsheet (created on Windows 10 Pro with Excel 2021 v2604,
Build 16.0.19929.20086, 64-bit) is 34KB.

If opened with Calc & saved (as XLSX), it balloons to 1,331KB.

An application that reads the Calc-written version crashes with "out of memory"
as a result.

Analysis:

The Perl module `Spreadsheet::ParseXLSX` reports that the Excel version has rows
(indices) 0-1082 and columns 0-8 in the 'Proposal Index' worksheet, and rows
0-1 and columns 0-104 in the 'Full Data' worksheet.  The Perl debugger shows
these values coming from the s:dimension element of the XML.
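
The declared extent can also be checked without Perl.  The following is a
minimal sketch in Python (standard library only) that reads the `ref`
attribute of each worksheet's `<dimension>` element.  It assumes the worksheet
parts sit under the conventional `xl/worksheets/` path instead of resolving
them through the workbook relationships, and the name `sheet_dimensions` is
made up for this illustration.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet parts.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DIMENSION = f"{{{MAIN_NS}}}dimension"
SHEET_DATA = f"{{{MAIN_NS}}}sheetData"


def sheet_dimensions(xlsx):
    """Map each worksheet part name to its declared dimension ref (or None).

    `xlsx` may be a file path or a binary file-like object.
    Assumption: worksheets are stored as xl/worksheets/<name>.xml.
    """
    result = {}
    with zipfile.ZipFile(xlsx) as archive:
        for name in sorted(archive.namelist()):
            if not re.fullmatch(r"xl/worksheets/[^/]+\.xml", name):
                continue
            result[name] = None
            with archive.open(name) as part:
                # Stream the part so a large sheet is never loaded whole.
                for _event, elem in ET.iterparse(part, events=("start",)):
                    if elem.tag == DIMENSION:
                        result[name] = elem.get("ref")
                        break
                    if elem.tag == SHEET_DATA:
                        # Cell data has started; no dimension was declared.
                        break
    return result
```

Calling `sheet_dimensions("Multi-proposal_Sample_WorkbookLO.xlsx")` on the
attached files would be one way to compare the Excel-written and Calc-written
extents side by side.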

For the Calc version, the index worksheet has rows 0-1007 and columns 0-16383,
and the data worksheet has rows 0-1 and columns 0-104.

Apparently, blank or uninitialized cells are contributing to the dimensions and
cells stored by Calc in the XLSX file.

The larger file size is annoying, but the real problem is that the Perl module
uses the dimensions to allocate memory.  Instead of the 9 rows and 4 columns of
data in the index sheet, 105 x 16384 (1,720,320) cells are accessed and large
data structures are created.
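
To make the arithmetic concrete, here is a small Python sketch that converts
an A1-style range into a row and column count.  The helper names
`column_index` and `range_extent` are illustrative only, and the sample refs
in the usage note are constructed from the numbers in this report, not copied
from the files.

```python
import re


def column_index(letters):
    """Convert a column name such as 'A' or 'XFD' to a 1-based index."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def range_extent(ref):
    """Return (rows, columns) spanned by a ref like 'A1:XFD105' or 'B2'."""
    parts = ref.replace("$", "").split(":")
    coords = []
    # A single-cell ref has one part; use it as both corners.
    for cell in (parts[0], parts[-1]):
        match = re.fullmatch(r"([A-Za-z]+)(\d+)", cell)
        if match is None:
            raise ValueError(f"not an A1-style cell reference: {cell!r}")
        coords.append((column_index(match.group(1)), int(match.group(2))))
    (col1, row1), (col2, row2) = coords
    return abs(row2 - row1) + 1, abs(col2 - col1) + 1
```

With this, `range_extent("A1:XFD105")` gives `(105, 16384)`, whose product is
the 1,720,320 cells quoted above, while a range ending in column I, such as
`"A1:I1083"`, spans only 9 columns.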

The ParseXLSX load time for the Excel version is 0.6 seconds; for the Calc
version: 24 seconds.

Worse, another version of this input file had only 2 more rows, yet caused the
Perl process to be terminated by the Linux OOM killer after tens of minutes of
consuming gigabytes of memory.  (This was on a server with 4GB of RAM and 22GB
of swap, more than enough for this job.)

Evidence that the cause is empty cells:

Opening the Calc-written file in Excel, selecting the largest block of empty
cells (XFD1065 to E1), deleting them, and saving produces a 31KB file.

The same procedure on a slightly different version of the file (also included)
has the same results.
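
The same hypothesis could be tested directly on the XML.  The sketch below
(Python, standard library only) counts the `<c>` cell elements in one
worksheet part and separates those carrying a value, formula, or inline string
from those with no content.  Whether Calc really writes content-free cell
elements in these files has not been verified here; the function name
`count_cells` and the part name in the usage note are assumptions.

```python
import zipfile
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
ROW = f"{{{MAIN_NS}}}row"
CELL = f"{{{MAIN_NS}}}c"
# Children that give a cell content: value, formula, inline string.
CONTENT_TAGS = {f"{{{MAIN_NS}}}{tag}" for tag in ("v", "f", "is")}


def count_cells(xlsx, part_name):
    """Return (cells_with_content, empty_cells) for one worksheet part."""
    filled = empty = 0
    with zipfile.ZipFile(xlsx) as archive, archive.open(part_name) as part:
        for _event, elem in ET.iterparse(part, events=("end",)):
            if elem.tag == CELL:
                if any(child.tag in CONTENT_TAGS for child in elem):
                    filled += 1
                else:
                    empty += 1
            elif elem.tag == ROW:
                # The row's cells are already counted; drop them to save memory.
                elem.clear()
    return filled, empty
```

For example, `count_cells(path, "xl/worksheets/sheet1.xml")` run against the
Calc-written and Excel-resaved files would show whether the size difference
comes from content-free cells.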

Consequences:

This makes Calc unusable for the application.


Correct behavior:

Calc should write the correct dimensions and omit the blank/uninitialized cells
outside that extent.

Attached files in the zip archive:

Multi-proposal_Sample_Workbook.xlsx - The original Excel-written worksheet.

Multi-proposal_Sample_WorkbookLO.xlsx - Opened by Calc and saved (Save As).

excel-resaved.xlsx - The Calc-written worksheet opened in Excel, with the empty
cells deleted, and saved.

Multi-Proposal_Sample_Workbook_with duplicates.xlsx - the larger version
(37MB!) from Calc

Multi-Proposal_Sample_Workbook_trimmed duplicates.xlsx - the larger version
resaved by Excel (34KB)
