https://bugs.documentfoundation.org/show_bug.cgi?id=171907
Bug ID: 171907
Summary: Calc writes XLSX larger than Excel and wrong
dimensions
Product: LibreOffice
Version: 25.8.5.2 release
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: medium
Component: Calc
Assignee: [email protected]
Reporter: [email protected]
Created attachment 206871
--> https://bugs.documentfoundation.org/attachment.cgi?id=206871&action=edit
Spreadsheets referenced in the initial report
Summary:
The attached Excel spreadsheet (created on Windows 10 Pro with Excel 2021 v2604,
Build 16.0.19929.20086, 64-bit) is 34KB.
If opened with Calc & saved (as XLSX), it balloons to 1,331KB.
An application that reads the Calc-written version crashes with "out of memory"
as a result.
Analysis:
The Perl module `Spreadsheet::ParseXLSX` reports that the Excel version has rows
(indices) 0-1082 and columns 0-8 in the "Proposal Index" worksheet, and rows
0-1 and columns 0-104 in the "Full Data" worksheet. The Perl debugger shows this
coming from the s:dimension section of the XML.
For the Calc version, the index sheet has rows 0-1007 and columns 0-16383, and
the data sheet has rows 0-1 and columns 0-104.
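The dimension values can be checked without the Perl module. The following is a minimal Python sketch, not part of the original analysis; the function name `sheet_dimensions` is made up for illustration, and it assumes the `<dimension>` element appears within the first 64 KB of each worksheet part (it normally sits near the top of the XML).

```python
import re
import zipfile

# Matches the ref attribute of the worksheet <dimension> element, with or
# without a namespace prefix, e.g. <dimension ref="A1:I1083"/>.
_DIMENSION_RE = re.compile(rb'<(?:\w+:)?dimension\b[^>]*\bref="([^"]+)"')


def sheet_dimensions(xlsx):
    """Return {worksheet part name: dimension ref or None} for an XLSX file.

    `xlsx` may be a path or a binary file-like object.
    """
    dims = {}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                with zf.open(name) as part:
                    # Assumption: the element is in the first 64 KB of the part.
                    head = part.read(65536)
                match = _DIMENSION_RE.search(head)
                dims[name] = match.group(1).decode("ascii") if match else None
    return dims
```

Calling `sheet_dimensions("Multi-proposal_Sample_WorkbookLO.xlsx")` and the same on the Excel-written file should show the differing refs side by side.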
Apparently, blank or uninitialized cells are contributing to the dimensions and
cells stored by Calc in the XLSX file.
The larger file size is annoying, but the real problem is that the Perl module
uses the dimensions to allocate memory. Instead of the 9 rows and 4 columns of
data in the index sheet, 105 x 16384 (1,720,320) cells are accessed and large
data structures are created.
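The arithmetic can be reproduced from a dimension ref. The sketch below is illustrative only: `cells_in_ref` is a hypothetical helper, and the ref `A1:XFD105` is chosen to match the 105 x 16384 figure above rather than copied from the attached file.

```python
import re

# One A1-style cell reference, optionally with $ anchors (e.g. A1, $XFD$105).
_CELL_RE = re.compile(r"^\$?([A-Z]{1,3})\$?(\d+)$")


def _parse_cell(cell):
    """Return (column number, row number), both 1-based, for an A1-style ref."""
    match = _CELL_RE.match(cell.upper())
    if not match:
        raise ValueError(f"not a cell reference: {cell!r}")
    col = 0
    for ch in match.group(1):
        # Column letters are a bijective base-26 number: A=1 ... Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return col, int(match.group(2))


def cells_in_ref(ref):
    """Number of cells covered by a ref such as 'A1:XFD105' or 'B2'."""
    first, _, last = ref.partition(":")
    c1, r1 = _parse_cell(first)
    c2, r2 = _parse_cell(last or first)
    return (abs(c2 - c1) + 1) * (abs(r2 - r1) + 1)


print(cells_in_ref("A1:XFD105"))  # 105 rows x 16384 columns = 1720320
```

A reader that allocates per cell of the declared extent pays for every one of those cells, whether or not they hold data.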
The ParseXLSX load time for the Excel version is 0.6 seconds; for the Calc
version: 24 seconds.
Worse, another version of this input file had 2 more rows, but caused the Perl
process to be terminated by the Linux OOM killer after tens of minutes of
consuming GBs of memory. (The server has 4GB of RAM and 22GB of swap, which is
more than enough for this job.)
Evidence that the cause is empty cells:
Opening the Calc-written file in Excel, selecting the maximum empty cells
(XFD1065 to E1), deleting them, and saving produces a 31KB file.
The same procedure on a slightly different version of the file (also included)
has the same results.
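The empty cells can also be counted directly in the worksheet XML. This is a rough Python sketch under stated assumptions: `count_cells` is a made-up name, the part name (for example `xl/worksheets/sheet1.xml`) has to be supplied by the caller, and a cell is treated as empty when its `<c>` element has no `<v>`, `<f>` or `<is>` child.

```python
import zipfile
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def count_cells(xlsx, part):
    """Return (total cells, empty cells) for one worksheet part of an XLSX."""
    total = empty = 0
    with zipfile.ZipFile(xlsx) as zf, zf.open(part) as stream:
        # Stream the XML so that a very large sheet is not held in memory.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = _local(elem.tag)
            if name == "c":
                total += 1
                has_content = any(
                    _local(child.tag) in ("v", "f", "is") for child in elem
                )
                if not has_content:
                    empty += 1
            elif name == "row":
                # Free the finished row's cells before moving on.
                elem.clear()
    return total, empty
```

If the hypothesis is right, the Calc-written index sheet should report a large empty count and the Excel-resaved file a small one.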
Consequences:
This makes Calc unusable for the application.
Correct behavior:
Calc should write the correct dimensions and omit the blank/uninitialized cells
outside that extent.
Attached files in the zip archive:
Multi-proposal_Sample_Workbook.xlsx - The original Excel-written worksheet.
Multi-proposal_Sample_WorkbookLO.xlsx - Opened by Calc and saved (Save As).
excel-resaved.xlsx - The Calc-written worksheet opened, empty cells deleted and
saved by Excel
Multi-Proposal_Sample_Workbook_with duplicates.xlsx - the larger version
(37MB!) from Calc
Multi-Proposal_Sample_Workbook_trimmed duplicates.xlsx - the larger version
resaved by Excel (34KB)