https://bugs.documentfoundation.org/show_bug.cgi?id=171907
--- Comment #4 from [email protected] --- (In reply to ady from comment #3) [snip] > > At any rate, this report could probably be set as a DUPE of many others, > perhaps adding some info about the XML "dimension" details. Alternatively, > someone with the relevant knowledge should rather take this specific report > and actually improve the situation. > > There is possibly some reason for the (XML) "dimension" to be "too big(?)" > when saving in Calc. While there may be other reports that complain about the size, this is about function. Size appears to be a useful clue. Again, there may well be more to this than the size. The size is annoying. The result in this case is catastrophic. It may be just size. It may be that there's a structural problem hiding behind it. Automation is trying to process the worksheet. The result when fed a Calc sheet is that it runs out of memory and is killed. Either or both a structural problem or simply including area that Excel does not. Again, Calc is writing a file with (many) cells that Excel does not. Calc is clearly providing a 'dimension' tag that doesn't reflect the useful area. Asking my users to go thru a 12 step procedure for each sheet in every workbook that they submit - every time they edit - is not realistic. Besides the time, it's error-prone. I can confirm that if I manually select & delete DB1-XFD16384 IN CALC, Calc reduces the 37MB file to 1.3M. Do the same for A3-XFD16384, and it drops to a mere 14K. But these are cells that Calc ADDED. So, maybe the issue is in Calc's READER (vs. writer). Absent a fix, the answer is going to be "Buy an Office license and sue Excel - just for this application". It's not a good answer. Excel defines the correct behavior. (In this case, it's even the right behavior.) There's no good reason for Calc to write more than Excel does. Nor for the dimension tag to include unwritten rows (particularly when Excel does not). I am NOT saying that Calc's output must be bit-for-bit identical to Excel's. But it should be functionally compatible and reasonable. In this case, it is neither. I hope that someone with the necessary expertise in Calc's output can fix this. I expect that person has the necessary tools to investigate. But below are one-line reproducers - each takes a filename argument. You need Perl, and Spreadsheet::ParseXLSX (cpan install Spreadsheet::ParseXLSX). Read the Excel (trimmed) Sheet: 0.5 sec: time perl -MSpreadsheet::ParseXLSX -e'Spreadsheet::ParseXLSX->new->parse($ARGV[0])' Multi-Proposal_Sample_Workbook_trimmed\ duplicates.xlsx real 0m0.549s user 0m0.521s sys 0m0.027s Tried to read the Calc version on a bigger machine: 11 min interactive before failure, but too big and too long for a web service: time perl -MSpreadsheet::ParseXLSX -e'$wbs=Spreadsheet::ParseXLSX->new->parse($ARGV[0])->worksheet(1); printf( "R: \%u - \%u\nC \%u - \%u\n", $ws->row_range, $ws->col_range)' Multi-Proposal_Sample_Workbook_with\ duplicates.xlsx Can't call method "row_range" on an undefined value at -e line 1. (did not find sheet) real 10m45.220s user 10m39.042s sys 0m4.926s Print the dimensions reported by Perl from trimmed - the sheet number is coded as worksheet(1): time perl -MSpreadsheet::ParseXLSX -e'$ws=Spreadsheet::ParseXLSX->new->parse($ARGV[0])->worksheet(1); printf( "R: \%u - \%u\nC \%u - \%u\n", $ws->row_range, $ws->col_range)' Multi-Proposal_Sample_Workbook_trimmed\ duplicates.xlsx R: 0 - 1082 C 0 - 8 Read the dimension tag directly from the XML file: unzip the .xlsx. Then: sed -e's/^.*\(<dimension\)/\1/; s/>.*/>/' xl/worksheets/sheet1.xml Results - note that sheet 1 is consistently A1:DA2, but sheet 2 correlates with file size : Excel's (original) xl/worksheets/sheet1.xml : <dimension ref="A1:DA2"/> xl/worksheets/sheet2.xml : <dimension ref="A1:I1083"/> Calc's: Multi-Proposal_Sample_WorkbookLO.xlsx xl/worksheets/sheet1.xml : <dimension ref="A1:DA2"/> xl/worksheets/sheet2.xml : <dimension ref="A1:XFD1008"/> Calc's: Multi-Proposal_Sample_Workbook_with_duplicates.xlsx xl/worksheets/sheet1.xml : <dimension ref="A1:DA2"/> xl/worksheets/sheet2.xml : <dimension ref="A1:XFD1010"/> Calc's: Multi-Proposal_Sample_Workbook_with_duplicates LO delete DB and A3 - xfd16384.xlsx xl/worksheets/sheet1.xml : <dimension ref="A1:DA2"/> xl/worksheets/sheet2.xml : <dimension ref="A1:I32"/> Excel's: excel-resaved.xlsx xl/worksheets/sheet1.xml : <dimension ref="A1:DA2"/> xl/worksheets/sheet2.xml : <dimension ref="A1:I1065"/> Excel's (trimmed): xl/worksheets/sheet1.xml : <dimension ref="A1:DA2"/> xl/worksheets/sheet2.xml : <dimension ref="A1:I1083"/> The rows on sheet1 are constant at 2, with the columns constant at DA (104) The rows on sheet2 extend to ~1,000, but the columns from I (9) to XFD (16,384) In no case is the actual data using as many columns as declared. On sheet2, there are only 2 rows used, not ~1,000. Finding the used dimensions ought to be pretty simple: for each row starting with the maximum, start at the maximum column, and scan left until you hit a non-blank or initialized cell. Keep track of the maximum column. Stop when you nit a non-blank row. That's the max row and column. The min is the same. You can do this at input and track, or just do it at input and output. Printing (in the absence of printareas) can use the same info... -- You are receiving this mail because: You are the assignee for the bug.
