To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=113558

Issue #|113558
Summary|Change Case broken by language tags and/or ligatures
Component|framework
Version|OOO330m1
Platform|PC
URL|
OS/Version|Windows, all
Status|UNCONFIRMED
Status whiteboard|
Keywords|
Resolution|
Issue type|DEFECT
Priority|P2
Subcomponent|code
Assigned to|tm
Reported by|jurf

------- Additional comments from j...@openoffice.org Sat Jul 31 05:10:08 +0000 2010 -------
Casing options broken by language tags and/or ligatures

Issue 1601 (http://qa.openoffice.org/issues/show_bug.cgi?id=1601), marked Fixed and with CWS tl74 included in OOo-dev300m85 (tested) and OOO330m2 (not tested, but likely identical), implements three new and welcome options in Format | Change case, namely:

  Sentence case
  Capitalize Every Word
  tOGGLE cASE

Whilst I've not tested tOGGLE cASE (it's not something I need), I have spent a good while poking Sentence case and Capitalize Every Word with a stick.

Both functions are, unfortunately, very buggy. The implementation of Capitalize Every Word is especially bad, with a high probability of data loss (disappearing text with no guarantee that Undo works properly).

So far, I've seen the bugs triggered by either language mark-up or ligatures (the latter not necessarily in text selections), which are the only conditions I've been testing for. As such, it's likely there are other triggers, too.

The data loss is particularly troubling as the "undo" function, even if given sufficient steps, does not necessarily restore the original text correctly. And even that assumes that the user is half-expecting trouble.

The issue is present in both Writer and Calc (other applications not tested), and in both cases it is severe.

I'm attaching an ODT file to this issue. It contains several examples you can try out yourself, together with mock-ups of expected and actual results.

**************************************************
ISSUE DESCRIPTION

In brief, the main problems I've found so far are:

Sentence case
- The presence of language mark-up within selected text confuses the parser, causing it to consider the marked-up section as a new sentence, thus capitalizing two or more words in the middle of a sentence.
Capitalize Every Word
- Language mark-up causes similar miscalculations, but more exaggerated, potentially causing data loss (see attached file).
- The presence of ligatures, either within selected text or before it (but in the same paragraph), causes similar problems.
- Applying Capitalize Every Word to multiple selections further exacerbates the problem.

**************************************************
POSSIBLE CAUSES

I'm not a programmer, but I think the primary cause of the bugs in either function is a miscalculation of selection bounds, which leads to at times extremely severe offset errors both as regards the selection area and the bounds of the text itself.

Among the causes would appear to be:

1. the parser gives language declarations a width (two characters for each "tag", apparently, being one for the opening, another for the closure);
2. the parser miscounts the length of ligatures (Unicode U+FB00 to U+FB06) whether or not they're selected, which causes both selections and actual words processed to expand to the right - if there's no room at the end of the paragraph for this expansion, text disappears;
3. multiple selections are incorrectly handled (it appears as though errors in one selection block are carried over to the next, and so on). This may simply be symptomatic of the first two potential causes, but it may also be compounded by buffers not being cleared. Or something (TM).

The problem was exacerbated, I think, by the original test case (http://quaste.services.openoffice.org/index.php?option=com_tcs&task=tcs_show&tcsid=3116), which is just plain text: no formatting, no language tags, no awkward characters such as non-diphthong ligatures (ff, fi, fl, etc.).

**************************************************
EXAMPLE

The following is a simple example of the buggy behaviour of Sentence case, to give you an idea of the type of problem.
See the attached file for many more examples (all different) of both Sentence case and Capitalize Every Word:

Input:    the rapide brown fox   [with "rapide" marked as French]
Expected: The rapide brown fox
Output:   The Rapide Brown fox

The underlying code (from content.xml) is this, where T3 is the default format and T4 is French:

  <text:p text:style-name="Standard">
    <text:span text:style-name="T3">The </text:span>
    <text:span text:style-name="T4">Rapide BroWn Fox-Like Creat</text:span>
    <text:span text:style-name="T3">ure</text:span>
  </text:p>

**************************************************

Given the possibility of data loss, I reckon this should be a SHOWSTOPPER for 3.3 - but I'll leave it to one of the experts to decide and, if so, add it to the meta issue.

Enjoy!

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification
---------------------------------------------------------------------
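
A rough sketch of how possible cause 2 (miscounted ligature lengths) could produce the rightward drift described in the report. This is not OpenOffice.org code. The function names (expand_ligatures, shifted_offsets, upper_range) are hypothetical, and the idea that offsets are measured on a ligature-expanded copy of the text but then applied to the original is only an assumption about the mechanism, not something confirmed by the source.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    # NFKC compatibility normalization decomposes the Latin ligatures
    # U+FB00..U+FB06 into their plain letters (e.g. U+FB01 -> "fi").
    return unicodedata.normalize("NFKC", text)


def shifted_offsets(paragraph: str, start: int, end: int) -> tuple[int, int]:
    # ASSUMPTION (hypothetical mechanism): the selection bounds are measured
    # in the expanded text, where every ligature counts as 2 or 3 characters.
    new_start = len(expand_ligatures(paragraph[:start]))
    new_end = len(expand_ligatures(paragraph[:end]))
    return new_start, new_end


def upper_range(text: str, start: int, end: int) -> str:
    # Upper-case only the characters in text[start:end].
    return text[:start] + text[start:end].upper() + text[end:]


# "five red foxes" with "fi" typed as the single ligature U+FB01.
paragraph = "\ufb01ve red foxes"
start, end = 4, 7                      # the user selects "red"
selected = paragraph[start:end]

# Offsets measured on the expanded text drift one place to the right,
# even though the ligature itself is outside the selection.
wrong_start, wrong_end = shifted_offsets(paragraph, start, end)
wrong_selection = paragraph[wrong_start:wrong_end]
wrong_result = upper_range(paragraph, wrong_start, wrong_end)

# Case conversion can also change the length of a ligature on its own.
ligature_upper_len = len("\ufb01".upper())
```

With the drifted offsets, the conversion hits "ed " instead of "red", which matches the symptom that a ligature earlier in the paragraph disturbs a later selection. Each additional ligature before the selection would add to the drift under this assumption.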