To comment on the following update, log in, then open the issue:
http://www.openoffice.org/issues/show_bug.cgi?id=113558
                 Issue #|113558
                 Summary|Change Case broken by language tags and/or ligatures
               Component|framework
                 Version|OOO330m1
                Platform|PC
                     URL|
              OS/Version|Windows, all
                  Status|UNCONFIRMED
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P2
            Subcomponent|code
             Assigned to|tm
             Reported by|jurf





------- Additional comments from j...@openoffice.org Sat Jul 31 05:10:08 +0000 2010 -------
Casing options broken by language tags and/or ligatures

Issue 1601 (http://qa.openoffice.org/issues/show_bug.cgi?id=1601), which is marked
Fixed and whose CWS tl74 is included in OOo-dev300m85 (tested) and OOO330m2 (not
tested, but likely identical), implements three new and welcome options in
Format | Change case, namely:

        Sentence case
        Capitalize Every Word
        tOGGLE cASE
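
For reference, here is a rough Python sketch of what the three options are expected
to do to plain text. It only illustrates the intended results and is not
OpenOffice.org's implementation: the function names are made up, a sentence is
assumed to start at the beginning of the text or after '.', '!' or '?', and treating
a hyphen as a word boundary is likewise an assumption.

```python
import re

# Hypothetical sketch of the *expected* results on plain text only.
# Assumption: sentence_case and capitalize_every_word only touch the first
# letter of each sentence/word and leave all other letters unchanged.


def sentence_case(text: str) -> str:
    """Upper-case the first letter of the text and of each sentence after . ! ?"""
    return re.sub(r"(^\s*|[.!?]\s+)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(), text)


def capitalize_every_word(text: str) -> str:
    """Upper-case the first letter of each word; any non-word character
    (e.g. a space or a hyphen) is treated as starting a new word."""
    return re.sub(r"\b(\w)", lambda m: m.group(1).upper(), text)


def toggle_case(text: str) -> str:
    """Swap upper and lower case for every letter."""
    return text.swapcase()


if __name__ == "__main__":
    print(sentence_case("the rapide brown fox. it ran."))
    print(capitalize_every_word("the rapide brown fox-like creature"))
    print(toggle_case("tOGGLE cASE"))
```

On these inputs the sketch prints "The rapide brown fox. It ran.", "The Rapide Brown
Fox-Like Creature" and "Toggle Case".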

Whilst I've not tested tOGGLE cASE (it's not something I need), I have spent a
good while poking Sentence case and Capitalize Every Word with a stick. Both
functions are, unfortunately, very buggy. The implementation of Capitalize Every
Word is especially bad, with a high probability of data loss (disappearing text
with no guarantee that Undo works properly). So far, I've seen the bugs
triggered by either language mark-up or ligatures (the latter not necessarily
within the text selection), which are actually the only conditions I've been
testing for. It's therefore likely that there are other triggers, too.

The data loss is particularly troubling because Undo, even if given sufficient
steps, does not necessarily restore the original text correctly. And even that
assumes that the user is half-expecting trouble.

The issue is present in both Writer and Calc (I haven't tested the other
modules), and is severe in both.

I'm attaching an ODT file to this issue. It contains several examples you can
try out yourself, together with mock-ups of expected and actual results.


**************************************************

ISSUE DESCRIPTION

In brief, the main problems I've found so far are:

Sentence case
- The presence of language mark-up within selected text confuses the parser,
causing it to consider the marked-up section as a new sentence, thus
capitalizing two or more words in the middle of a sentence.

Capitalize Every Word
- Language mark-up causes similar miscalculations, but more exaggerated ones,
potentially causing data loss (see attached file).
- The presence of ligatures, either within the selected text or before it in the
same paragraph, causes similar problems.
- Applying Capitalize Every Word to multiple selections further exacerbates the
problem.

**************************************************

POSSIBLE CAUSES

I'm not a programmer, but I think the primary cause of the bugs in either
function is a miscalculation of selection bounds, which at times leads to
extremely severe offset errors, both in the selection area and in the bounds of
the text itself. Possible causes appear to include:
1. the parser gives language declarations a width (apparently two characters for
each "tag": one for the opening, another for the closure);
2. the parser miscounts the length of ligatures (Unicode U+FB00 to U+FB06)
whether or not they're selected, which causes both the selections and the words
actually processed to expand to the right; if there's no room at the end of the
paragraph for this expansion, text disappears;
3. multiple selections are incorrectly handled (it appears as though errors in
one selection block are carried over to the next, and so on). This may simply be
symptomatic of the first two potential causes, but it may also be compounded by
buffers not being cleared. Or something (TM).
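
Cause 2 fits a general Unicode property: the full upper- and title-case mappings of
the ligatures U+FB00 to U+FB06 turn one character into two or three (U+FB01 "fi"
becomes "FI" or "Fi"), so any code that assumes a case change preserves length will
miscount. The Python sketch below illustrates this. The function
capitalize_words_fixed_length is hypothetical: it models one plausible way a result
could be forced back into the original length, clipping the end of the paragraph,
and is not taken from the OOo source.

```python
# Illustration only, not OpenOffice.org code: Unicode full case mapping can
# turn one code point into several, so code that assumes the output has the
# same length as the input will miscount.

LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]  # U+FB00..U+FB06


def upper_lengths() -> dict[str, int]:
    """Map each ligature to the length of its upper-case form."""
    return {lig: len(lig.upper()) for lig in LIGATURES}


def capitalize_words_fixed_length(paragraph: str) -> str:
    """Hypothetical buggy variant: title-case the paragraph, then force the
    result back into the paragraph's original length, clipping the end."""
    mapped = paragraph.title()
    return mapped[:len(paragraph)]


if __name__ == "__main__":
    for lig, n in upper_lengths().items():
        print(f"U+{ord(lig):04X} {lig!r} -> {lig.upper()!r} ({n} code points)")
    text = "the \ufb01ne print"          # "fine" written with the fi ligature
    print(repr(text.title()))            # one character longer than the input
    print(repr(capitalize_words_fixed_length(text)))  # final letter is lost
```

Here "the \ufb01ne print" (13 code points) title-cases to "The Fine Print" (14), and
the clipped variant returns "The Fine Prin", which mirrors the disappearing text
described above.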

The problem was exacerbated, I think, by the original test case
(http://quaste.services.openoffice.org/index.php?option=com_tcs&task=tcs_show&tcsid=3116),
which is just plain text: no formatting, no language tags and no awkward
characters such as non-diphthong ligatures (ff, fi, fl, etc.), so none of the
triggers above would have been exercised.

**************************************************

EXAMPLE

The following is a simple example of the buggy behaviour of Sentence case, to
give you an idea of the type of problem. See the attached file for many more
examples (all different) of both Sentence case and Capitalize Every Word:

Input:          the rapide brown fox [with "rapide" marked as French]
Expected:       The rapide brown fox
Output:         The Rapide Brown fox
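
One hypothesis that reproduces this exact output is that Sentence case is being
applied to each formatting run separately ("the ", the French "rapide" and
" brown fox"), so that every run is treated as the start of a new sentence. The
Python sketch below models a paragraph as a list of (text, language) runs; that
model and the function names are assumptions made for illustration, not
OpenOffice.org's internals.

```python
import re

# Hypothetical run model: a paragraph is a list of (text, language) runs,
# roughly like the text:span elements in content.xml. Not OOo's internals.
Runs = list[tuple[str, str]]


def sentence_case(text: str) -> str:
    """Upper-case the first letter of the text and after . ! or ?"""
    return re.sub(r"(^\s*|[.!?]\s+)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(), text)


def per_paragraph(runs: Runs) -> Runs:
    """Expected behaviour: case the whole paragraph, then split it back into
    the original runs. Assumes the mapping keeps the length unchanged, which
    holds for this ASCII example but not for ligatures."""
    joined = sentence_case("".join(text for text, _ in runs))
    out, pos = [], 0
    for text, lang in runs:
        out.append((joined[pos:pos + len(text)], lang))
        pos += len(text)
    return out


def per_run(runs: Runs) -> Runs:
    """Suspected bug: every run is treated as the start of a new sentence."""
    return [(sentence_case(text), lang) for text, lang in runs]


if __name__ == "__main__":
    runs = [("the ", "default"), ("rapide", "fr"), (" brown fox", "default")]
    print("".join(t for t, _ in per_paragraph(runs)))  # The rapide brown fox
    print("".join(t for t, _ in per_run(runs)))        # The Rapide Brown fox
```

The per-paragraph version gives the expected result, while the per-run version
matches the actual output exactly.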

The underlying code (from content.xml) is this, where T3 is the default format
and T4 is French:

<text:p text:style-name="Standard">
        <text:span text:style-name="T3">The </text:span>
        <text:span text:style-name="T4">Rapide BroWn Fox-Like Creat</text:span>
        <text:span text:style-name="T3">ure</text:span>
</text:p>
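
To see where the run boundaries fall, the snippet can be parsed with Python's
standard xml.etree.ElementTree module. The sketch below declares the ODF text
namespace (urn:oasis:names:tc:opendocument:xmlns:text:1.0) inline, since the excerpt
omits the document root where it would normally be declared, drops the indentation
whitespace, and assumes the spans contain plain text only. The span_offsets helper
is hypothetical.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The paragraph from the report, with the namespace declared inline and the
# pretty-printing whitespace removed.
SNIPPET = (
    f'<text:p xmlns:text="{TEXT_NS}" text:style-name="Standard">'
    '<text:span text:style-name="T3">The </text:span>'
    '<text:span text:style-name="T4">Rapide BroWn Fox-Like Creat</text:span>'
    '<text:span text:style-name="T3">ure</text:span>'
    '</text:p>'
)


def span_offsets(xml: str) -> list[tuple[str, int, int, str]]:
    """Return (style, start, end, text) for each direct text:span child of
    text:p. Assumes spans hold plain text with no nested elements."""
    para = ET.fromstring(xml)
    style_attr = f"{{{TEXT_NS}}}style-name"
    result, pos = [], len(para.text or "")
    for span in para.findall(f"{{{TEXT_NS}}}span"):
        text = span.text or ""
        result.append((span.get(style_attr), pos, pos + len(text), text))
        pos += len(text) + len(span.tail or "")  # skip text after the span
    return result


if __name__ == "__main__":
    for style, start, end, text in span_offsets(SNIPPET):
        print(f"{style}: [{start}, {end}) {text!r}")
```

It reports T3 at [0, 4), T4 at [4, 31) and T3 at [31, 34): the French run ends in
the middle of "Creature", so run boundaries need not coincide with word boundaries.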


**************************************************

Given the possibility of data loss, I reckon this should be a SHOWSTOPPER for
3.3, but I'll leave it to one of the experts to decide and, if so, to add it to
the meta issue.

Enjoy!

---------------------------------------------------------------------
Please do not reply to this automatically generated notification from
Issue Tracker. Please log onto the website and enter your comments.
http://qa.openoffice.org/issue_handling/project_issues.html#notification

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@framework.openoffice.org
For additional commands, e-mail: issues-h...@framework.openoffice.org

