Re: [E-devel] elm_label problem with specific text

Gustavo Sverzut Barbieri Wed, 18 Nov 2009 02:59:15 -0800

On Wed, Nov 18, 2009 at 1:32 AM, Brian Wang <[email protected]> wrote:
> On Wed, Nov 18, 2009 at 10:51 AM, Gustavo Sverzut Barbieri
> <[email protected]> wrote:
>> On Wed, Nov 18, 2009 at 12:37 AM, Brian Wang <[email protected]> 
>> wrote:
>>> On Sat, Nov 14, 2009 at 1:21 PM, Brian Wang <[email protected]> 
>>> wrote:
>>>> On Sat, Nov 14, 2009 at 1:44 AM, Gustavo Sverzut Barbieri
>>>> <[email protected]> wrote:
>>>>> On Fri, Nov 13, 2009 at 3:13 PM, Brian Wang <[email protected]> 
>>>>> wrote:
>>>>>> On Fri, Nov 13, 2009 at 8:12 PM, Gustavo Sverzut Barbieri
>>>>>> <[email protected]> wrote:
>>>>>>> On Fri, Nov 13, 2009 at 4:49 AM, Brian Wang <[email protected]> 
>>>>>>> wrote:
>>>>>>>> [snip]
>>>>>>>>> OK.  With the help of gdbserver/gdb, I am able to find where it's 
>>>>>>>>> looping.
>>>>>>>>> However, I don't know what causes it.  Here it goes:
>>>>>>>>> [svn r43601]
>>>>>>>>> evas/src/lib/canvas/evas_object_textblock.c:3152
>>>>>>>>
>>>>>>>> The line number is off... I've put some of my stuff in the front of
>>>>>>>> the source code... My bad.
>>>>>>>>
>>>>>>>>> evas_object_textblock_text_markup_get() calls
>>>>>>>>> escape = _escaped_char_match(p, &adv);
>>>>>>>>> _escaped_char_match() returns "" and 'adv' is set to 0
>>>>>>>>> thus the looping.
>>>>>>>>>
>>>>>>>>> Here is the gdb backtrace:
>>>>>>>>> --------------------------------------
>>>>>>>>> (gdb) bt
>>>>>>>>> #0  _escaped_char_match (s=0x780a1 "追 ", adv=0xbec8c490)
>>>>>>>>>    at evas_object_textblock.c:2732
>>>>>>>>> #1  0x4008338c in evas_object_textblock_text_markup_get (obj=0x75380)
>>>>>>>>>    at evas_object_textblock.c:3152
>>>>>>>>> #2  0x40826be0 in _edje_part_recalc_single (ed=0x0, ep=0x0, 
>>>>>>>>> desc=0x75ae0,
>>>>>>>>>    chosen_desc=0x40300000, rel1_to_x=0x0, rel1_to_y=0x40884c2c,
>>>>>>>>>    rel2_to_x=0x3, rel2_to_y=0x751c0, confine_to=0x0, params=0x749ac, 
>>>>>>>>> flags=3)
>>>>>>>>>    at edje_calc.c:651
>>>>>>>>> #3  0x408271c8 in _edje_part_recalc (ed=0x10, ep=0x748bc, flags=1)
>>>>>>>>>    at edje_calc.c:1721
>>>>>>>>> #4  0x40828e10 in _edje_recalc_do (ed=0x780a1) at edje_calc.c:224
>>>>>>>>> #5  0x408473b4 in edje_object_size_min_restricted_calc (
>>>>>>>>>    obj=<value optimized out>, minw=0x0, minh=0x4025e27c, 
>>>>>>>>> restrictedw=0,
>>>>>>>>>    restrictedh=-1) at edje_util.c:2362
>>>>>>>>> #6  0x40847618 in edje_object_size_min_calc (obj=0x780a1, 
>>>>>>>>> minw=0xbec8c490,
>>>>>>>>>    minh=0x40144860) at edje_util.c:2311
>>>>>>>>> #7  0x40210a6c in _sizing_eval (obj=0x5d060) at elm_label.c:55
>>>>>>>>> #8  0x40210e10 in elm_label_label_set (obj=0x5d060, label=0x8c88 " 追 
>>>>>>>>> ")
>>>>>>>>>    at elm_label.c:121
>>>>>>>>> #9  0x00008b1c in elm_main (argc=1, argv=0xbec8cd04) at 
>>>>>>>>> elm-label-bug-test.c:36
>>>>>>>>> #10 0x00008bac in main (argc=1, argv=0xbec8cd04) at 
>>>>>>>>> elm-label-bug-test.c:64
>>>>>>>>> --------------------------------------
>>>>>>>>>
>>>>>>>>> I don't know what's special about the string that makes it end up the 
>>>>>>>>> condition.
>>>>>>>>> Checking if (strlen(escape)==0 && adv==0) seems to terminate the loop.
>>>>>>>>>  But I totally have no clue what's going on here...  Fixing it without
>>>>>>>>> knowing what's causing the condition is wrong.
>>>>>>>>>
>>>>>>>>> The string "追 " in UTF-8 is of value: 0xe8 0xbf 0xbd 0x20
>>>>>>>>>
>>>>>>>>> Is the info above enough to track down the problem?
>>>>>>>>
>>>>>>>> I've tracked it down a bit.
>>>>>>>> On x86, via gdb,
>>>>>>>> ------------------------
>>>>>>>> (gdb) p escape_strings[sizeof(escape_strings)]
>>>>>>>> $4 = 102 'f'
>>>>>>>> ------------------------
>>>>>>>>
>>>>>>>> On my arm, via gdb,
>>>>>>>> ------------------------
>>>>>>>> (gdb) p escape_strings[sizeof(escape_strings)]
>>>>>>>> $12 = 0 '\0'
>>>>>>>> ------------------------
>>>>>>>
>>>>>>> As you said below, it is invalid read.
>>>>>>>
>>>>>>>
>>>>>>>> At the very last check of
>>>>>>>> while ((*mc) && (*sc)),
>>>>>>>> map_itr is equal to map_end and the difference is the on x86 *mc is
>>>>>>>> non-zero and on my arm, *mc is zero.
>>>>>>>> Therefore, on arm, the while loop is skipped all together and 'match'
>>>>>>>> is still 1.  And hence the looping...
>>>>>>>>
>>>>>>>> sizeof(escape_strings) on both platforms are the same: 1551.
>>>>>>>> I would say that it is invalid read for both cases.  The x86 case gets
>>>>>>>> lucky and got away with it.
>>>>>>>> I'm surprised that valgrind did not catch this.  Or maybe I'm wrong...
>>>>>>>> char a[] = "s";
>>>>>>>> sizeof(a) == 2
>>>>>>>> accessing a[2] is out of bounds
>>>>>>>
>>>>>>> this is correct. As why it does not warn, maybe there is another valid
>>>>>>> string allocated right after, thus you end reading something valid and
>>>>>>> valgrind will not warn you.
>>>>>>
>>>>>> I guess so too.  I thought valgrind is pretty good at catching
>>>>>> out-of-bounds accesses..
>>>>>
>>>>> well, there are some flags to make GCC produce code with garbage
>>>>> before and after arrays, I guess it's fortify source or something like
>>>>> that. mudflap should help as well.
>>>>>
>>>>>
>>>>>>>> We may check if (map_itr < map_end) after the first
>>>>>>>> _advance_after_end_of_string() inside _escaped_char_match()
>>>>>>>> or we may decrease map_end by 1 since every escape character is
>>>>>>>> already terminated by a null character.
>>>>>>>> Thus, my proposed patch (sorry for the bad part at the front, which is
>>>>>>>> only for my own usage):
>>>>>>>> ------------------------------------------------
>>>>>>>> Index: src/lib/canvas/evas_object_textblock.c
>>>>>>>> ===================================================================
>>>>>>>> --- src/lib/canvas/evas_object_textblock.c      (revision 43601)
>>>>>>>> +++ src/lib/canvas/evas_object_textblock.c      (working copy)
>>>>>>>> @@ -1608,6 +1608,18 @@
>>>>>>>>  _layout_word_start(char *str, int start)
>>>>>>>>  {
>>>>>>>>    int p, tp, chr = 0;
>>>>>>>> +
>>>>>>>> +#if 1
>>>>>>>> +   //
>>>>>>>> +   // coolbrian: break if the word is not within the ASCII range
>>>>>>>> +   //   @note This is good for breaking up Chinese words, which are
>>>>>>>> made of Chinese characters.
>>>>>>>> +   //         Chinese characters do not look strange if they are not
>>>>>>>> grouped to form a 'phrase'.
>>>>>>>> +   //         Also, Chinese 'phrases' are of too many patterns and
>>>>>>>> probably need a dictionary to look up.
>>>>>>>> +   //         That would be too much.
>>>>>>>> +   //
>>>>>>>> +   if (((unsigned char)str[start]) >= 0x80)
>>>>>>>> +     return start;
>>>>>>>> +#endif
>>>>>>>>
>>>>>>>>    p = start;
>>>>>>>>    chr = evas_common_font_utf8_get_next((unsigned char *)(str), &p);
>>>>>>>> @@ -2709,7 +2721,7 @@
>>>>>>>>    const char *map_itr, *map_end, *mc, *sc;
>>>>>>>>
>>>>>>>>    map_itr = escape_strings;
>>>>>>>> -   map_end = map_itr + sizeof(escape_strings);
>>>>>>>> +   map_end = map_itr + sizeof(escape_strings) - 1;
>>>>>>>>
>>>>>>>>    while (map_itr < map_end)
>>>>>>>
>>>>>>> It's weird:
>>>>>>>
>>>>>>> map_end = map_itr + sizeof(escape_strings)
>>>>>>>
>>>>>>> is fine, as we compare for less than, so this address would never be
>>>>>>> used. The last run should be on escape_strings:
>>>>>>>        "&perp;\0"     "\xe2\x8a\xa5\0"
>>>>>>>
>>>>>>> Thus:
>>>>>>>
>>>>>>>   while (map_itr < map_end)
>>>>>>>
>>>>>>> --> yes, we're at map_itr = &perp;\0
>>>>>>>
>>>>>>>
>>>>>>>     {
>>>>>>>        const char *escape;
>>>>>>>        int match;
>>>>>>>
>>>>>>>        escape = map_itr;
>>>>>>>
>>>>>>> --> escape = &perp;\0
>>>>>>>
>>>>>>>
>>>>>>>        _advance_after_end_of_string(&map_itr);
>>>>>>>        mc = map_itr;
>>>>>>>
>>>>>>> --> map_itr = \xe2\x8a\xa5\0
>>>>>>>
>>>>>>>
>>>>>>>        sc = s;
>>>>>>>        match = 1;
>>>>>>>        while ((*mc) && (*sc))
>>>>>>>          {
>>>>>>>
>>>>>>> --> enters
>>>>>>>
>>>>>>>
>>>>>>>             if ((unsigned char)*sc < (unsigned char)*mc) return NULL;
>>>>>>>
>>>>>>> --> I have no clue why this
>>>>>>
>>>>>> I don't either...
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>             if (*sc != *mc) match = 0;
>>>>>>>
>>>>>>> --> fails, thus match = 0
>>>>>>>
>>>>>>>
>>>>>>>             mc++;
>>>>>>>             sc++;
>>>>>>>          }
>>>>>>>        if (match)
>>>>>>>
>>>>>>> --> skipped, as match = 0
>>>>>>>
>>>>>>>          {
>>>>>>>             *adv = mc - map_itr;
>>>>>>>             return escape;
>>>>>>>          }
>>>>>>>        _advance_after_end_of_string(&map_itr);
>>>>>>>
>>>>>>> --> map_itr was \xe2\x8a\xa5\0, thus it's now one byte after, thus
>>>>>>> map_itr == escape_strings + sizeof(escape_strings) and thus the next
>>>>>>> while (map_itr < map_end) will fail.
>>>>>>
>>>>>> No.  map_itr == escape_strings + sizeof(escape_strings) -1
>>>>>> Thus the next while (map_itr < map_end) will pass.
>>>>>
>>>>> So that's the bug. If map_itr was  \xe2\x8a\xa5\0 and
>>>>> _advance_after_end_of_string(&map_itr); looks for the first \0 (the
>>>>> last char) then increments one, we clearly should be correct! :-(
>>>>>
>>>>>
>>>>>> If we simplify the string a bit:
>>>>>> char escape_strings[] = "\xe2\x8a\0";   // sizeof it == 4
>>>>>> say escape_strings is at address 0x0
>>>>>> map_itr = 0;
>>>>>> map_end = 0 + 4 = 4;
>>>>>> _advance_after_end_of_string(&map_itr);
>>>>>> map_itr=3;
>>>>>
>>>>> what?! how advance after of string is returning 3???
>>>>> it should read 3 times, stop since it found 0, then increment one...
>>>>> please debug that one :-)
>>>>
>>>> static void _advance_after_end_of_string(const char **p_buf)
>>>> {
>>>>        while (**p_buf != 0) (*p_buf)++;
>>>>        (*p_buf)++;
>>>> }
>>>>
>>>> After the while loop, *p_buf=2 (read 3 times but the 3rd time does not
>>>> increment it).  The second line would make it 3.
>>>> Please have a look at the attached file.
>>>
>>> Sorry for replying to my own mail.
>>>
>>> Is it a non-bug?  Or am I looking at the wrong spot?
>>>
>>> Thanks for the feedbacks. :-)
>>
>> I still think there is something wrong there:
>>
>> static void _advance_after_end_of_string(const char **p_buf)
>> {
>>        while (**p_buf != 0) (*p_buf)++;
>>        (*p_buf)++;
>> }
>>
>> with p_buf pointing to "\xe2\x8a\0", after the loop it should
>> increment 2 times, you'd be at the third character (\0). Then
>> increment it again and you're at the last \0 (implicit by using "" in
>> C). Maybe this is the problem? Having the trailing \0, that will have
>> the next _advance_after_end_of_string() call to stop at the first loop
>> iteration and then increment once more, going to the 5th byte
>> (non-existing).
>
> OK.
> This problem only affects languages whose UTF-8 encodings fall outside
> the last UTF-8 character of escape_strings.
> I would guess CJK users are affected somehow.
>
>>
>> If this is the case, then just remove the last \0, or use your patch
>> (it would make sense), or just check if pointer went out of boundaries
>> after every call of _advance_after_end_of_string() (would be the most
>> correct solution, since it would never fail).
>
> OK.
> Checking the boundaries after _advance_after_end_of_string() is more expensive
> than removing the last \0 or reducing the map_end by one.
>
> How could one make the fix upstream?  Keeping my own offline patches
> would be a maintenance
> nightmare.


I'll go with the check after map since it's safer and people extending
Evas would not break it accidentally since they removed or added \0 at
the end of the string.   The integer comparison is not hat heavy
anyway.

See if this applies and fixes the bug. Sorry for the long back and
forth with this thread :-(


Index: src/lib/canvas/evas_object_textblock.c
===================================================================
--- src/lib/canvas/evas_object_textblock.c      (revision 43746)
+++ src/lib/canvas/evas_object_textblock.c      (working copy)
@@ -2718,6 +2718,8 @@

        escape = map_itr;
        _advance_after_end_of_string(&map_itr);
+       if (map_itr >= map_end) break;
+
        mc = map_itr;
        sc = s;
        match = 1;



-- 
Gustavo Sverzut Barbieri
http://profusion.mobi embedded systems
--------------------------------------
MSN: [email protected]
Skype: gsbarbieri
Mobile: +55 (19) 9225-2202

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
enlightenment-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Re: [E-devel] elm_label problem with specific text

Reply via email to