Re: [E-devel] elm_label problem with specific text

Brian Wang Tue, 17 Nov 2009 19:34:21 -0800

On Wed, Nov 18, 2009 at 10:51 AM, Gustavo Sverzut Barbieri
<barbi...@profusion.mobi> wrote:
> On Wed, Nov 18, 2009 at 12:37 AM, Brian Wang <brian.wang.0...@gmail.com> 
> wrote:
>> On Sat, Nov 14, 2009 at 1:21 PM, Brian Wang <brian.wang.0...@gmail.com> 
>> wrote:
>>> On Sat, Nov 14, 2009 at 1:44 AM, Gustavo Sverzut Barbieri
>>> <barbi...@profusion.mobi> wrote:
>>>> On Fri, Nov 13, 2009 at 3:13 PM, Brian Wang <brian.wang.0...@gmail.com> 
>>>> wrote:
>>>>> On Fri, Nov 13, 2009 at 8:12 PM, Gustavo Sverzut Barbieri
>>>>> <barbi...@profusion.mobi> wrote:
>>>>>> On Fri, Nov 13, 2009 at 4:49 AM, Brian Wang <brian.wang.0...@gmail.com> 
>>>>>> wrote:
>>>>>>> [snip]
>>>>>>>> OK.  With the help of gdbserver/gdb, I am able to find where it's 
>>>>>>>> looping.
>>>>>>>> However, I don't know what causes it.  Here it goes:
>>>>>>>> [svn r43601]
>>>>>>>> evas/src/lib/canvas/evas_object_textblock.c:3152
>>>>>>>
>>>>>>> The line number is off... I've put some of my stuff in the front of
>>>>>>> the source code... My bad.
>>>>>>>
>>>>>>>> evas_object_textblock_text_markup_get() calls
>>>>>>>> escape = _escaped_char_match(p, &adv);
>>>>>>>> _escaped_char_match() returns "" and 'adv' is set to 0
>>>>>>>> thus the looping.
>>>>>>>>
>>>>>>>> Here is the gdb backtrace:
>>>>>>>> --------------------------------------
>>>>>>>> (gdb) bt
>>>>>>>> #0  _escaped_char_match (s=0x780a1 "追 ", adv=0xbec8c490)
>>>>>>>>    at evas_object_textblock.c:2732
>>>>>>>> #1  0x4008338c in evas_object_textblock_text_markup_get (obj=0x75380)
>>>>>>>>    at evas_object_textblock.c:3152
>>>>>>>> #2  0x40826be0 in _edje_part_recalc_single (ed=0x0, ep=0x0, 
>>>>>>>> desc=0x75ae0,
>>>>>>>>    chosen_desc=0x40300000, rel1_to_x=0x0, rel1_to_y=0x40884c2c,
>>>>>>>>    rel2_to_x=0x3, rel2_to_y=0x751c0, confine_to=0x0, params=0x749ac, 
>>>>>>>> flags=3)
>>>>>>>>    at edje_calc.c:651
>>>>>>>> #3  0x408271c8 in _edje_part_recalc (ed=0x10, ep=0x748bc, flags=1)
>>>>>>>>    at edje_calc.c:1721
>>>>>>>> #4  0x40828e10 in _edje_recalc_do (ed=0x780a1) at edje_calc.c:224
>>>>>>>> #5  0x408473b4 in edje_object_size_min_restricted_calc (
>>>>>>>>    obj=<value optimized out>, minw=0x0, minh=0x4025e27c, restrictedw=0,
>>>>>>>>    restrictedh=-1) at edje_util.c:2362
>>>>>>>> #6  0x40847618 in edje_object_size_min_calc (obj=0x780a1, 
>>>>>>>> minw=0xbec8c490,
>>>>>>>>    minh=0x40144860) at edje_util.c:2311
>>>>>>>> #7  0x40210a6c in _sizing_eval (obj=0x5d060) at elm_label.c:55
>>>>>>>> #8  0x40210e10 in elm_label_label_set (obj=0x5d060, label=0x8c88 " 追 ")
>>>>>>>>    at elm_label.c:121
>>>>>>>> #9  0x00008b1c in elm_main (argc=1, argv=0xbec8cd04) at 
>>>>>>>> elm-label-bug-test.c:36
>>>>>>>> #10 0x00008bac in main (argc=1, argv=0xbec8cd04) at 
>>>>>>>> elm-label-bug-test.c:64
>>>>>>>> --------------------------------------
>>>>>>>>
>>>>>>>> I don't know what's special about the string that makes it end up the 
>>>>>>>> condition.
>>>>>>>> Checking if (strlen(escape)==0 && adv==0) seems to terminate the loop.
>>>>>>>>  But I totally have no clue what's going on here...  Fixing it without
>>>>>>>> knowing what's causing the condition is wrong.
>>>>>>>>
>>>>>>>> The string "追 " in UTF-8 is of value: 0xe8 0xbf 0xbd 0x20
>>>>>>>>
>>>>>>>> Is the info above enough to track down the problem?
>>>>>>>
>>>>>>> I've tracked it down a bit.
>>>>>>> On x86, via gdb,
>>>>>>> ------------------------
>>>>>>> (gdb) p escape_strings[sizeof(escape_strings)]
>>>>>>> $4 = 102 'f'
>>>>>>> ------------------------
>>>>>>>
>>>>>>> On my arm, via gdb,
>>>>>>> ------------------------
>>>>>>> (gdb) p escape_strings[sizeof(escape_strings)]
>>>>>>> $12 = 0 '\0'
>>>>>>> ------------------------
>>>>>>
>>>>>> As you said below, it is invalid read.
>>>>>>
>>>>>>
>>>>>>> At the very last check of
>>>>>>> while ((*mc) && (*sc)),
>>>>>>> map_itr is equal to map_end and the difference is the on x86 *mc is
>>>>>>> non-zero and on my arm, *mc is zero.
>>>>>>> Therefore, on arm, the while loop is skipped all together and 'match'
>>>>>>> is still 1.  And hence the looping...
>>>>>>>
>>>>>>> sizeof(escape_strings) on both platforms are the same: 1551.
>>>>>>> I would say that it is invalid read for both cases.  The x86 case gets
>>>>>>> lucky and got away with it.
>>>>>>> I'm surprised that valgrind did not catch this.  Or maybe I'm wrong...
>>>>>>> char a[] = "s";
>>>>>>> sizeof(a) == 2
>>>>>>> accessing a[2] is out of bounds
>>>>>>
>>>>>> this is correct. As why it does not warn, maybe there is another valid
>>>>>> string allocated right after, thus you end reading something valid and
>>>>>> valgrind will not warn you.
>>>>>
>>>>> I guess so too.  I thought valgrind is pretty good at catching
>>>>> out-of-bounds accesses..
>>>>
>>>> well, there are some flags to make GCC produce code with garbage
>>>> before and after arrays, I guess it's fortify source or something like
>>>> that. mudflap should help as well.
>>>>
>>>>
>>>>>>> We may check if (map_itr < map_end) after the first
>>>>>>> _advance_after_end_of_string() inside _escaped_char_match()
>>>>>>> or we may decrease map_end by 1 since every escape character is
>>>>>>> already terminated by a null character.
>>>>>>> Thus, my proposed patch (sorry for the bad part at the front, which is
>>>>>>> only for my own usage):
>>>>>>> ------------------------------------------------
>>>>>>> Index: src/lib/canvas/evas_object_textblock.c
>>>>>>> ===================================================================
>>>>>>> --- src/lib/canvas/evas_object_textblock.c      (revision 43601)
>>>>>>> +++ src/lib/canvas/evas_object_textblock.c      (working copy)
>>>>>>> @@ -1608,6 +1608,18 @@
>>>>>>>  _layout_word_start(char *str, int start)
>>>>>>>  {
>>>>>>>    int p, tp, chr = 0;
>>>>>>> +
>>>>>>> +#if 1
>>>>>>> +   //
>>>>>>> +   // coolbrian: break if the word is not within the ASCII range
>>>>>>> +   //   @note This is good for breaking up Chinese words, which are
>>>>>>> made of Chinese characters.
>>>>>>> +   //         Chinese characters do not look strange if they are not
>>>>>>> grouped to form a 'phrase'.
>>>>>>> +   //         Also, Chinese 'phrases' are of too many patterns and
>>>>>>> probably need a dictionary to look up.
>>>>>>> +   //         That would be too much.
>>>>>>> +   //
>>>>>>> +   if (((unsigned char)str[start]) >= 0x80)
>>>>>>> +     return start;
>>>>>>> +#endif
>>>>>>>
>>>>>>>    p = start;
>>>>>>>    chr = evas_common_font_utf8_get_next((unsigned char *)(str), &p);
>>>>>>> @@ -2709,7 +2721,7 @@
>>>>>>>    const char *map_itr, *map_end, *mc, *sc;
>>>>>>>
>>>>>>>    map_itr = escape_strings;
>>>>>>> -   map_end = map_itr + sizeof(escape_strings);
>>>>>>> +   map_end = map_itr + sizeof(escape_strings) - 1;
>>>>>>>
>>>>>>>    while (map_itr < map_end)
>>>>>>
>>>>>> It's weird:
>>>>>>
>>>>>> map_end = map_itr + sizeof(escape_strings)
>>>>>>
>>>>>> is fine, as we compare for less than, so this address would never be
>>>>>> used. The last run should be on escape_strings:
>>>>>>        "&perp;\0"     "\xe2\x8a\xa5\0"
>>>>>>
>>>>>> Thus:
>>>>>>
>>>>>>   while (map_itr < map_end)
>>>>>>
>>>>>> --> yes, we're at map_itr = &perp;\0
>>>>>>
>>>>>>
>>>>>>     {
>>>>>>        const char *escape;
>>>>>>        int match;
>>>>>>
>>>>>>        escape = map_itr;
>>>>>>
>>>>>> --> escape = &perp;\0
>>>>>>
>>>>>>
>>>>>>        _advance_after_end_of_string(&map_itr);
>>>>>>        mc = map_itr;
>>>>>>
>>>>>> --> map_itr = \xe2\x8a\xa5\0
>>>>>>
>>>>>>
>>>>>>        sc = s;
>>>>>>        match = 1;
>>>>>>        while ((*mc) && (*sc))
>>>>>>          {
>>>>>>
>>>>>> --> enters
>>>>>>
>>>>>>
>>>>>>             if ((unsigned char)*sc < (unsigned char)*mc) return NULL;
>>>>>>
>>>>>> --> I have no clue why this
>>>>>
>>>>> I don't either...
>>>>>
>>>>>>
>>>>>>
>>>>>>             if (*sc != *mc) match = 0;
>>>>>>
>>>>>> --> fails, thus match = 0
>>>>>>
>>>>>>
>>>>>>             mc++;
>>>>>>             sc++;
>>>>>>          }
>>>>>>        if (match)
>>>>>>
>>>>>> --> skipped, as match = 0
>>>>>>
>>>>>>          {
>>>>>>             *adv = mc - map_itr;
>>>>>>             return escape;
>>>>>>          }
>>>>>>        _advance_after_end_of_string(&map_itr);
>>>>>>
>>>>>> --> map_itr was \xe2\x8a\xa5\0, thus it's now one byte after, thus
>>>>>> map_itr == escape_strings + sizeof(escape_strings) and thus the next
>>>>>> while (map_itr < map_end) will fail.
>>>>>
>>>>> No.  map_itr == escape_strings + sizeof(escape_strings) -1
>>>>> Thus the next while (map_itr < map_end) will pass.
>>>>
>>>> So that's the bug. If map_itr was  \xe2\x8a\xa5\0 and
>>>> _advance_after_end_of_string(&map_itr); looks for the first \0 (the
>>>> last char) then increments one, we clearly should be correct! :-(
>>>>
>>>>
>>>>> If we simplify the string a bit:
>>>>> char escape_strings[] = "\xe2\x8a\0";   // sizeof it == 4
>>>>> say escape_strings is at address 0x0
>>>>> map_itr = 0;
>>>>> map_end = 0 + 4 = 4;
>>>>> _advance_after_end_of_string(&map_itr);
>>>>> map_itr=3;
>>>>
>>>> what?! how advance after of string is returning 3???
>>>> it should read 3 times, stop since it found 0, then increment one...
>>>> please debug that one :-)
>>>
>>> static void _advance_after_end_of_string(const char **p_buf)
>>> {
>>>        while (**p_buf != 0) (*p_buf)++;
>>>        (*p_buf)++;
>>> }
>>>
>>> After the while loop, *p_buf=2 (read 3 times but the 3rd time does not
>>> increment it).  The second line would make it 3.
>>> Please have a look at the attached file.
>>
>> Sorry for replying to my own mail.
>>
>> Is it a non-bug?  Or am I looking at the wrong spot?
>>
>> Thanks for the feedbacks. :-)
>
> I still think there is something wrong there:
>
> static void _advance_after_end_of_string(const char **p_buf)
> {
>        while (**p_buf != 0) (*p_buf)++;
>        (*p_buf)++;
> }
>
> with p_buf pointing to "\xe2\x8a\0", after the loop it should
> increment 2 times, you'd be at the third character (\0). Then
> increment it again and you're at the last \0 (implicit by using "" in
> C). Maybe this is the problem? Having the trailing \0, that will have
> the next _advance_after_end_of_string() call to stop at the first loop
> iteration and then increment once more, going to the 5th byte
> (non-existing).


OK.
This problem only affects languages whose UTF-8 encodings fall outside
the last UTF-8 character of escape_strings.
I would guess CJK users are affected somehow.

>
> If this is the case, then just remove the last \0, or use your patch
> (it would make sense), or just check if pointer went out of boundaries
> after every call of _advance_after_end_of_string() (would be the most
> correct solution, since it would never fail).

OK.
Checking the boundaries after _advance_after_end_of_string() is more expensive
than removing the last \0 or reducing the map_end by one.

How could one make the fix upstream?  Keeping my own offline patches
would be a maintenance
nightmare.

Best regards,


brian

>
> BR,
>
> --
> Gustavo Sverzut Barbieri
> http://profusion.mobi embedded systems
> --------------------------------------
> MSN: barbi...@gmail.com
> Skype: gsbarbieri
> Mobile: +55 (19) 9225-2202
>



-- 
brian
------------------

Cool-Karaoke - The smallest recording studio, in your palm, open-sourced
http://cool-idea.com.tw/

iMaGiNaTiOn iS mOrE iMpOrTaNt tHaN kNoWlEdGe

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
enlightenment-devel mailing list
enlightenment-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/enlightenment-devel

Re: [E-devel] elm_label problem with specific text

Reply via email to