On Wed, Nov 18, 2009 at 10:51 AM, Gustavo Sverzut Barbieri <barbi...@profusion.mobi> wrote: > On Wed, Nov 18, 2009 at 12:37 AM, Brian Wang <brian.wang.0...@gmail.com> > wrote: >> On Sat, Nov 14, 2009 at 1:21 PM, Brian Wang <brian.wang.0...@gmail.com> >> wrote: >>> On Sat, Nov 14, 2009 at 1:44 AM, Gustavo Sverzut Barbieri >>> <barbi...@profusion.mobi> wrote: >>>> On Fri, Nov 13, 2009 at 3:13 PM, Brian Wang <brian.wang.0...@gmail.com> >>>> wrote: >>>>> On Fri, Nov 13, 2009 at 8:12 PM, Gustavo Sverzut Barbieri >>>>> <barbi...@profusion.mobi> wrote: >>>>>> On Fri, Nov 13, 2009 at 4:49 AM, Brian Wang <brian.wang.0...@gmail.com> >>>>>> wrote: >>>>>>> [snip] >>>>>>>> OK. With the help of gdbserver/gdb, I am able to find where it's >>>>>>>> looping. >>>>>>>> However, I don't know what causes it. Here it goes: >>>>>>>> [svn r43601] >>>>>>>> evas/src/lib/canvas/evas_object_textblock.c:3152 >>>>>>> >>>>>>> The line number is off... I've put some of my stuff in the front of >>>>>>> the source code... My bad. >>>>>>> >>>>>>>> evas_object_textblock_text_markup_get() calls >>>>>>>> escape = _escaped_char_match(p, &adv); >>>>>>>> _escaped_char_match() returns "" and 'adv' is set to 0 >>>>>>>> thus the looping. >>>>>>>> >>>>>>>> Here is the gdb backtrace: >>>>>>>> -------------------------------------- >>>>>>>> (gdb) bt >>>>>>>> #0 _escaped_char_match (s=0x780a1 "追 ", adv=0xbec8c490) >>>>>>>> at evas_object_textblock.c:2732 >>>>>>>> #1 0x4008338c in evas_object_textblock_text_markup_get (obj=0x75380) >>>>>>>> at evas_object_textblock.c:3152 >>>>>>>> #2 0x40826be0 in _edje_part_recalc_single (ed=0x0, ep=0x0, >>>>>>>> desc=0x75ae0, >>>>>>>> chosen_desc=0x40300000, rel1_to_x=0x0, rel1_to_y=0x40884c2c, >>>>>>>> rel2_to_x=0x3, rel2_to_y=0x751c0, confine_to=0x0, params=0x749ac, >>>>>>>> flags=3) >>>>>>>> at edje_calc.c:651 >>>>>>>> #3 0x408271c8 in _edje_part_recalc (ed=0x10, ep=0x748bc, flags=1) >>>>>>>> at edje_calc.c:1721 >>>>>>>> #4 0x40828e10 in _edje_recalc_do (ed=0x780a1) at edje_calc.c:224 >>>>>>>> #5 0x408473b4 in edje_object_size_min_restricted_calc ( >>>>>>>> obj=<value optimized out>, minw=0x0, minh=0x4025e27c, restrictedw=0, >>>>>>>> restrictedh=-1) at edje_util.c:2362 >>>>>>>> #6 0x40847618 in edje_object_size_min_calc (obj=0x780a1, >>>>>>>> minw=0xbec8c490, >>>>>>>> minh=0x40144860) at edje_util.c:2311 >>>>>>>> #7 0x40210a6c in _sizing_eval (obj=0x5d060) at elm_label.c:55 >>>>>>>> #8 0x40210e10 in elm_label_label_set (obj=0x5d060, label=0x8c88 " 追 ") >>>>>>>> at elm_label.c:121 >>>>>>>> #9 0x00008b1c in elm_main (argc=1, argv=0xbec8cd04) at >>>>>>>> elm-label-bug-test.c:36 >>>>>>>> #10 0x00008bac in main (argc=1, argv=0xbec8cd04) at >>>>>>>> elm-label-bug-test.c:64 >>>>>>>> -------------------------------------- >>>>>>>> >>>>>>>> I don't know what's special about the string that makes it end up the >>>>>>>> condition. >>>>>>>> Checking if (strlen(escape)==0 && adv==0) seems to terminate the loop. >>>>>>>> But I totally have no clue what's going on here... Fixing it without >>>>>>>> knowing what's causing the condition is wrong. >>>>>>>> >>>>>>>> The string "追 " in UTF-8 is of value: 0xe8 0xbf 0xbd 0x20 >>>>>>>> >>>>>>>> Is the info above enough to track down the problem? >>>>>>> >>>>>>> I've tracked it down a bit. >>>>>>> On x86, via gdb, >>>>>>> ------------------------ >>>>>>> (gdb) p escape_strings[sizeof(escape_strings)] >>>>>>> $4 = 102 'f' >>>>>>> ------------------------ >>>>>>> >>>>>>> On my arm, via gdb, >>>>>>> ------------------------ >>>>>>> (gdb) p escape_strings[sizeof(escape_strings)] >>>>>>> $12 = 0 '\0' >>>>>>> ------------------------ >>>>>> >>>>>> As you said below, it is invalid read. >>>>>> >>>>>> >>>>>>> At the very last check of >>>>>>> while ((*mc) && (*sc)), >>>>>>> map_itr is equal to map_end and the difference is the on x86 *mc is >>>>>>> non-zero and on my arm, *mc is zero. >>>>>>> Therefore, on arm, the while loop is skipped all together and 'match' >>>>>>> is still 1. And hence the looping... >>>>>>> >>>>>>> sizeof(escape_strings) on both platforms are the same: 1551. >>>>>>> I would say that it is invalid read for both cases. The x86 case gets >>>>>>> lucky and got away with it. >>>>>>> I'm surprised that valgrind did not catch this. Or maybe I'm wrong... >>>>>>> char a[] = "s"; >>>>>>> sizeof(a) == 2 >>>>>>> accessing a[2] is out of bounds >>>>>> >>>>>> this is correct. As why it does not warn, maybe there is another valid >>>>>> string allocated right after, thus you end reading something valid and >>>>>> valgrind will not warn you. >>>>> >>>>> I guess so too. I thought valgrind is pretty good at catching >>>>> out-of-bounds accesses.. >>>> >>>> well, there are some flags to make GCC produce code with garbage >>>> before and after arrays, I guess it's fortify source or something like >>>> that. mudflap should help as well. >>>> >>>> >>>>>>> We may check if (map_itr < map_end) after the first >>>>>>> _advance_after_end_of_string() inside _escaped_char_match() >>>>>>> or we may decrease map_end by 1 since every escape character is >>>>>>> already terminated by a null character. >>>>>>> Thus, my proposed patch (sorry for the bad part at the front, which is >>>>>>> only for my own usage): >>>>>>> ------------------------------------------------ >>>>>>> Index: src/lib/canvas/evas_object_textblock.c >>>>>>> =================================================================== >>>>>>> --- src/lib/canvas/evas_object_textblock.c (revision 43601) >>>>>>> +++ src/lib/canvas/evas_object_textblock.c (working copy) >>>>>>> @@ -1608,6 +1608,18 @@ >>>>>>> _layout_word_start(char *str, int start) >>>>>>> { >>>>>>> int p, tp, chr = 0; >>>>>>> + >>>>>>> +#if 1 >>>>>>> + // >>>>>>> + // coolbrian: break if the word is not within the ASCII range >>>>>>> + // @note This is good for breaking up Chinese words, which are >>>>>>> made of Chinese characters. >>>>>>> + // Chinese characters do not look strange if they are not >>>>>>> grouped to form a 'phrase'. >>>>>>> + // Also, Chinese 'phrases' are of too many patterns and >>>>>>> probably need a dictionary to look up. >>>>>>> + // That would be too much. >>>>>>> + // >>>>>>> + if (((unsigned char)str[start]) >= 0x80) >>>>>>> + return start; >>>>>>> +#endif >>>>>>> >>>>>>> p = start; >>>>>>> chr = evas_common_font_utf8_get_next((unsigned char *)(str), &p); >>>>>>> @@ -2709,7 +2721,7 @@ >>>>>>> const char *map_itr, *map_end, *mc, *sc; >>>>>>> >>>>>>> map_itr = escape_strings; >>>>>>> - map_end = map_itr + sizeof(escape_strings); >>>>>>> + map_end = map_itr + sizeof(escape_strings) - 1; >>>>>>> >>>>>>> while (map_itr < map_end) >>>>>> >>>>>> It's weird: >>>>>> >>>>>> map_end = map_itr + sizeof(escape_strings) >>>>>> >>>>>> is fine, as we compare for less than, so this address would never be >>>>>> used. The last run should be on escape_strings: >>>>>> "⊥\0" "\xe2\x8a\xa5\0" >>>>>> >>>>>> Thus: >>>>>> >>>>>> while (map_itr < map_end) >>>>>> >>>>>> --> yes, we're at map_itr = ⊥\0 >>>>>> >>>>>> >>>>>> { >>>>>> const char *escape; >>>>>> int match; >>>>>> >>>>>> escape = map_itr; >>>>>> >>>>>> --> escape = ⊥\0 >>>>>> >>>>>> >>>>>> _advance_after_end_of_string(&map_itr); >>>>>> mc = map_itr; >>>>>> >>>>>> --> map_itr = \xe2\x8a\xa5\0 >>>>>> >>>>>> >>>>>> sc = s; >>>>>> match = 1; >>>>>> while ((*mc) && (*sc)) >>>>>> { >>>>>> >>>>>> --> enters >>>>>> >>>>>> >>>>>> if ((unsigned char)*sc < (unsigned char)*mc) return NULL; >>>>>> >>>>>> --> I have no clue why this >>>>> >>>>> I don't either... >>>>> >>>>>> >>>>>> >>>>>> if (*sc != *mc) match = 0; >>>>>> >>>>>> --> fails, thus match = 0 >>>>>> >>>>>> >>>>>> mc++; >>>>>> sc++; >>>>>> } >>>>>> if (match) >>>>>> >>>>>> --> skipped, as match = 0 >>>>>> >>>>>> { >>>>>> *adv = mc - map_itr; >>>>>> return escape; >>>>>> } >>>>>> _advance_after_end_of_string(&map_itr); >>>>>> >>>>>> --> map_itr was \xe2\x8a\xa5\0, thus it's now one byte after, thus >>>>>> map_itr == escape_strings + sizeof(escape_strings) and thus the next >>>>>> while (map_itr < map_end) will fail. >>>>> >>>>> No. map_itr == escape_strings + sizeof(escape_strings) -1 >>>>> Thus the next while (map_itr < map_end) will pass. >>>> >>>> So that's the bug. If map_itr was \xe2\x8a\xa5\0 and >>>> _advance_after_end_of_string(&map_itr); looks for the first \0 (the >>>> last char) then increments one, we clearly should be correct! :-( >>>> >>>> >>>>> If we simplify the string a bit: >>>>> char escape_strings[] = "\xe2\x8a\0"; // sizeof it == 4 >>>>> say escape_strings is at address 0x0 >>>>> map_itr = 0; >>>>> map_end = 0 + 4 = 4; >>>>> _advance_after_end_of_string(&map_itr); >>>>> map_itr=3; >>>> >>>> what?! how advance after of string is returning 3??? >>>> it should read 3 times, stop since it found 0, then increment one... >>>> please debug that one :-) >>> >>> static void _advance_after_end_of_string(const char **p_buf) >>> { >>> while (**p_buf != 0) (*p_buf)++; >>> (*p_buf)++; >>> } >>> >>> After the while loop, *p_buf=2 (read 3 times but the 3rd time does not >>> increment it). The second line would make it 3. >>> Please have a look at the attached file. >> >> Sorry for replying to my own mail. >> >> Is it a non-bug? Or am I looking at the wrong spot? >> >> Thanks for the feedbacks. :-) > > I still think there is something wrong there: > > static void _advance_after_end_of_string(const char **p_buf) > { > while (**p_buf != 0) (*p_buf)++; > (*p_buf)++; > } > > with p_buf pointing to "\xe2\x8a\0", after the loop it should > increment 2 times, you'd be at the third character (\0). Then > increment it again and you're at the last \0 (implicit by using "" in > C). Maybe this is the problem? Having the trailing \0, that will have > the next _advance_after_end_of_string() call to stop at the first loop > iteration and then increment once more, going to the 5th byte > (non-existing).
OK. This problem only affects languages whose UTF-8 encodings fall outside the last UTF-8 character of escape_strings. I would guess CJK users are affected somehow. > > If this is the case, then just remove the last \0, or use your patch > (it would make sense), or just check if pointer went out of boundaries > after every call of _advance_after_end_of_string() (would be the most > correct solution, since it would never fail). OK. Checking the boundaries after _advance_after_end_of_string() is more expensive than removing the last \0 or reducing the map_end by one. How could one make the fix upstream? Keeping my own offline patches would be a maintenance nightmare. Best regards, brian > > BR, > > -- > Gustavo Sverzut Barbieri > http://profusion.mobi embedded systems > -------------------------------------- > MSN: barbi...@gmail.com > Skype: gsbarbieri > Mobile: +55 (19) 9225-2202 > -- brian ------------------ Cool-Karaoke - The smallest recording studio, in your palm, open-sourced http://cool-idea.com.tw/ iMaGiNaTiOn iS mOrE iMpOrTaNt tHaN kNoWlEdGe ------------------------------------------------------------------------------ Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ enlightenment-devel mailing list enlightenment-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/enlightenment-devel