Thank you for replying. By "Unicode support" I mean accepting non-ASCII characters, such as Chinese or Japanese, in the input strings.
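To make the concern concrete, here is a small Python 3 sketch. It is only an illustration, not a suffix tree: `longest_common_substring` is a hypothetical brute-force helper standing in for the tree query. It takes two Chinese characters, U+8D85 and U+8D86, whose UTF-8 encodings are e8 b6 85 and e8 b6 86, and compares them first as byte sequences and then as sequences of code points.

```python
def longest_common_substring(a, b):
    """Longest common contiguous run of two sequences.

    Simple O(len(a) * len(b)) dynamic programming, used here only as a
    stand-in for a suffix tree query. Works on both bytes and str.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at (i-1, j-1).
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


s1 = "\u8d85"  # UTF-8 bytes: e8 b6 85
s2 = "\u8d86"  # UTF-8 bytes: e8 b6 86

# Byte level: the alphabet is individual bytes (like unsigned char).
lcs_bytes = longest_common_substring(s1.encode("utf-8"), s2.encode("utf-8"))

# Code-point level: the alphabet is whole characters.
lcs_chars = longest_common_substring(s1, s2)

print(lcs_bytes.hex())  # e8b6
print(repr(lcs_chars))  # ''
```

At the byte level the answer is e8b6, which cannot be decoded as UTF-8 on its own; at the code-point level the two strings have nothing in common. Python's `str` here plays roughly the role I have in mind for a wide character type.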
UTF-8 encodes most Chinese characters in 3 bytes, for example 0xe8b685. This causes a problem if a UTF-8 encoded string is treated as an array of unsigned char. Suppose the suffix tree is built from two input strings, each containing a single Chinese character, say 0xe8b685 and 0xe8b686. After construction we get the following edges:

  85$
  86$
  b6   -> 85$, 86$
  e8b6 -> 85$, 86$

So e8b6 is reported as the longest common substring of the two strings, which is obviously wrong: it is only the first two bytes of a 3-byte sequence, not a valid UTF-8 character.

One solution is to use wchar_t instead of char during suffix tree construction, and to modify the compare function accordingly.

Any suggestions?

On Aug 19, 4:07 am, Miroslav Balaz <gpsla...@googlemail.com> wrote:
> What do you mean by Unicode support?
> I think the only problem is that characters that look the same may have
> different encodings.
> So it is enough in each compare to use a function that resolves the above
> problem.
>
> I made 3 suffix tree implementations and it is easy to change the string type in
> them.
> But my implementations were not good; they were slow, although O(n). A suffix
> array was faster.
>
> 2009/8/18 Fred <hn.ft.p...@gmail.com>
>
> > Does anybody know, by chance, a suffix tree implementation with
> > Unicode support?