2010/1/1 Paul Gilmartin <[email protected]>:
> On Thu, 31 Dec 2009 15:28:16 -0600, McKown, John wrote:
>>
>>I guess the order is aAbBcCdD and so on.
>>
> Actually, no. Not according to a couple dictionaries I glanced at,
> and OpenSolaris:
>
>     509 $ ls -1
>     castor
>     Castor
>     castor bean
>     510 $
>
> What does Linux do?
>
> The technique appears to be: First sort as if entirely case-insensitive;
> only then resolve any ties by considering the case of the characters.
>
> Which is why I suggested keeping all alphabetic characters in a single
> case, followed by a bitmap identifying the case of the characters.
> Case-insensitive lookup would ignore the bitmap; case sensitive would
> consider it.

You really can't do a proper text sort by ordering individual byte values.
The usual approach, pioneered about 15 years ago by some smart people at
IBM's National Language Centre in Toronto, and one smart guy in the Quebec
government, is to assign several separate sort keys to each string, based on
the character value, the case, the accent, and "special" weighting. Then you
sort on those keys instead of on the original string. Of course you can
precalculate the sort keys or compute them on the fly, depending on the
performance and storage tradeoffs.

Sorting is a cultural thing (where "culture" can include C programming as
much as French-in-France, French-in-Canada, English, German, etc.), and each
culture may have multiple sort orders appropriate for different
circumstances. For example, French dictionaries have a different order from
French phonebooks; a French phonebook user may expect to find the name
duPont under P, not under D.

Even in English, where do you expect to find castor-oil in the list above?
Surely the hyphen should be given lower weighting than even the letters that
follow it, so that it comes out after castor bean. How about Caesar vs Cæsar,
or Noel vs Noël? Google search knows that they are the same thing, but Gmail
flunks the latter in its spelling checker. What does the "ls" command think?

Tony H.
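
For anyone who wants to experiment, here is a rough Python sketch of the multi-level key idea described in this message. It is only an illustration under my own assumptions, not the IBM/Quebec method and not what any real "ls" or locale does: the function name sort_key, the LIGATURES table, and the choice to rank lower case before upper case (matching the castor listing quoted earlier) are all hypothetical.

```python
import unicodedata

# Hypothetical ligature expansions so that "Cæsar" compares like "Caesar".
LIGATURES = {"æ": "ae", "Æ": "AE", "œ": "oe", "Œ": "OE"}


def sort_key(s):
    """Build a four-level key: letters, accents, case, special characters.

    Assumption: levels are compared in that order, so case only breaks
    ties between strings whose letters and accents already match.
    """
    base = []      # level 1: letters and digits, lower-cased, accents removed
    accents = []   # level 2: 1 where the letter carried an accent
    case = []      # level 3: 1 where the letter was upper case
    special = []   # level 4: (position, character) for spaces, hyphens, etc.

    expanded = "".join(LIGATURES.get(ch, ch) for ch in s)
    # NFD splits an accented letter into the letter plus combining marks.
    for ch in unicodedata.normalize("NFD", expanded):
        if unicodedata.combining(ch):
            if accents:
                accents[-1] = 1          # mark the preceding letter as accented
        elif ch.isalnum():
            base.append(ch.lower())
            accents.append(0)
            case.append(1 if ch.isupper() else 0)
        else:
            special.append((len(base), ch))   # ignored at levels 1 to 3

    return ("".join(base), tuple(accents), tuple(case), tuple(special))


words = ["Castor", "castor-oil", "castor bean", "castor"]
print(sorted(words, key=sort_key))
# ['castor', 'Castor', 'castor bean', 'castor-oil']
```

Because the hyphen and the space are pushed down to the last level, castor-oil lands after castor bean, and Noel sorts just ahead of Noël rather than far away from it. A case-insensitive lookup could compare only the first element of the key, which is much the same idea as the single-case-plus-bitmap scheme quoted earlier.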

