Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Alternatively we can also think about ignoring chars like . and _ (and possibly more) in the beginning of a file name (e.g. .rockbox is sorted under r). Just an idea. It doesn't really add complexity, but would definitely do more than the setting advertises. But, this is also something windows/nautilus/more do.

Huh? Windows Explorer at least *does not* ignore "_": it does not sort "_something" among the "s" entries but puts it at the top before "a", even before numbers. The ".rockbox" folder is also sorted above the "Music" folder in my "simdisk" directory...

It also respects the number of spaces and *does not* collapse them to one. Resulting order in a small test:

!  Test
! A

If it ignored the second space, it would sort "! Test" last.

In reply to an earlier mail here, I am one who occasionally (ab)uses this function of sorting to put temporary data at the top, somewhere outside the rest of the list. This is not specific to Rockbox because I don't use it for my music collection but for other files on my computer. I like the described way for file/directory names starting with special characters and think they should be treated like this in Rockbox, too (as they are currently?).

And as you said yourself, adding this will do more than "advertised"; the same thing applies to spaces as well (as Dominik pointed out), so the setting either has a wrong name or should be fixed in this regard.

My preference would also be to treat leading zeros as intentional. Just looking at a list in Explorer, "01, 02, 3, 04" seemed weird to me; I guess it's the same effect Paul described and reported about his friends. About the mathematical rule: that's true without a doubt, but if I read "04 + 03 = 07", I would suspect some weird reasoning behind it (something intentional that I just don't know about). One just doesn't write a leading zero if (s)he doesn't have to.

To summarise: first, a strong wish that more than one space is treated as such and that no further "special" chars are ignored, also to comply with the setting name. Perhaps don't ignore leading zeros, although I could understand the reasoning "all major file browsers do, so should we"; so far I found Paul's example in favour of not ignoring them more realistic, though.

Regards,
Marianne.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote:
> Paul, I think we can agree that there are different cases. There are cases where a leading zero is intentional and there are cases where it's just there (because you used a wrong setting in the ripping software or because you copied the file from somewhere else).
>
> The problem is that a single "natural" sort won't fit all. Maybe we should have two natural sort procedures? One would ignore the leading zeroes, i.e. just consider numbers as in mathematics (it would put "007" after "6") and the other wouldn't (it would put "007" before "6").

Neither of your described cases would result in a mix of leading zeros and no leading zeros, though. If you set the wrong setting, all your files would have leading zeros and still sort fine. So what's the problem? This is what I don't get: nobody's described a real case where acknowledging leading zeros causes a *bad* sort, except the one "mix folder" case where the user chooses to rename some, but not all, of his files.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 19.03.2009 00:13, Paul Louden wrote:
> Al Le wrote:
>
> My personal position is also that if a user adds a 0 before a number, they expect it to change something, rather than being ignored. I think, on average, more 0s (in lists meant to be sorted) will be intentional than "accidental."

Paul, I think we can agree that there are different cases. There are cases where a leading zero is intentional and there are cases where it's just there (because you used a wrong setting in the ripping software or because you copied the file from somewhere else).

The problem is that a single "natural" sort won't fit all. Maybe we should have two natural sort procedures? One would ignore the leading zeroes, i.e. just consider numbers as in mathematics (it would put "007" after "6") and the other wouldn't (it would put "007" before "6").

The major file browsers (since they are produced by techies :-) operate on just the numbers, without special treatment of leading zeroes.
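The two procedures proposed above differ only in how a pair of digit runs is compared. The following is a minimal sketch of that difference, assuming both arguments consist of digits only. The function names `cmp_digits_numeric` and `cmp_digits_literal` are made up for this illustration and do not come from Rockbox or strnatcmp.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: compare two digit-only strings as numbers,
 * ignoring leading zeros, so that "007" sorts after "6". */
int cmp_digits_numeric(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    /* Leading zeros do not change the numeric value. */
    while (*a == '0') a++;
    while (*b == '0') b++;

    /* With zeros stripped, the longer run is the larger number. */
    size_t la = strlen(a), lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;

    /* Same length: digit-by-digit order equals numeric order. */
    return strcmp(a, b);
}

/* Hypothetical helper: treat leading zeros as intentional, i.e. compare
 * the digits as plain characters, so that "007" sorts before "6". */
int cmp_digits_literal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b);
}
```

Note that the numeric variant reports "01" and "1" as equal, so a caller that wants a stable, total order still needs a tie-breaker such as a final strcmp() on the full names.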
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Thomas Martitz wrote:
>> I've uploaded the patch to FS#10030.
>
> Any final comments? If not, I'm considering committing it before the release. I'm not sure whether it should be backported too, though. Please also read the recent comments on the task.

In my opinion, we should disable the option and revert to basic ASCII sort for this release, and make "natural" sorting a feature of the next one. We shouldn't change the "default" sort method in a release version until we have a more or less final algorithm, and it certainly seems that, even apart from my own objections, there are still several opinions on how this should go. People have got by with ASCII before; they can live with it for 3 more months (or use current builds) until we've settled our algorithm issue.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> I've uploaded the patch to FS#10030.

Any final comments? If not, I'm considering committing it before the release. I'm not sure whether it should be backported too, though. Please also read the recent comments on the task.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Thu, Mar 19, 2009 at 6:47 PM, Thomas Martitz wrote:
> The problem with his proposal was that it looked 3 times at every char.
> That's hardly optimal.

Which doesn't imply that it can't be done better. An inefficient solution is no argument against fixing a feature.

> And I think that a file-listing is relatively timing critical. I wouldn't
> want to have noticeable delay just due to sorting on each folder I enter.

Then you need to define what time-critical means. For me, this is interrupting some real-time process (playback, communication with a chip on a bus, data corruption etc). A file browser should be fast, but it's definitely not time-critical -- nothing bad will happen if it takes slightly longer. Except maybe the user getting annoyed. But I'd rather call that time-relevant. It's not critical.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
codemonkey wrote:
> Are you guys aware that there's a quasi-standard regarding this in the GNU libraries? See the following excerpt from Fedora "info ls" and "man strverscmp".
>
> ~ray
>
> PS: I've found that "ls -v" works well for sorting MP3s with track numbering, etc. I don't know if it handles all of the cases described in this thread though. Maybe GNU's implementation is worth borrowing for rockbox?
>
> --
>
> $ info ls
>
> (...excerpt...)
>
> 10.1.4 More details about version sort
> --
>
> The version sort takes into account the fact that file names frequently include indices or version numbers. Standard sorting functions usually do not produce the ordering that people expect because comparisons are made on a character-by-character basis. The version sort addresses this problem, and is especially useful when browsing directories that contain many files with indices/version numbers in their names:
>
>      $ ls -1            $ ls -1v
>      foo.zml-1.gz       foo.zml-1.gz
>      foo.zml-100.gz     foo.zml-2.gz
>      foo.zml-12.gz      foo.zml-6.gz
>      foo.zml-13.gz      foo.zml-12.gz
>      foo.zml-2.gz       foo.zml-13.gz
>      foo.zml-25.gz      foo.zml-25.gz
>      foo.zml-6.gz       foo.zml-100.gz
>
> Note also that numeric parts with leading zeros are considered as fractional one:
>
>      $ ls -1            $ ls -1v
>      abc-1.007.tgz      abc-1.007.tgz
>      abc-1.012b.tgz     abc-1.01a.tgz
>      abc-1.01a.tgz      abc-1.012b.tgz
>
> This functionality is implemented using the `strverscmp' function.
>
> --
>
> $ man strverscmp
>
> STRVERSCMP(3)          Linux Programmer’s Manual          STRVERSCMP(3)
>
> NAME
>        strverscmp - compare two version strings
>
> SYNOPSIS
>        #define _GNU_SOURCE
>        #include <string.h>
>
>        int strverscmp(const char *s1, const char *s2);
>
> DESCRIPTION
>        Often one has files jan1, jan2, ..., jan9, jan10, ... and it feels wrong when ls(1) orders them jan1, jan10, ..., jan2, ..., jan9. In order to rectify this, GNU introduced the -v option to ls(1), which is implemented using versionsort(3), which again uses strverscmp().
>
>        Thus, the task of strverscmp() is to compare two strings and find the "right" order, while strcmp(3) only finds the lexicographic order. This function does not use the locale category LC_COLLATE, so is meant mostly for situations where the strings are expected to be in ASCII.
>
>        What this function does is the following. If both strings are equal, return 0. Otherwise find the position between two bytes with the property that before it both strings are equal, while directly after it there is a difference. Find the largest consecutive digit strings containing (or starting at, or ending at) this position. If one or both of these is empty, then return what strcmp(3) would have returned (numerical ordering of byte values). Otherwise, compare both digit strings numerically, where digit strings with one or more leading zeroes are interpreted as if they have a decimal point in front (so that in particular digit strings with more leading zeroes come before digit strings with fewer leading zeroes). Thus, the ordering is 000, 00, 01, 010, 09, 0, 1, 9, 10.
>
> RETURN VALUE
>        The strverscmp() function returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be earlier than, equal to, or later than s2.
>
> CONFORMING TO
>        This function is a GNU extension.
>
> SEE ALSO
>        rename(1), strcasecmp(3), strcmp(3), strcoll(3), feature_test_macros(7)
>
> GNU                          2001-12-19                   STRVERSCMP(3)

Seems very close. My understanding is that natural sort would order these as: 000, 00, 0, 01, 1, 09, 9, 010, 10.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
codemonkey wrote:
> Are you guys aware that there's a quasi-standard regarding this in the GNU libraries? See the following excerpt from Fedora "info ls" and "man strverscmp".
>
> ~ray
>
> PS: I've found that "ls -v" works well for sorting MP3s with track numbering, etc. I don't know if it handles all of the cases described in this thread though. Maybe GNU's implementation is worth borrowing for rockbox?
>
> --
>
> $ info ls
>
> (...excerpt...)
>
> 10.1.4 More details about version sort
> --
>
> The version sort takes into account the fact that file names frequently include indices or version numbers. Standard sorting functions usually do not produce the ordering that people expect because comparisons are made on a character-by-character basis. The version sort addresses this problem, and is especially useful when browsing directories that contain many files with indices/version numbers in their names:
>
>      $ ls -1            $ ls -1v
>      foo.zml-1.gz       foo.zml-1.gz
>      foo.zml-100.gz     foo.zml-2.gz
>      foo.zml-12.gz      foo.zml-6.gz
>      foo.zml-13.gz      foo.zml-12.gz
>      foo.zml-2.gz       foo.zml-13.gz
>      foo.zml-25.gz      foo.zml-25.gz
>      foo.zml-6.gz       foo.zml-100.gz
>
> Note also that numeric parts with leading zeros are considered as fractional one:
>
>      $ ls -1            $ ls -1v
>      abc-1.007.tgz      abc-1.007.tgz
>      abc-1.012b.tgz     abc-1.01a.tgz
>      abc-1.01a.tgz      abc-1.012b.tgz
>
> This functionality is implemented using the `strverscmp' function.
>
> --
>
> $ man strverscmp
>
> STRVERSCMP(3)          Linux Programmer’s Manual          STRVERSCMP(3)
>
> NAME
>        strverscmp - compare two version strings
>
> SYNOPSIS
>        #define _GNU_SOURCE
>        #include <string.h>
>
>        int strverscmp(const char *s1, const char *s2);
>
> DESCRIPTION
>        Often one has files jan1, jan2, ..., jan9, jan10, ... and it feels wrong when ls(1) orders them jan1, jan10, ..., jan2, ..., jan9. In order to rectify this, GNU introduced the -v option to ls(1), which is implemented using versionsort(3), which again uses strverscmp().
>
>        Thus, the task of strverscmp() is to compare two strings and find the "right" order, while strcmp(3) only finds the lexicographic order. This function does not use the locale category LC_COLLATE, so is meant mostly for situations where the strings are expected to be in ASCII.
>
>        What this function does is the following. If both strings are equal, return 0. Otherwise find the position between two bytes with the property that before it both strings are equal, while directly after it there is a difference. Find the largest consecutive digit strings containing (or starting at, or ending at) this position. If one or both of these is empty, then return what strcmp(3) would have returned (numerical ordering of byte values). Otherwise, compare both digit strings numerically, where digit strings with one or more leading zeroes are interpreted as if they have a decimal point in front (so that in particular digit strings with more leading zeroes come before digit strings with fewer leading zeroes). Thus, the ordering is 000, 00, 01, 010, 09, 0, 1, 9, 10.
>
> RETURN VALUE
>        The strverscmp() function returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be earlier than, equal to, or later than s2.
>
> CONFORMING TO
>        This function is a GNU extension.
>
> SEE ALSO
>        rename(1), strcasecmp(3), strcmp(3), strcoll(3), feature_test_macros(7)
>
> GNU                          2001-12-19                   STRVERSCMP(3)

Sounds exactly like strnatcmp. It behaves the same for the two examples.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Are you guys aware that there's a quasi-standard regarding this in the GNU libraries? See the following excerpt from Fedora "info ls" and "man strverscmp".

~ray

PS: I've found that "ls -v" works well for sorting MP3s with track numbering, etc. I don't know if it handles all of the cases described in this thread though. Maybe GNU's implementation is worth borrowing for rockbox?

--

$ info ls

(...excerpt...)

10.1.4 More details about version sort
--

The version sort takes into account the fact that file names frequently include indices or version numbers. Standard sorting functions usually do not produce the ordering that people expect because comparisons are made on a character-by-character basis. The version sort addresses this problem, and is especially useful when browsing directories that contain many files with indices/version numbers in their names:

     $ ls -1            $ ls -1v
     foo.zml-1.gz       foo.zml-1.gz
     foo.zml-100.gz     foo.zml-2.gz
     foo.zml-12.gz      foo.zml-6.gz
     foo.zml-13.gz      foo.zml-12.gz
     foo.zml-2.gz       foo.zml-13.gz
     foo.zml-25.gz      foo.zml-25.gz
     foo.zml-6.gz       foo.zml-100.gz

Note also that numeric parts with leading zeros are considered as fractional one:

     $ ls -1            $ ls -1v
     abc-1.007.tgz      abc-1.007.tgz
     abc-1.012b.tgz     abc-1.01a.tgz
     abc-1.01a.tgz      abc-1.012b.tgz

This functionality is implemented using the `strverscmp' function.

--

$ man strverscmp

STRVERSCMP(3)          Linux Programmer’s Manual          STRVERSCMP(3)

NAME
       strverscmp - compare two version strings

SYNOPSIS
       #define _GNU_SOURCE
       #include <string.h>

       int strverscmp(const char *s1, const char *s2);

DESCRIPTION
       Often one has files jan1, jan2, ..., jan9, jan10, ... and it feels wrong when ls(1) orders them jan1, jan10, ..., jan2, ..., jan9. In order to rectify this, GNU introduced the -v option to ls(1), which is implemented using versionsort(3), which again uses strverscmp().

       Thus, the task of strverscmp() is to compare two strings and find the "right" order, while strcmp(3) only finds the lexicographic order. This function does not use the locale category LC_COLLATE, so is meant mostly for situations where the strings are expected to be in ASCII.

       What this function does is the following. If both strings are equal, return 0. Otherwise find the position between two bytes with the property that before it both strings are equal, while directly after it there is a difference. Find the largest consecutive digit strings containing (or starting at, or ending at) this position. If one or both of these is empty, then return what strcmp(3) would have returned (numerical ordering of byte values). Otherwise, compare both digit strings numerically, where digit strings with one or more leading zeroes are interpreted as if they have a decimal point in front (so that in particular digit strings with more leading zeroes come before digit strings with fewer leading zeroes). Thus, the ordering is 000, 00, 01, 010, 09, 0, 1, 9, 10.

RETURN VALUE
       The strverscmp() function returns an integer less than, equal to, or greater than zero if s1 is found, respectively, to be earlier than, equal to, or later than s2.

CONFORMING TO
       This function is a GNU extension.

SEE ALSO
       rename(1), strcasecmp(3), strcmp(3), strcoll(3), feature_test_macros(7)

GNU                          2001-12-19                   STRVERSCMP(3)
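For readers who want to try the behaviour described in the man page, here is a small sketch that sorts an array of file names with strverscmp() through qsort(). It assumes a C library that provides the GNU extension (glibc, for example). The wrapper names `by_version` and `sort_names_by_version` are invented for this example.

```c
#define _GNU_SOURCE            /* strverscmp() is a GNU extension */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() adapter: the array elements are `const char *`, so qsort
 * hands us pointers to those pointers. */
int by_version(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;
    return strverscmp(a, b);
}

/* Hypothetical wrapper: sort n file names in place in `ls -v` order. */
void sort_names_by_version(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof names[0], by_version);
}
```

Sorting the names jan10, jan2, jan1, jan9 this way gives jan1, jan2, jan9, jan10, as the DESCRIPTION section states. Whether this ordering is what Rockbox users expect for leading zeros is exactly the open question of this thread.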
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote:
> On Thu, Mar 19, 2009 at 4:04 PM, Thomas Martitz wrote:
>> Now imagine this for every char in a string, and for every string in a file list (with some 100 files). It's three times (or even more) more complex than just:
>>
>>     while (is_zero(a))
>>         a = next;
>
> This "natural sorting" is more complex than ASCII sorting anyway. And what's the added complexity by simply going back by one to get the last 0? That's only an added check, and everything else only if you hit a digit.
>
>     if(is_digit(a))
>     {
>         /* is_digit() had a hit */
>         while(is_zero(a))
>             a = next;
>         /* if current one is not a digit anymore we've just skipped the
>            value 0 and need to take that back to not remove that value */
>         if(!is_digit(a))
>             /* no need to check if there is a prev value -- we skipped
>                at least one 0. If not we still have a digit. */
>             a = prev;
>     }

That's how I did it now (see FS#10030).

>> We're on embedded, and thus slow systems. Yours would surely work well on a desktop app, but for mp3-players we need fast and small code. The gain has to justify the code, and I don't think it does it in this example.
>
> We're talking about high-level functionality here. There's nothing timing-critical, and even on the Archos players I'm confident doing this properly wouldn't cause a serious slowdown compared to the current state, but feel free to measure it and present numbers. We can play mp3 files at <45MHz (at least on coldfire, don't have all the numbers at hand right now) which is much more calculating-intensive than doing a few additional comparisons on a list of maybe some 100 files.
>
> You're basically saying we shouldn't fix the functionality because it's too expensive runtime-wise. If it's really too expensive to have a functionality working properly (which I doubt) we shouldn't ship the functionality at all.
>
> - Dominik

The problem with his proposal was that it looked 3 times at every char. That's hardly optimal. The original algorithm looks only once at each char. And the version you proposed does so too (except in the case where it goes back 1 char). That's why it's not much more expensive than normal strcmp.

And I think that a file-listing is relatively timing critical. I wouldn't want to have noticeable delay just due to sorting on each folder I enter.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Thu, Mar 19, 2009 at 9:21 AM, Paul Louden wrote:
> I've stated my position several times: I think we should decide whether we
> want to mimic the file browsers or not. If we do, I think we should mimic
> all their sorting quirks that we can, rather than suggest we're "like" them
> but with our own choices as to where to go our own way.

Well, I think we still have two options here:

1. completely mimic the file browser. In this case I agree with you that we should mimic all quirks the browser used as reference has.
2. only mimic the browser with regard to numbers (or any other subset). I think this is a viable alternative, though not all might agree. Doing it this way we're not "like" the reference browser but simply doing a (the most commonly noticed?) subset.

If we chose to mimic a browser completely, we immediately come across the question of which browser to use as reference. I'm quite sure Explorer / Konqueror / Nautilus behave differently with regard to prefixes like space, dot and underscore. Which is a reason why I'd go for mimicking the part of this "natural sorting" that's common among them.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Thu, Mar 19, 2009 at 4:04 PM, Thomas Martitz wrote:
> Now imagine this for every char in a string, and for every string in a file
> list (with some 100 files). It's three times (or even more) more complex
> than just:
>
>     while (is_zero(a))
>         a = next;

This "natural sorting" is more complex than ASCII sorting anyway. And what's the added complexity by simply going back by one to get the last 0? That's only an added check, and everything else only if you hit a digit.

    if(is_digit(a))
    {
        /* is_digit() had a hit */
        while(is_zero(a))
            a = next;
        /* if current one is not a digit anymore we've just skipped the
           value 0 and need to take that back to not remove that value */
        if(!is_digit(a))
            /* no need to check if there is a prev value -- we skipped
               at least one 0. If not we still have a digit. */
            a = prev;
    }

> We're on embedded, and thus slow systems. Yours would surely work well on a
> desktop app, but for mp3-players we need fast and small code. The gain has
> to justify the code, and I don't think it does it in this example.

We're talking about high-level functionality here. There's nothing timing-critical, and even on the Archos players I'm confident doing this properly wouldn't cause a serious slowdown compared to the current state, but feel free to measure it and present numbers. We can play mp3 files at <45MHz (at least on coldfire, don't have all the numbers at hand right now) which is much more calculating-intensive than doing a few additional comparisons on a list of maybe some 100 files.

You're basically saying we shouldn't fix the functionality because it's too expensive runtime-wise. If it's really too expensive to have a functionality working properly (which I doubt) we shouldn't ship the functionality at all.

- Dominik
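The "skip the zeros, then step back by one" idea from the message above can be written as a small pointer-based function. This is only a sketch of that idea, not the code from FS#10030. The name `skip_leading_zeros` is hypothetical, and the standard isdigit() stands in for the `is_digit()`/`is_zero()` helpers mentioned in the mail.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the first character of s that
 * should take part in the comparison. Leading zeros are skipped, but if
 * the whole digit run consists of zeros, one '0' is kept so that the
 * value 0 does not disappear. */
const char *skip_leading_zeros(const char *s)
{
    assert(s != NULL);

    if (isdigit((unsigned char)*s)) {
        while (*s == '0')
            s++;
        /* We ran past the digit run, so it was all zeros. Stepping back
         * is safe: at least one '0' was skipped to get here. */
        if (!isdigit((unsigned char)*s))
            s--;
    }
    return s;
}
```

With this, "007" is reduced to "7", while "00RockFaves.m3u" is reduced to "0RockFaves.m3u", which keeps such playlists at the top of the list. Each character is still inspected a bounded number of times, so the cost stays close to that of the plain zero-skipping loop.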
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Ok, I've implemented ignoring very leading zeros now, and fixed FS#10031, in my local repo. It could be committed, I think. It seems the consensus is reached.
>
> Alternatively we can also think about ignoring chars like . and _ (and possibly more) in the beginning of a file name (e.g. .rockbox is sorted under r). Just an idea. It doesn't really add complexity, but would definitely do more than the setting advertises. But, this is also something windows/nautilus/more do.

I've uploaded the patch to FS#10030.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Thu, Mar 19, 2009 at 4:34 PM, Thomas Martitz wrote:
>> I've found a simpler solution for this. Trying the code raises the
>> following problem:
>>
>> 00 < 0b < 01 < 1
> [...]
> Nautilus has this problem too. I don't know what windows does in this case.

I don't see any problem here. You just need to distinguish between strings and values on sorting:

a. 00 -> value 0
b. 0b -> value 0, followed by string "b"
c. 01 -> value 1
d. 1 -> value 1

So while strcmp() is the tie-breaker between c. and d., sorting a. and b. is also rather simple: you sort by the leading numbers first. This makes a. and b. come before the others. Then, as a. and b. form a "starting with zero" group, you have to sort that group again, as there is a tie on the numbers. Thus b. comes after a. That's how ASCII sorting would do it (and also how Windows Explorer does it). You can't simply sort by leading numbers and ignore that the string has other characters in it too.

- Dominik
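The value-then-string reasoning above can be sketched as a three-stage comparison. The sketch assumes that both names start with a digit run and that the numbers fit into an unsigned long (overflow is not handled). `leading_value` and `cmp_value_then_string` are names invented for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: parse the leading digit run of s into *value and
 * return a pointer to the first character after it. A name without
 * leading digits yields the value 0. Overflow is not handled. */
static const char *leading_value(const char *s, unsigned long *value)
{
    *value = 0;
    while (isdigit((unsigned char)*s))
        *value = *value * 10 + (unsigned long)(*s++ - '0');
    return s;
}

/* Hypothetical comparator: leading number first, then the rest of the
 * string, then strcmp() on the full names as the final tie-breaker. */
int cmp_value_then_string(const char *a, const char *b)
{
    unsigned long va, vb;
    assert(a != NULL && b != NULL);

    const char *rest_a = leading_value(a, &va);
    const char *rest_b = leading_value(b, &vb);

    if (va != vb)                       /* 1. sort by the leading numbers */
        return va < vb ? -1 : 1;

    int r = strcmp(rest_a, rest_b);     /* 2. re-sort ties by what follows */
    if (r != 0)
        return r;

    return strcmp(a, b);                /* 3. "01" vs "1": plain strcmp */
}
```

Applied to the four names from the mail, this gives 00 < 0b < 01 < 1: the first two share the value 0 and are separated by the trailing "b", the last two share the value 1 and are separated by strcmp().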
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Mike Holden wrote:
> Thomas Martitz wrote:
>> Thomas Martitz wrote:
>>> I've found a simpler solution for this. Trying the code raises the following problem:
>>>
>>> 00 < 0b < 01 < 1
>>>
>>> Leading zeros except the final one are ignored, and the final zero before a non-digit character is not ignored. But the leading zeros of numbers are (so that 01 is 1). Obviously 0 sorts before 1.
>>
>> Nautilus has this problem too. I don't know what windows does in this case.
>
> I thought we'd already established that those 4 files are in the right order?
>
> Windows orders them as below, which is the same as above:
>
> 00
> 0b
> 01
> 1

I didn't establish anything; the mail about Nautilus was sent before I received yours. But if Nautilus and Windows sort this way (and we want to mimic it), then it's right, indeed.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Thomas Martitz wrote:
>> I've found a simpler solution for this. Trying the code raises the
>> following problem:
>>
>> 00 < 0b < 01 < 1
>>
>> Leading zeros except the final one are ignored, and the final zero
>> before a non-digit character is not ignored. But the leading zeros of
>> numbers are (so that 01 is 1). Obviously 0 sorts before 1.
>
> Nautilus has this problem too. I don't know what windows does in this
> case.

I thought we'd already established that those 4 files are in the right order?

Windows orders them as below, which is the same as above:

00
0b
01
1

--
Mike Holden

http://www.by-ang.com - the place to shop for all manner of hand crafted items, including Jewellery, Greetings Cards and Gifts
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> I've found a simpler solution for this. Trying the code raises the following problem:
>
> 00 < 0b < 01 < 1
>
> Leading zeros except the final one are ignored, and the final zero before a non-digit character is not ignored. But the leading zeros of numbers are (so that 01 is 1). Obviously 0 sorts before 1.

Nautilus has this problem too. I don't know what Windows does in this case.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Bryan VanDyke wrote:
>> Thomas Martitz wrote:
>>> Bryan VanDyke wrote:
>>>> Thomas Martitz wrote:
>>>>> Linus Nielsen Feltzing wrote:
>>>>>> Mike Holden wrote:
>>>>>>> Maybe leading zeros should only be stripped if another digit follows them?
>>>>>>>
>>>>>>> I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading double zero is deliberately there to sort them at the top.
>>>>>>
>>>>>> That's an interesting observation. I believe leading zeroes are treated like whitespace in the current code, but in this case I think that the final zero should be kept.
>>>>>>
>>>>>> Linus
>>>>>
>>>>> That's not trivial, and adds complexity. You basically need to look at the current, the next, and one more for this, instead of just the current char.
>>>>
>>>> Actually it's not that bad. Pseudo code:
>>>>
>>>>     get current
>>>>     get next
>>>>     while (current != null && next != null && current == '0' && next is a number)
>>>>     {
>>>>         current = next
>>>>         next = get next
>>>>     }
>>>
>>> Now imagine this for every char in a string, and for every string in a file list (with some 100 files). It's three times (or even more) more complex than just:
>>>
>>>     while (is_zero(a))
>>>         a = next;
>>>
>>> We're on embedded, and thus slow systems. Yours would surely work well on a desktop app, but for mp3-players we need fast and small code. The gain has to justify the code, and I don't think it does it in this example.
>>
>> What about something like this, taking into consideration that the isspace function/comparison was removed? And isdigit is supposed to give zero on non-digit values.
>>
>>     /* skip over leading zeros */
>>     while ('0' == ca && nat_isdigit(ca_next))
>>     {
>>         ca = to_int(a[++ai]);
>>         ca_next = to_int(a[ai+1]);
>>     }
>
> I've found a simpler solution for this. Trying the code raises the
> following problem:
>
> 00 < 0b < 01 < 1

That looks right. Zero is a valid number. A leading zero before a zero is still zero.

00 -> 0
0b -> 0b
01 -> 1
1 -> 1

01 == 1, strcmp -> 01 < 1, right?

> Leading zeros except the final one are ignored, and the final zero
> before a non-digit character is not ignored. But the leading zeros of
> numbers are (so that 01 is 1). Obviously 0 sorts before 1.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Sounds to me that you're better off using ascii sort.

Well, that is what I currently use, but there's no reason why natural sorting shouldn't be appropriate if it works in the right way!

> Windows does it in the same way as nautilus and other major file
> browsers. It ignores leading zeros.

Well, it doesn't completely ignore them; they have some significance (see my other email a short while ago).

--
Mike Holden

http://www.by-ang.com - the place to shop for all manner of hand crafted items, including Jewellery, Greetings Cards and Gifts
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Bryan VanDyke wrote:
> Thomas Martitz wrote:
>> Bryan VanDyke wrote:
>>> Thomas Martitz wrote:
>>>> Linus Nielsen Feltzing wrote:
>>>>> Mike Holden wrote:
>>>>>> Maybe leading zeros should only be stripped if another digit follows them?
>>>>>>
>>>>>> I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading double zero is deliberately there to sort them at the top.
>>>>>
>>>>> That's an interesting observation. I believe leading zeroes are treated like whitespace in the current code, but in this case I think that the final zero should be kept.
>>>>>
>>>>> Linus
>>>>
>>>> That's not trivial, and adds complexity. You basically need to look at the current, the next, and one more for this, instead of just the current char.
>>>
>>> Actually it's not that bad. Pseudo code:
>>>
>>>     get current
>>>     get next
>>>     while (current != null && next != null && current == '0' && next is a number)
>>>     {
>>>         current = next
>>>         next = get next
>>>     }
>>
>> Now imagine this for every char in a string, and for every string in a file list (with some 100 files). It's three times (or even more) more complex than just:
>>
>>     while (is_zero(a))
>>         a = next;
>>
>> We're on embedded, and thus slow systems. Yours would surely work well on a desktop app, but for mp3-players we need fast and small code. The gain has to justify the code, and I don't think it does it in this example.
>
> What about something like this, taking into consideration that the isspace function/comparison was removed? And isdigit is supposed to give zero on non-digit values.
>
>     /* skip over leading zeros */
>     while ('0' == ca && nat_isdigit(ca_next))
>     {
>         ca = to_int(a[++ai]);
>         ca_next = to_int(a[ai+1]);
>     }

I've found a simpler solution for this. Trying the code raises the following problem:

00 < 0b < 01 < 1

Leading zeros except the final one are ignored, and the final zero before a non-digit character is not ignored. But the leading zeros of numbers are (so that 01 is 1). Obviously 0 sorts before 1.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Bryan VanDyke wrote:
>> Thomas Martitz wrote:
>>> Linus Nielsen Feltzing wrote:
>>>> Mike Holden wrote:
>>>>> Maybe leading zeros should only be stripped if another digit follows
>>>>> them?
>>>>>
>>>>> I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists
>>>>> that I have created (as opposed to original artist albums), and the
>>>>> leading double zero is deliberately there to sort them at the top.
>>>>
>>>> That's an interesting observation. I believe leading zeroes are
>>>> treated like whitespace in the current code, but in this case I think
>>>> that the final zero should be kept.
>>>>
>>>> Linus
>>>
>>> That's not trivial, and adds complexity. You basically need to look at
>>> the current, the next, and one more for this, instead of just the
>>> current char.
>>
>> Actually it's not that bad.
>>
>> Pseudo code:
>>
>>     get current
>>     get next
>>     while (current != null && next != null && current == '0' && next is a number)
>>     {
>>         current = next
>>         next = get next
>>     }
>
> Now imagine this for every char in a string, and for every string in a
> file list (with some 100 files). It's three times (or even more) more
> complex than just:
>
>     while (is_zero(a))
>         a = next;
>
> We're on embedded, and thus slow systems. Yours would surely work well on
> a desktop app, but for mp3-players we need fast and small code. The gain
> has to justify the code, and I don't think it does it in this example.

What about something like this, taking into consideration that the isspace function/comparison was removed? And isdigit is supposed to give zero on non-digit values.

    /* skip over leading zeros */
    while ('0' == ca && nat_isdigit(ca_next))
    {
        ca = to_int(a[++ai]);
        ca_next = to_int(a[ai+1]);
    }
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Mike Holden wrote:
>> Thomas Martitz wrote:
>>> After this discussion and the ones in IRC, it seems to me that the majority is in favor of ignoring leading zeros. This would also match with Nautilus' and Windows Explorer's sorting.
>>>
>>> And we can do that. Given that the usual browsers do it that way, it's also what the user expects, so it can't be bad. FS#10031 needs changing the algorithm anyway.
>>>
>>> So, should we do that? It at least seems to be the opinion of most people.
>>
>> Just had a quick look in My Computer on an XP box to see how this does it.
>>
>> 1. 001 sorts before 01 and 01 sorts before 1, giving the conclusion that more leading zeroes sort before fewer leading zeroes, where the underlying number is the same (i.e. 000x < 00x for any number x).
>> 2. 1 sorts before 2, 2 before 10, 10 before 11 etc, as expected giving numeric sorting.
>> 3. 001 sorts before 10 and 010, again giving expected sorting for the numeric part.
>> 4. Introducing letters into the equation, we can see that 00A sorts before 00aa, 001 and 1. This satisfies my expectation that leading zeroes before letters should sort first in the list, and not be sorted among the letter part only.
>>
>> All of these individual items line up to give a file listing that doesn't produce any surprises for me, so I would be happy with this set of rules.
>
> This is what we'll be doing too. Comparing 001 and 1 will yield 001 < 1, because if strnatcmp reports the two as equal, strcmp is asked.

Err, I guess point 4) isn't covered with my modification.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Bryan VanDyke wrote:
> Thomas Martitz wrote:
>> Linus Nielsen Feltzing wrote:
>>> Mike Holden wrote:
>>>> Maybe leading zeros should only be stripped if another digit follows them?
>>>>
>>>> I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading zerozero is deliberately there to sort them at the top.
>>>
>>> That's an interesting observation. I believe leading zeroes are treated like whitespace in the current code, but in this case I think that the final zero should be kept.
>>>
>>> Linus
>>
>> That's not trivial, and adds complexity. You basically need to look at the current, the next, and one more for this, instead of just the current char.
>
> Actually it's not that bad.
>
> Pseudo code:
>
>     get current
>     get next
>     while (current != null && next != null && current == '0' && next is a number)
>     {
>         current = next
>         next = get next
>     }

Now imagine this for every char in a string, and for every string in a file list (with some 100 files). It's three times (or even more) the complexity of just

    while (is_zero(a))
        a = next;

We're on embedded, and thus slow, systems. Yours would surely work well on a desktop app, but for mp3 players we need fast and small code. The gain has to justify the code, and I don't think it does in this example.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Linus Nielsen Feltzing wrote:
> Bryan VanDyke wrote:
>> 1. Numbers sort before non-numbers.
>>    - Leading zeros are stripped. A leading zero on a zero is still zero.
>>    - 000 becomes 0.
>>    - Some of the code that has been used has trouble with this.
>> 2. Lesser number before greater.
>>    - 1,2,3,4 etc
>> 3. Anything else strcmp.
>
> Sounds simple and sane, and seems to be the way Windows Explorer works as well.
>
> Linus

Well, this is what SVN does, currently. And if we want 02 after 1, just a (relatively) small modification is needed, without messing up decimal numbers like 1.02 and 1.1.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Mike Holden wrote:
> Thomas Martitz wrote:
>> After this discussion and the ones in IRC, it seems to me that the majority is in favor of ignoring leading zeros. This would also match with Nautilus' and Windows Explorer's sorting.
>>
>> And we can do that. Given that the usual browsers do it that way, it's also what the user expects, so it can't be bad. FS#10031 needs changing the algorithm anyway.
>>
>> So, should we do that? It at least seems to be the opinion of most people.
>
> Just had a quick look in My Computer on an XP box to see how this does it.
>
> 1. 001 sorts before 01 and 01 sorts before 1, giving the conclusion that more leading zeroes sort before fewer leading zeroes, where the underlying number is the same (i.e. 000x < 00x for any number x).
> 2. 1 sorts before 2, 2 before 10, 10 before 11 etc, as expected giving numeric sorting.
> 3. 001 sorts before 10 and 010, again giving expected sorting for the numeric part.
> 4. Introducing letters into the equation, we can see that 00A sorts before 00aa, 001 and 1. This satisfies my expectation that leading zeroes before letters should sort first in the list, and not be sorted among the letter part only.
>
> All of these individual items line up to give a file listing that doesn't produce any surprises for me, so I would be happy with this set of rules.

This is what we'll be doing too. Comparing 001 and 1 will yield 001 < 1, because if strnatcmp reports the two as equal, strcmp is asked.
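The tie-break described above (compare by numeric value first, fall back to strcmp when the values are equal) can be sketched in C as follows. This is a simplified illustration, not the Rockbox strnatcmp: both function names are hypothetical, and the inputs are assumed to consist of digits only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two digit-only strings by numeric value,
 * ignoring leading zeros. Returns -1, 0 or 1. */
static int digits_value_cmp(const char *a, const char *b)
{
    size_t la, lb;
    int c;

    assert(a != NULL && b != NULL);
    while (*a == '0') a++;
    while (*b == '0') b++;
    la = strlen(a);
    lb = strlen(b);
    if (la != lb)                 /* more significant digits: larger value */
        return la < lb ? -1 : 1;
    c = strcmp(a, b);             /* same length: digit-wise order is numeric order */
    return (c > 0) - (c < 0);
}

/* Numeric order first; plain strcmp decides between equal values. */
static int cmp_with_tiebreak(const char *a, const char *b)
{
    int c = digits_value_cmp(a, b);
    return c != 0 ? c : strcmp(a, b);
}
```

Under these assumptions the sketch orders 001 < 01 < 1 < 2 < 10 < 11, and 010 before 10, which matches points 1 to 3 of the XP observations. It says nothing about point 4, which involves letters.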
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Bryan VanDyke wrote:
> 1. Numbers sort before non-numbers.
>    - Leading zeros are stripped. A leading zero on a zero is still zero.
>    - 000 becomes 0.
>    - Some of the code that has been used has trouble with this.
> 2. Lesser number before greater.
>    - 1,2,3,4 etc
> 3. Anything else strcmp.

Sounds simple and sane, and seems to be the way Windows Explorer works as well.

Linus
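The three rules quoted above can be sketched as one C comparator. This is a hedged illustration under stated assumptions, not the code in SVN: `natcmp_sketch` and `digit_run` are hypothetical names, input is plain ASCII, and non-digit characters are compared byte-wise the way strcmp would.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the run of digits starting at s. */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Hypothetical comparator following the three rules. Returns -1, 0 or 1. */
static int natcmp_sketch(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a && *b) {
        int da = isdigit((unsigned char)*a) != 0;
        int db = isdigit((unsigned char)*b) != 0;

        if (da && db) {
            size_t la = digit_run(a), lb = digit_run(b);
            const char *ea = a + la, *eb = b + lb;
            int c;

            /* Strip leading zeros but keep one digit: "000" becomes "0". */
            while (la > 1 && *a == '0') { a++; la--; }
            while (lb > 1 && *b == '0') { b++; lb--; }
            /* Rule 2: lesser number first. */
            if (la != lb)
                return la < lb ? -1 : 1;
            c = memcmp(a, b, la);
            if (c != 0)
                return c < 0 ? -1 : 1;
            a = ea;
            b = eb;
        } else if (da != db) {
            /* Rule 1: numbers sort before non-numbers. */
            return da ? -1 : 1;
        } else {
            /* Rule 3: anything else is compared like strcmp. */
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a) return 1;   /* b is a prefix of a */
    if (*b) return -1;
    return 0;
}
```

In this sketch "2" sorts before "10", "007" sorts after "6", "1.2" sorts before "1.10" (each digit run is its own number), and "00RockFaves.m3u" sorts before "Abba" because it starts with a number. Names that differ only in leading zeros, such as "a1" and "a01", compare as equal, so a caller would still need a tie-break.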
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> After this discussion and the ones in IRC, it seems to me that the majority is in favor of ignoring leading zeros. This would also match with Nautilus' and Windows Explorer's sorting.
>
> And we can do that. Given that the usual browsers do it that way, it's also what the user expects, so it can't be bad. FS#10031 needs changing the algorithm anyway.
>
> So, should we do that? It at least seems to be the opinion of most people.

Just had a quick look in My Computer on an XP box to see how this does it.

1. 001 sorts before 01 and 01 sorts before 1, giving the conclusion that more leading zeroes sort before fewer leading zeroes, where the underlying number is the same (i.e. 000x < 00x for any number x).
2. 1 sorts before 2, 2 before 10, 10 before 11 etc, as expected giving numeric sorting.
3. 001 sorts before 10 and 010, again giving expected sorting for the numeric part.
4. Introducing letters into the equation, we can see that 00A sorts before 00aa, 001 and 1. This satisfies my expectation that leading zeroes before letters should sort first in the list, and not be sorted among the letter part only.

All of these individual items line up to give a file listing that doesn't produce any surprises for me, so I would be happy with this set of rules.

--
Mike Holden
http://www.by-ang.com - the place to shop for all manner of hand crafted items, including Jewellery, Greetings Cards and Gifts
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Linus Nielsen Feltzing wrote:
>> Mike Holden wrote:
>>> Maybe leading zeros should only be stripped if another digit follows them?
>>>
>>> I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading zerozero is deliberately there to sort them at the top.
>>
>> That's an interesting observation. I believe leading zeroes are treated like whitespace in the current code, but in this case I think that the final zero should be kept.
>>
>> Linus
>
> That's not trivial, and adds complexity. You basically need to look at the current, the next, and one more for this, instead of just the current char.

Actually it's not that bad.

Pseudo code:

    get current
    get next
    while (current != null && next != null && current == '0' && next is a number)
    {
        current = next
        next = get next
    }
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote:
> Hi,
>
> I started wondering how the value "as whole numbers" for the setting "interpret numbers while sorting" is intended to work. Currently it seems to get changed in svn quite often. However, I haven't seen a consensus on how this feature is supposed to work (read: sort) gathered, especially before committing. A recent discussion is here:
> http://www.rockbox.org/irc/log-20090317#17:53:35
>
> Maybe I've missed such a consensus -- in this case someone please point me in the right direction and ignore this mail :) Changing the behaviour of this setting frequently is a rather bad thing IMO. We need to specify how we want it to work and implement it that way. Doing this kind of "discussion in svn" is a bad thing and can only lead to confusion among users. We didn't do this for the "study mode" feature either, even if there was a consensus to change it.
>
> Now, how should this feature sort? From my point of view, I'd expect it to
> - treat digits as numbers. A value of "00" equals zero and thus gets sorted before the number 1, regardless of whether that is "1", "01" or "001". Completely skipping the zero (as it's only leading zeros) is as broken as not stripping leading zeros -- "003" should equal 3 and "01" should equal 1, thus the latter sorting before the former. A situation where a folder contains files starting with "02" and "4" at the same time is something that could happen and still not be intentional (just think of copying files from various albums to a mix folder). Either treat digits as numbers or don't treat them as numbers at all.
> - Spaces shouldn't get collapsed. A space is a space, and "interpret numbers" doesn't tell anything about spaces. At least at some point during the lifetime of this setting spaces were collapsed. Nothing that is a number ...
>
> This still leaves some open issues I'm not sure how to deal with:
> - how are floating-point numbers to be treated? "1.001" is smaller than "1.01" when treated as numbers, so on the one hand I'd expect them to sort that way. On the other hand, recognizing the dot as decimal separator is broken as well -- not all languages use it as decimal separator (like German using the comma). Stopping the number-treating at dots is also kinda broken -- how should a naming like "discnumber.tracknumber" be handled, i.e. "1.2", "1.10" -- which one has to be sorted first? The best solution here might be to treat all numbers as single numbers, regardless of whether they might be floating point numbers -- I guess it's more common to have a "1.3" numbering to mark discnumber.track than a floating point number "1.003".
>
> I'm pretty sure I've missed some of my points right now :) What do people think about this sorting thing?
>
> - Dominik

I think it should just be the simplest and easiest to understand.

Any consecutive run of digits [0-9] is treated as its value for sorting purposes. This means any non-digit is treated like a separator, which would include punctuation, spaces, etc.

This also avoids trying to figure out what the person meant by using a period. Was it a separator, equivalent to the US comma, a region setting, a real number, etc? That's just a road nobody is going to agree on.

Same thing if a person is using punctuation, leading zeros, etc to control the sort order. There's no way to read the person's mind on what they intended. In all likelihood they're going to use the ASCII sort anyway.

The various implementations that have been used by RB have tried to eat spaces, so

    a 1
    a001
    a 01

are all equal to a1. I say throw that out too.

1. Numbers sort before non-numbers.
   - Leading zeros are stripped. A leading zero on a zero is still zero.
   - 000 becomes 0.
   - Some of the code that has been used has trouble with this.
2. Lesser number before greater.
   - 1,2,3,4 etc
3. Anything else strcmp.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> Dominik Riebeling wrote:
>> Maybe I've missed such a consensus -- in this case someone please point me in the right direction and ignore this mail :)
>
> After this discussion and the ones in IRC, it seems to me that the majority is in favor of ignoring leading zeros. This would also match with Nautilus' and Windows Explorer's sorting.
>
> And we can do that. Given that the usual browsers do it that way, it's also what the user expects, so it can't be bad. FS#10031 needs changing the algorithm anyway.
>
> So, should we do that? It at least seems to be the opinion of most people.

Ok, I've implemented ignoring very leading zeros now, and fixed FS#10031, in my local repo. It could be committed, I think. It seems consensus has been reached.

Alternatively we can also think about ignoring chars like . and _ (and possibly more) at the beginning of a file name (e.g. .rockbox is sorted under r). Just an idea. It doesn't really add complexity, but would definitely do more than the setting advertises. But this is also something windows/nautilus/more do.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Linus Nielsen Feltzing wrote: Mike Holden wrote: Maybe leading zeros should only be stripped if another digit follows them? I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading zerozero is deliberately there to sort them at the top. That's an interesting observation. I believe leading zeroes are treated like whitespace in the current code, but in this case I think that the final zero should be kept. Linus That's not trivial, and adds complexity. You basically need to look at the current, the next, and one more for this, instead of just the current char.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Mike Holden wrote: Maybe leading zeros should only be stripped if another digit follows them? I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading zerozero is deliberately there to sort them at the top. That's an interesting observation. I believe leading zeroes are treated like whitespace in the current code, but in this case I think that the final zero should be kept. Linus
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Mike Holden wrote: Maybe leading zeros should only be stripped if another digit follows them? I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading zerozero is deliberately there to sort them at the top. At the moment using natural sorting, 00RockFaves.m3u is sorted among the "R" entries, totally defeating my intention in choosing that naming (and not "natural" to my view!). 01Rock should sort before 02Rock, agreed, but should 01Rock sort before A or before S? Sounds to me that you're better off using ascii sort. In any case, it would be interesting to see how windows does the sorting, as the most users will be used to that way of doing it. Maybe many, but we shouldn't assume that is the case. I personally have no idea how Windows does it, and I wouldn't necessarily agree that just because MS does it that that is the _right_ way to do it. Windows does it in the same way as nautilus and other major file browsers. It ignores leading zeros.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: > well, treating a number as such includes stripping leading zeros from > it, at least from my understanding. It won't do any harm on properly > named files, and I don't see a reason why a user would want to prefix > with 0 just to change sorting. Maybe leading zeros should only be stripped if another digit follows them? I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I have created (as opposed to original artist albums), and the leading zerozero is deliberately there to sort them at the top. At the moment using natural sorting, 00RockFaves.m3u is sorted among the "R" entries, totally defeating my intention in choosing that naming (and not "natural" to my view!). 01Rock should sort before 02Rock, agreed, but should 01Rock sort before A or before S? > In any case, it would be interesting to see how windows does the > sorting, as the most users will be used to that way of doing it. Maybe many, but we shouldn't assume that is the case. I personally have no idea how Windows does it, and I wouldn't necessarily agree that just because MS does it that that is the _right_ way to do it. -- Mike Holden http://www.by-ang.com - the place to shop for all manner of hand crafted items, including Jewellery, Greetings Cards and Gifts
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Paul Louden wrote: I've stated my position several times: I think we should decide whether we want to mimic the file browsers or not. If we do, I think we should mimic all their sorting quirks that we can, rather than suggest we're "like" them but with our own choices as to where to go our own way. Sure. I don't see a problem with this, as long as it doesn't make the code overly complicated or slow. Linus
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Linus Nielsen Feltzing wrote: Or, instead of doing a poll, why not do it in the same way as the major file browsers do? After all, "normal" people would probably expect Rockbox to sort the files in the same order as the computer file browser does, wouldn't they? I've stated my position several times: I think we should decide whether we want to mimic the file browsers or not. If we do, I think we should mimic all their sorting quirks that we can, rather than suggest we're "like" them but with our own choices as to where to go our own way. If we're not going to mimic them, I think respecting intentional 0s is the way to go. Personal opinion, yes, but at least some people I know seem to think "04" is not the same as "4" linguistically, or really at all outside of actually doing math on it.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Paul Louden wrote: The problem is, now you're arguing "mathematical rules." We've already established people don't think in mathematical rules. I doubt people see "04" and think "four". They think "oh-four." The zero is not an insignificant and ignored digit in the way people speak, read, or think the number. Except in math. But we're talking "normal people" here. You have a point there. I'm afraid most of my friends don't count as "normal" people. ;-) Instead of us trying to think about them, if we're going to base this on "normal people" let's do a poll. At least this way we're not extrapolating our opinion on them based on *mathematics*, something few people think in. Or, instead of doing a poll, why not do it in the same way as the major file browsers do? After all, "normal" people would probably expect Rockbox to sort the files in the same order as the computer file browser does, wouldn't they? Linus
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Linus Nielsen Feltzing wrote:
> If it sorts 007 after 6, I fail to see how it would be surprising to the user in any way. It is after all a well-known mathematical rule, and a rule that the major file browsers follow. If we claim to sort numbers, we should do so, and not change the fundamental rules of mathematics.

Just as a counterpoint to this - People don't normally put 0s before a number. I would expect a lot of people would think "007" is "00 and 7" not "7" and that leading zeros are "not part of the number." I know an informal study of "all of my friends online right now" (none of whom are computer scientists and many of whom are artists or fairly nontechnical people) has told me that they expect that "04" would come before "2" because of the zero. It was presented this way: "if you had a list 2, 3, 4, 5, and you were to add 04 to it, where would you put it?" so I don't think my question was presented in a leading way.

The problem is, now you're arguing "mathematical rules." We've already established people don't think in mathematical rules. I doubt people see "04" and think "four". They think "oh-four." The zero is not an insignificant and ignored digit in the way people speak, read, or think the number. Except in math. But we're talking "normal people" here.

Instead of us trying to think about them, if we're going to base this on "normal people" let's do a poll. At least this way we're not extrapolating our opinion on them based on *mathematics*, something few people think in.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Paul Louden wrote:
> No, this isn't. This is "having intuitive handling of numbers as normally written by people." People don't normally precede numbers with a 0 unless there's a specific reason to.

I'd think that many files will have names with leading zeros, especially if they are copied from a player that doesn't support natural sorting, where the user will have added leading zeros to force a correct sorting.

Also, you seem to forget the very reason that we implement natural sorting in the first place, which is to sort numbers in a natural way, so the user finds numbered files where he expects them to be, without having to change the file names.

Further, natural sorting strives to sort numbers in a way that humans *expect* them to be sorted. Leading zeros are insignificant when treating numbers; that is a mathematical rule that the vast majority of people know. I dare say that people in general expect the browser to ignore leading zeros.

If it sorts 007 after 6, I fail to see how it would be surprising to the user in any way. It is after all a well-known mathematical rule, and a rule that the major file browsers follow. If we claim to sort numbers, we should do so, and not change the fundamental rules of mathematics.

Linus
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: Now, how should this feature sort? From my point of view, I'd expect it to - treat digits as numbers. A value of "00" equals to zero and thus gets sorted before the number 1, regardless if that is "1", "01" or "001". Completely skipping the zero (as it's only leading zeros) is as broken as to not strip leading zeros -- "003" should equal to 3 and "01" to 1, thus the latter sorting before the former. I totally agree. Leading zeros should be ignored when comparing numbers. It is in my opinion the least surprising way of doing it. Linus
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote:
> That's totally flawed reasoning in this case. Use ascii-sort if you want simple and explicit. And, this is basically "Having part of their functionality, and not all of it, will lead to expected behaviour that's missing." at its best.

No, this isn't. This is "having intuitive handling of numbers as normally written by people." People don't normally precede numbers with a 0 unless there's a specific reason to.

> Oh, and they surely will expect what they know from Windows explorer or nautilus, regardless of which algorithm we use.

So we can either make it clear it's different and "our own way" or we can try to make it similar with the differences more subtle and thus more likely to be surprising (in a bad way). Which one's more fair to users - one where they know it's different, or one where they don't?
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 19.03.2009 01:52, Paul Louden wrote:
> Thomas Martitz wrote:
>> I agree. We shouldn't force a sorting which is contradictory to the sorting of all major file browsers.
>
> And yet every proposal so far is to pick and choose which aspects of major file browser sorting we like, and throw out the rest. If we're going to use them to justify changes, we should strive to actually mimic them. Having part of their functionality, and not all of it, will lead to expected behaviour that's missing.

Which we would be doing by ignoring leading zeros. It's not about mimicking, but rather about being consistent with what the vast majority of people know from their PC browser.

> Meanwhile, if we just make ours simple and explicit, people won't expect other aspects of it that are missing.

That's totally flawed reasoning in this case. Use ascii-sort if you want simple and explicit. And, this is basically "Having part of their functionality, and not all of it, will lead to expected behaviour that's missing." at its best.

Oh, and they surely will expect what they know from Windows explorer or nautilus, regardless of which algorithm we use.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Thomas Martitz wrote: I agree. We shouldn't force a sorting which is contradictory to the sorting of all major file browsers. And yet every proposal so far is to pick and choose which aspects of major file browser sorting we like, and throw out the rest. If we're going to use them to justify changes, we should strive to actually mimic them. Having part of their functionality, and not all of it, will lead to expected behaviour that's missing. Meanwhile, if we just make ours simple and explicit, people won't expect other aspects of it that are missing.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote:
> Besides, from the user's point of view I'd prefer to be in line with the major OSes. Not doing so will cause confusion among users. I don't think Windows treats leading zeros as intentional, and that is still the major OS.
>
> - Dominik

I agree. We shouldn't force a sorting which is contradictory to the sorting of all major file browsers.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote:
>> And I'm not allowed to compare with how it currently works? And I'm not allowed to say "Hey, this feature what you want, it does this already"?
>
> It doesn't help the case a tiny bit to present the current state if we are talking about the intention. It doesn't help the case if you only start to get defensive just because someone disagrees with the way "your" feature works.

That's the point. I didn't disagree, and neither did you, because it already works like that (in the case of the decimal numbers).
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: Besides, from the users point of view I'd prefer to be in line with the major OSes. Not doing so will cause confusion among users. I don't think Windows treats leading zeros as intentional, and that is still the major OS. It sounds like we're going to be ignoring plenty of things windows does anyway. This change should, at least, be unnoticed by most users except those who actually want the behaviour. Except in your rather awkward theoretical case. I'm still not entirely sure why people would physically reorganize their files rather than just creating a playlist.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
2009/3/18 Dominik Riebeling:
> On Wed, Mar 18, 2009 at 11:38 PM, Jonathan Gordon wrote:
>> it doesn't make sense to me to have a sorted mix folder... so I agree with Paul here, I would think that 9/10 times you play a mix folder you have it on random
>
> If someone sorts his mix folder he most likely wants it to get played in a specific order, wouldn't he?
> Think of some mix folder that has different styles of music and walks through them -- that's a different thing than simply throwing them in in random order. A mix folder can very well have a wanted track order.
>
> - Dominik

why on earth would anyone put different styles into a single mix folder? that really makes no sense.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Thu, Mar 19, 2009 at 12:18 AM, Paul Louden wrote:
> Meanwhile, we're willing to break intentional numbering because of this rare case? Do you really think it's going to occur often enough to plan for?

My personal opinion is that users will do this, and they will do it often enough to justify it -- people who want a specific sorting will use ASCII sorting anyway. Well, at least the majority of them.

Besides, from the user's point of view I'd prefer to be in line with the major OSes. Not doing so will cause confusion among users. I don't think Windows treats leading zeros as intentional, and that is still the major OS.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote:
> James Bond is also a special case. As on the GoldenQuotes wiki page:
> Llorean: he is not named 007 to be sorted in a specific order it's because he has a license to kill
> Pretty much says all :)

So, if they're both special cases, which is more common? People who might have 007 movies or l337-speak filenames or might prefix a 0 to a number to get it first (which is a concept even many of my non-technical friends have been trained to expect to work just because of ASCII sort in enough things, so they'll throw on a zero and IF that doesn't work try something else, but the file rename in Rockbox is something we shouldn't expect people to be willing to use multiple times per file), or people who create mix folders on their PC that just happen to be in right-enough order that they don't have to rename their occasional mixed numbered filename?
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:37 PM, Paul Louden wrote: > So he numbers the mix manually for most of the songs, but in the case of > some songs that are already the right track number, he doesn't renumber > them? This seems like a rather special case to justify throwing out user > data. That might be a special case, but > Because I expect to see the folder "007 - James Bond" before the folder "5th > Element" even if I have natural sorting on for my tracks, among other > things. It's not "7 - James Bond" it's "Double Oh 7". They're significant > numbers, intentional and not accidental. James Bond is also a special case. As on the GoldenQuotes wiki page: Llorean: he is not named 007 to be sorted in a specific order it's because he has a license to kill Pretty much says all :) - Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: If Windows Explorer / Nautilus / whatever browser the user is using sort it "correctly" as in "3", "05", "7"? Then he hasn't skipped it but our way of sorting is "wrong". So in this one single case of the person creating an organized mix folder, renaming them on his PC, choosing to skip over songs that appear in the right place, etc, we have a case where this sort is favourable. Meanwhile, we're willing to break intentional numbering because of this rare case? Do you really think it's going to occur often enough to plan for?
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:57 PM, Paul Louden wrote: > playlist). I don't think it's particularly likely that, while renaming > songs, they'll just choose to skip ones that are named differently and hope > they'll be in the right place. If Windows Explorer / Nautilus / whatever browser the user is using sort it "correctly" as in "3", "05", "7"? Then he hasn't skipped it but our way of sorting is "wrong". - Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote:
> In my understanding, the intention of natsort is to change the rules of how strings (file names) are sorted. As a side effect, it also fixes the problem with 1, 10, 2. But it's not only about that. It's more general.

What it does is more general. What it was intended to _fix_ was that specific type of case. That's what prompted initial discussion of it, and that case was the focus around which every proposal (that I remember) for sorting to be "improved" or "changed" was based. Basically, the logic went "users expect a computer to know that the series of numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 goes in that order, and seeing them in ASCII order is unexpected."

My personal position is also that if a user adds a 0 before a number, they expect it to change something, rather than being ignored. I think, on average, more 0s (in lists meant to be sorted) will be intentional than "accidental." If you want the list sorted you either name the files, or use a set of files named already to be sorted. I think it's exceptionally rare that you'll have a list of files that a user has created and intended to be sorted that have 3, 04, 5 in them and mean it in that order.

Meanwhile, it's exceedingly _rare_ in my opinion that people would intend 1, 10, 2, 3, 4 as their sorting order. And in that case we're not throwing out any data they added, just trying to "read" what is written, rather than treat it as a string of unique characters. I think 004 being treated as "00, then 4" is the same as 4a being treated as "4, then a" rather than the string "4a". Otherwise we may as well say "numbers need a space after them to denote they aren't part of strings" or something.

For example, l337-speak named files currently may be sorted extremely awkwardly. B007Y for example. We should probably assume zeros are intentional there (in my opinion). I think it's just more consistent if we don't throw out any characters.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:38 PM, Al Le wrote:
> But here not anymore. I think the verbal description should be first, and
> then the implementation of it. You say "we take an implementation (as the

I completely agree here. If done the other way round we get endless discussions and changes of the behaviour again and again.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 19.03.2009 00:00, Thomas Martitz wrote: And doing it correctly means special casing leading zeros at the very beginning. No, doing it correctly (in my view) means "interpret any sequence of digits as a number". That would automatically treat the leading zeroes.
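A minimal Python sketch of the rule "interpret any sequence of digits as a number" might look like the following. The function name `natural_key` is hypothetical, and the choice to rank number chunks ahead of text chunks is an assumption of this sketch rather than something stated in the thread.

```python
import re

def natural_key(name):
    """Turn every run of ASCII digits into an integer (illustrative only)."""
    key = []
    # With a capture group, re.split alternates text (even index) and digits (odd index).
    for i, chunk in enumerate(re.split(r"([0-9]+)", name)):
        if not chunk:
            continue
        # Tag chunks so integers and text are never compared with each other.
        key.append((0, int(chunk)) if i % 2 else (1, chunk))
    return key

print(sorted(["10", "2", "1", "04", "3"], key=natural_key))
```

Because `int("04")` is 4, leading zeros disappear without any special casing, which is the point being made above.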
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:52 PM, Thomas Martitz wrote:
> I'm telling that they don't need leading zeros for proper numerical sorting
> anymore. I don't see bad reasoning in that.

So, because leading zeros aren't required anymore, leading zeros immediately become intentional? This is broken reasoning to me. That leading zeros aren't needed anymore changes neither whether they are intentional nor what we know about that.

> Hence I asked you whether we should change the description or the way it
> sorts. Just answer that instead of getting angry at me.

Errr? I was asking about how this feature is *intended* to work and how people *expect* it to work. Not how it's currently done. The current implementation might differ from the way it's intended to work.

> And I'm not allowed to compare with how it currently works? And I'm not
> allowed to say "Hey, this feature what you want, it does this already"?

It doesn't help the case a tiny bit to present the current state if we are talking about the intention. It doesn't help the case if you only start to get defensive just because someone disagrees with the way "your" feature works.

> Don't be ignorant please. We've had an *awful* lot of discussion before. Do
> you really think I forgot about those? There are always pros and cons.

Well, then it seems there wasn't enough discussion. Just let me point to http://www.rockbox.org/irc/log-20090318#20:50:13 in the "light" of this thread.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:58, Paul Louden wrote: In my understanding, the intention of "natsort" was to fix 1, 10, 2, 3, 4, 5, 6. It can still do this while respecting intentional leading zeros, and my described "simple rule" still fixes that problem just fine. In my understanding, the intention of natsort is to change the rules of how strings (file names) are sorted. As a side effect, it also fixes the problem with 1, 10, 2. But it's not only about that. It's more general.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: On 18.03.2009 23:53, Thomas Martitz wrote: I can't understand what you mean. What break? Your general rule was in SVN a few days ago. Look at FS#10029 what it caused. No, if the rule were implemented correctly, the files would be sorted correctly as well. It must have been a flaw in the implementation, not in the idea. And doing it correctly means special casing leading zeros at the very beginning. The implementation looks at every series of digits atomically, so each has its own leading zeros. We can only ignore the very first leading zeros.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: Yes, it's a simple rule, and from this point of view I could very well live with it. But it puts "04" before "3" which wasn't the intention of the natsort in the beginning (if I understand correctly). For example, Nautilus puts "03" before "4". In my understanding, the intention of "natsort" was to fix 1, 10, 2, 3, 4, 5, 6. It can still do this while respecting intentional leading zeros, and my described "simple rule" still fixes that problem just fine.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:53, Thomas Martitz wrote: I can't understand what you mean. What break? Your general rule was in SVN a few days ago. Look at FS#10029 what it caused. No, if the rule were implemented correctly, the files would be sorted correctly as well. It must have been a flaw in the implementation, not in the idea.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: If someone sorts his mix folder he most likely wants it to get played in a specific order, wouldn't he? Think of some mix folder that has different styles of music and walks through them -- that's a different thing than simply throwing them in in random order. A mix folder can very well have a wanted track order. But you're suggesting it has a wanted track order where, for some reason or other, they haven't actually named the songs themselves? I mean, step one is "copy them into the folder." Step two is "name them so they're in the right order." Right? (And this is assuming they create the mix folder on their PC, rather than just inserting the songs into a playlist). I don't think it's particularly likely that, while renaming songs, they'll just choose to skip ones that are named differently and hope they'll be in the right place.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:44, Paul Louden wrote: Al Le wrote: To that simple rule (treating a sequence of numbers as a number) I'd probably add the rule that many subsequent spaces are folded to one. E.g. "A space space B" would be equal to "A space B" modulo natsort. strcmp would be used to resolve the case. This seems another arbitrary "I think it should be done this way" addition. I thought you wanted a simple rule? I agree here. It's a bit arbitrary but would fit the "natural". But I wouldn't insist on it since the setting is only about numbers. How about "Don't require leading zeros." Described as "Numbers after leading zeros will be interpreted as whole numbers, rather than a series of digits." A simple rule, and one that lets people know that zeros in the middle of strings won't randomly be ignored (which they will be in currently proposed systems). It's a simple rule, can be described in one sentence, includes an option name that's descriptive, and doesn't ignore user provided parts of the filenames. Yes, it's a simple rule, and from this point of view I could very well live with it. But it puts "04" before "3" which wasn't the intention of the natsort in the beginning (if I understand correctly). For example, Nautilus puts "03" before "4".
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:38 PM, Jonathan Gordon wrote:
> it doesn't make sense to me to have a sorted mix folder... so I agree
> with Paul here, I would think that 9/10 times you play a mix folder
> you have it on random

If someone sorts his mix folder he most likely wants it to get played in a specific order, wouldn't he? Think of some mix folder that has different styles of music and walks through them -- that's a different thing than simply throwing them in in random order. A mix folder can very well have a wanted track order.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: On 18.03.2009 23:43, Thomas Martitz wrote: And, we can only special case very leading zeros. They wouldn't have to be treated specially, since the rule is general enough to handle them. Only if you want to break decimal numbers or discnumber.tracknumber (or any other numbers which have a constant prefix in the strings to be compared). I can't understand what you mean. What break? Your general rule was in SVN a few days ago. Look at FS#10029 what it caused.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: On Wed, Mar 18, 2009 at 11:18 PM, Thomas Martitz wrote: People can rely on omitting leading zeros now since we can sort it correctly numerically. That makes me think that any leading zero may very well be intended. People can rely on the way it was implemented so because of that you consider leading zeros intentional? As in it was implemented that way so you can consider leading zeros intentional? What kind of reasoning is *that*? I'm telling that they don't need leading zeros for proper numerical sorting anymore. I don't see bad reasoning in that. I don't see what's wrong with ignoring spaces. It's obvious that spaces aren't real part of the names when it comes to sorting (as in 1 and 2 spaces should be sorted differently). The setting doesn't tell anything about spaces. It talks about numbers. Thus it has to deal with numbers, not spaces. Everything else is misleading, wrong and confusing. Why would anyone want to sort by spaces anyway? This doesn't make any sense to me. It doesn't need to make sense to you but I'm sure you'll find someone out there that prefers this. That's definitely no good reason for hiding a space-eating "feature" in number-aware sort. Hence I asked you whether we should change the description or the way it sorts. Just answer that instead of getting angry at me. Decimal numbers and discnumber.tracknumber works with the current svn. This discussion isn't about the way it works with current svn. It's about how this feature is *supposed* to work and how people *expect* it to work. And I'm not allowed to compare with how it currently works? And I'm not allowed to say "Hey, this feature what you want, it does this already"? Really? Well, someone committing such a feature could have thought about the possibility of others having a different view and expectation of such a feature. Those FS entries must have had a reason, mustn't they? - Dominik Don't be ignorant please. We've had an *awful* lot of discussion before. Do you really think I forgot about those? There are always pros and cons. That's no reasoning to let something rot or something. And please calm down and stay friendly. No reason for getting at me.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:43, Thomas Martitz wrote: And, we can only special case very leading zeros. They wouldn't have to be treated specially, since the rule is general enough to handle them. Only if you want to break decimal numbers or discnumber.tracknumber (or any other numbers which have a constant prefix in the strings to be compared). I can't understand what you mean. What break?
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: On Wed, Mar 18, 2009 at 11:20 PM, Thomas Martitz wrote: There's no special treatment of dots at all. How do you come to that idea? Zeros are not treated specially either. Currently only spaces are a special case, as they are ignored. Sorry? Are you reading the thread? This is discussion about how it *should* work, and thus also about how it *should not* work. Not about how the current implementation works. - Dominik He said what it should not do, and I told him that it already doesn't do it as of now (except for the spaces).
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: But here not anymore. I think the verbal description should be first, and then the implementation of it. You say "we take an implementation (as the author did it) and try to describe it". I say "we define a simple rule (but which sorts the names as users expect it) and implement it. If the original algorithm would have to be modified then we modify it". To that simple rule (treating a sequence of numbers as a number) I'd probably add the rule that many subsequent spaces are folded to one. E.g. "A space space B" would be equal to "A space B" modulo natsort. strcmp would be used to resolve the case. This seems another arbitrary "I think it should be done this way" addition. I thought you wanted a simple rule? How about "Don't require leading zeros." Described as "Numbers after leading zeros will be interpreted as whole numbers, rather than a series of digits." A simple rule, and one that lets people know that zeros in the middle of strings won't randomly be ignored (which they will be in currently proposed systems). It's a simple rule, can be described in one sentence, includes an option name that's descriptive, and doesn't ignore user provided parts of the filenames.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:20 PM, Thomas Martitz wrote:
> There's no special treatment of dots at all. How do you come to that
> idea? Zeros are not special treated either. Currently only spaces are a
> special case, as they are ignored.

Sorry? Are you reading the thread? This is discussion about how it *should* work, and thus also about how it *should not* work. Not about how the current implementation works.

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: On 18.03.2009 23:20, Thomas Martitz wrote: Right. We cannot surely predict whether it is intentional or accidental. Here I'm with you. Which is why we decided to go with the original implementation and say "the author of the code made it like that". But here not anymore. I think the verbal description should be first, and then the implementation of it. You say "we take an implementation (as the author did it) and try to describe it". I say "we define a simple rule (but which sorts the names as users expect it) and implement it. If the original algorithm would have to be modified then we modify it". Yes, I can live with that. To that simple rule (treating a sequence of numbers as a number) I'd probably add the rule that many subsequent spaces are folded to one. E.g. "A space space B" would be equal to "A space B" modulo natsort. strcmp would be used to resolve the case. And, we can only special case very leading zeros. They wouldn't have to be treated specially, since the rule is general enough to handle them. Only if you want to break decimal numbers or discnumber.tracknumber (or any other numbers which have a constant prefix in the strings to be compared). This is what we had, and it turned out to be flawed. We cannot ignore leading zeros of numbers within the string, only at the very beginning.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: I don't see why "we can describe it" is a reason to use our own method - we can describe methods we get the code for from elsewhere too. But, as you pointed out above, such a complicated logic requires a long description with many examples illustrating all the quirks. Which makes such a description pointless since nobody would grasp it. I disagree. With sorting you can give examples. Saying "nobody would grasp it" is overly broad when you don't even know how long the set of rules even is, yet.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:18 PM, Thomas Martitz wrote:
> People can rely on omitting leading zeros now since we can sort it
> correctly numerically. That makes me think that any leading zero may
> very well be intended.

People can rely on the way it was implemented so because of that you consider leading zeros intentional? As in it was implemented that way so you can consider leading zeros intentional? What kind of reasoning is *that*?

> I don't see what's wrong with ignoring spaces. It's obvious that spaces
> aren't real part of the names when it comes to sorting (as in 1 and 2
> spaces should be sorted differently).

The setting doesn't tell anything about spaces. It talks about numbers. Thus it has to deal with numbers, not spaces. Everything else is misleading, wrong and confusing.

> Why would anyone want to sort by spaces anyway? This doesn't make any
> sense to me.

It doesn't need to make sense to you but I'm sure you'll find someone out there that prefers this. That's definitely no good reason for hiding a space-eating "feature" in number-aware sort.

> Decimal numbers and discnumber.tracknumber works with the current svn.

This discussion isn't about the way it works with current svn. It's about how this feature is *supposed* to work and how people *expect* it to work.

> If you search for logs, we had a discussion yesterday starting here:
> http://www.rockbox.org/irc/log-20090317#17:53:35 and today starting
> here: http://www.rockbox.org/irc/log-20090318#19:25:26

If you'd read my initial mail you'd have noticed that I linked the first log myself. Still I don't see a consensus how this *exactly* should work.

> Both are Flyspray-bugreport induced, and I can't remember another
> discussion other than those before the initial commit.

Well, someone committing such a feature could have thought about the possibility of others having a different view and expectation of such a feature. Those FS entries must have had a reason, mustn't they?

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:20, Thomas Martitz wrote: Right. We cannot surely predict whether it is intentional or accidental. Here I'm with you Which is why we decided to go with the original implementation and say "the author of the code made it like that". But here not anymore. I think the verbal description should be first, and then the implementation of it. You say "we take an implementation (as the author did it) and try to describe it". I say "we define a simple rule (but which sorts the names as users expect it) and implement it. If the original algorithm would have to be modified then we modify it". To that simple rule (treating a sequence of numbers as a number) I'd probably add the rule that many subsequent spaces are folded to one. E.g. "A space space B" would be equal to "A space B" modulo natsort. strcmp would be used to resolve the case. And, we can only special case very leading zeros. They wouldn't have to be treated specially, since the rule is general enough to handle them. Going back to what was before yesterday Again: the idea is primary (and stable), the implementation is secondary
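The proposed space rule ("A space space B" equals "A space B" modulo natsort, with strcmp resolving the tie) could be sketched in Python roughly as follows. The names `fold_spaces` and `folded_key` are made up for this illustration, and plain Python string comparison stands in for strcmp.

```python
import re

def fold_spaces(name):
    """Collapse every run of two or more spaces into a single space."""
    return re.sub(r" {2,}", " ", name)

def folded_key(name):
    # Primary key: the name with space runs folded.
    # Secondary key: the raw name, standing in for the strcmp tie-break.
    return (fold_spaces(name), name)

names = ["A  C", "A B"]
print(fold_spaces("A  B") == fold_spaces("A B"))
print(sorted(names))                  # plain comparison: the extra space decides
print(sorted(names, key=folded_key))  # folded comparison: B versus C decides
```

The two sorted lists come out in opposite order, which is exactly the behaviour change the objection is about: a user who relies on a second space to move an entry would lose that ability.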
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
2009/3/18 Dominik Riebeling:
> Just consider this scenario: someone is creating a mix folder by
> copying various files to it. He wants a given order. Now some files
> are named with 0 prefixes, others not. He decides to not use 0
> prefixes, possibly because of laziness, but leaves them on the files
> he doesn't need to change at all -- if "05" is to become track 5 he
> doesn't need to change anything. As for track "09" which should become
> 3 he'd use "3". Too far-fetched?

It doesn't make sense to me to have a sorted mix folder... so I agree with Paul here, I would think that 9/10 times you play a mix folder you have it on random.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: Just consider this scenario: someone is creating a mix folder by copying various files to it. He wants a given order. Now some files are named with 0 prefixes, others not. He decides to not use 0 prefixes, possibly because of laziness, but leaves them on the files he doesn't need to change at all -- if "05" is to become track 5 he doesn't need to change anything. As for track "09" which should become 3 he'd use "3". Too far-fetched? So he numbers the mix manually for most of the songs, but in the case of some songs that are already the right track number, he doesn't renumber them? This seems like a rather special case to justify throwing out user data. well, treating a number as such includes stripping leading zeros from it, at least from my understanding. Where in my example did leading zeros show up that require stripping, then? It won't do any harm on properly named files, and I don't see a reason why a user would want to prefix with 0 just to change sorting. Because I expect to see the folder "007 - James Bond" before the folder "5th Element" even if I have natural sorting on for my tracks, among other things. It's not "7 - James Bond" it's "Double Oh 7". They're significant numbers, intentional and not accidental.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: For example, is "1.001" one point zero zero one, Not this in natsort. In natsort it would be "the number 1", a dot, "the number 1". The interpretation of the numbers is beyond the scope. Natsort will sort this before 1.002 and before 1.01. It does this, because it does not ignore leading zeros. or one thousand and one? No since it assumes a country specific separator. Nobody assumes a separator. This is just a dot, a non-digit character. Any non-digit character will act as a separator.
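To make the two readings concrete, here is a small Python sketch of the "number 1, a dot, number 1" tokenisation. The helper `int_chunks` is hypothetical and implements only the pure "every digit run is an integer" rule, not the behaviour of the code in svn.

```python
import re

def int_chunks(name):
    """Split a name into integers and the non-digit text between them."""
    parts = re.split(r"([0-9]+)", name)
    # Odd positions hold the digit runs captured by the group.
    return [int(p) if i % 2 else p for i, p in enumerate(parts) if p]

print(int_chunks("1.001"))
print(int_chunks("1,001"))
print(int_chunks("1.001") == int_chunks("1.01") == int_chunks("1.1"))
```

Any non-digit character separates the runs, so the dot and the comma behave alike. The last line shows the cost of dropping leading zeros everywhere: "1.001", "1.01" and "1.1" become indistinguishable, so something else has to decide their order.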
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: Maybe I've missed such a consensus -- in this case someone please point me to the right direction and ignore this mail :) After this discussion and the ones in IRC, it seems to me that the majority is in favor of ignoring leading zeros. This would also match with Nautilus' and Windows Explorer's sorting. And we can do that. Given that the usual browsers do it that way, it's also what the user expects, so it can't be bad. FS#10031 needs changing the algorithm anyway. So, should we do that? It at least seems to be the opinion of most people.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:12, Paul Louden wrote: I'd rather have an "absolute" than a "relative" definition. We already have an absolute: ASCII sort. Yes, but it doesn't treat the case "1, 2, ..., 10" in the way many users would expect (or like) it. Hence the natsort. Our "natural" sorting is entirely a mishmash of rules as to how numbers should be treated, and other characters. It wouldn't be such a mishmash if we'd implement just that simple rule (which already has a very fitting name): "interpret numbers as ...". Any non-digit character (also a dot and a comma) is just a character, i.e. we only consider integer numbers. For example, is "1.001" one point zero zero one, Not in natsort or disk one track one, Not this in natsort. In natsort it would be "the number 1", a dot, "the number 1". The interpretation of the numbers is beyond the scope. or one thousand and one? No since it assumes a country specific separator. If we make up our own non-standard way, yes, we can describe it. We can add a few paragraphs in the manual detailing how people can expect their files to be sorted, since no other program does it like we do. Actually we wouldn't need a very long description if the rule would be simple enough. I don't see why "we can describe it" is a reason to use our own method - we can describe methods we get the code for from elsewhere too. But, as you pointed out above, such a complicated logic requires a long description with many examples illustrating all the quirks. Which makes such a description pointless since nobody would grasp it.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 11:01 PM, Paul Louden wrote:
> You really believe a person, while naming files, would number them "3, 05,
> 7"? Why would they add the 0 onto the 5 one if they're already typing
> single-digit numbers?

Just consider this scenario: someone is creating a mix folder by copying various files to it. He wants a given order. Now some files are named with 0 prefixes, others not. He decides to not use 0 prefixes, possibly because of laziness, but leaves them on the files he doesn't need to change at all -- if "05" is to become track 5 he doesn't need to change anything. As for track "09" which should become 3 he'd use "3". Too far-fetched?

> We were fixing, previously, the case where people had chosen names of 1, 2,
> 3, 4, 5, 6, 7, 8, 9, 10 and not known they *must* use leading zeros to
> prevent bad sorting. But I don't think it's a fair assumption to go as far
> as saying "when they add a character, they didn't mean to type it."

Well, treating a number as such includes stripping leading zeros from it, at least from my understanding. It won't do any harm on properly named files, and I don't see a reason why a user would want to prefix with 0 just to change sorting. We're restricting digit-postfixes to numbers this way (which most people will consider less problematic but we still restrict the users: "01", "02", "020", "03" won't work anymore while it does when sorting ASCII. The user will still think of 2-digit numbers). In any case, it would be interesting to see how Windows does the sorting, as most users will be used to that way of doing it.

> "A" doesn't come before "1" so a1 to come before 1 doesn't work. 01 could,
> except we would be preventing it. I don't see why we should force people not
> to use zeros.

Sorry, got my thoughts mixed up. I use letter postfixes to get it sorted after that specific number, i.e. 01, 03a, 03b, 04 but prefixed to get it sorted at the top or bottom of the list (a1 would come after numbers, _1 usually before it -- unless the sorting treats _ as space and ignores spaces).

- Dominik
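The digit-postfix case mentioned above ("01", "02", "020", "03") is easy to check with a short Python comparison. Using `int` as the sort key is a simplification that only works because these example names consist of digits alone.

```python
names = ["03", "020", "01", "02"]

# Plain string comparison, character by character (ASCII order).
print(sorted(names))

# Numeric comparison: "020" is twenty, so it moves to the end.
print(sorted(names, key=int))
```

In ASCII order "020" stays between "02" and "03", which is what a user who appends a digit to squeeze in an extra entry relies on. Once the names are read as numbers, that trick stops working.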
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 22:57, Al Le wrote: I think we can't tell for sure what's intentional and what's not. All we have is a bunch of files, and it's not our task to infer how it has come to it. Guessing what the intention was is an intelligence of a higher degree than the natsort! Right. We cannot surely predict whether it is intentional or accidental. Which is why we decided to go with the original implementation and say "the author of the code made it like that". And, we can only special case very leading zeros. Going back to what was before yesterday will break zeros within the string again (such as discnumbers and xxx_00_abc vs xxx_01_abc). See FS#10029. But that means customizing the code again (I'm not against it at all). Either treat digits as number or don't treat them as numbers at all. I'm absolutely with you, Dominik. This way the thing the algorithm does can be captured in few words, which we accurately did in the setting names ("Interpret numbers as ..."). Special treatment of leading zeroes, spaces and dots is too much for a usual human being to understand. There's no special treatment of dots at all. How do you come to that idea? Zeros are not treated specially either. Currently only spaces are a special case, as they are ignored. Also, which comes first: 001 or 01? If strnatcmp tells two strings are equal then strcmp is called which always delivers a perfectly predictable result. Yes, but do we really want them to sort the same? I'm really not sure about that. Particularly when they have some prefix.
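The "001 or 01" question and the strcmp fallback can be sketched in Python as follows. `natcmp` and `compare` are hypothetical stand-ins for strnatcmp and the calling code, written under the assumption that every digit run is compared as an integer.

```python
import re
from functools import cmp_to_key

def natural_key(name):
    parts = re.split(r"([0-9]+)", name)
    # Tag digit runs (odd index) and text so they are never compared directly.
    return [(0, int(p)) if i % 2 else (1, p) for i, p in enumerate(parts) if p]

def natcmp(a, b):
    """Stand-in for strnatcmp: negative, zero or positive."""
    ka, kb = natural_key(a), natural_key(b)
    return (ka > kb) - (ka < kb)

def compare(a, b):
    result = natcmp(a, b)
    if result == 0:
        # Natural comparison sees no difference: fall back to a plain
        # character-by-character comparison, as strcmp would.
        result = (a > b) - (a < b)
    return result

print(natcmp("001", "01"))
print(sorted(["01", "1", "001"], key=cmp_to_key(compare)))
```

With this fallback the order of "001", "01" and "1" is predictable, although whether it is the order users want is the open question raised in the reply.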
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 22:22, Paul Louden wrote: Dominik Riebeling wrote: A situation where a folder can contain files starting with "02" and "4" the same time is something that could happen and still not being intentional (just think of copying files from various albums to a mix folder). I agree that it could be unintentional, but disagree that "numeric sorting" matters in this case - if you have a mix of random songs, why does 03 need to be between 2 and 4? Meanwhile, if someone intentionally prefixes something with a 0, they intend for it to be first, so it should be. This sounds like a case of "let's make the unimportant case work one way, while choosing to break the case where people make intentional changes." I agree with Llorean, this case is purely personal preferences I think. 03 and 2 in the same folder, could be accidental (from mixed albums), or intentional. We sort it, but I'm not sure if there's really *one* correct way for this case. People can rely on omitting leading zeros now since we can sort it correctly numerically. That makes me think that any leading zero may very well be intended. Either treat digits as number or don't treat them as numbers at all. - Spaces shouldn't get collapsed. A space is a space, and "interpret numbers" doesn't tell anything about spaces. At least at some point during the lifetime of this setting spaces were collapsed. Nothing that is a number ... I don't see what's wrong with ignoring spaces. It's obvious that spaces aren't real part of the names when it comes to sorting (as in 1 and 2 spaces should be sorted differently). Why would anyone want to sort by spaces anyway? This doesn't make any sense to me. But yes, the option doesn't tell about that. Should we change the description, or how it's working? This still leaves some open issues I'm not sure how to deal about: - how are floating-point numbers to be treated? "1.001" is smaller than "1.01" when treating as numbers, so on the one hand I'd expect them to sort that way. On the other hand, recognizing the dot as decimal separator is broken as well -- not all languages use it as decimal separator (like German using the comma). Stopping the number-treating at dots is also kinda broken -- how should a naming be handled as "discnumber.tracknumber", i.e. like "1.2", "1.10" -- which one has to be sorted first? The best solution here might be to treat all numbers as single numbers, regardless if they might be floating point numbers -- I guess it's more common to have a "1.3" numbering to mark discnumber.track instead of a floating point number "1.003". I expect people will number disks 1.01 - 1.12 rather than 1.1, 1.2, 1.10, 1.12. This is only my personal assumption, but if that's the case, our current method works for it. Decimal numbers and discnumber.tracknumber works with the current svn. 1.1 is sorted after 1.01, as well as 1.10 is sorted after 1.1 (or 1.2). And it doesn't take the dot specially as separator, but any non-digit, so it will work for commas too. I'm pretty sure I've missed some of my points right now :) What do people think about this sorting thing? Well, we're currently using an "existing" algorithm. One that *may* be used in other FLOSS (I don't know, and haven't investigated). To me, the two sides of the argument are basically "do we want to use it as-is, such that our sorted lists look the same as lists in other applications, or do we want to define our own rules for 'natural' list sorting?" Of course, this is dependent upon research I haven't done (specifically, do any other applications use this sort algorithm). Maybe we should just see if various FLOSS file browsers have a common "natural" sort, and use it, so that our files are likely to show up in the host's browser the same order as they show up in ours? If you search for logs, we had a discussion yesterday starting here: http://www.rockbox.org/irc/log-20090317#17:53:35 and today starting here: http://www.rockbox.org/irc/log-20090318#19:25:26 Both are Flyspray-bugreport induced, and I can't remember another discussion other than those before the initial commit. I think the only remaining problem is FS#10031, which would be a relatively easy fix (it sorts filenames starting with chars between 'Z' and 'a' differently than normal strcmp, regardless of numbers in the name, because it uses toupper instead of tolower for case-insensitive sorting), but it would require to leave the path of using the original algorithm without changing again.
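The toupper versus tolower effect described for FS#10031 can be reproduced with a few lines of Python. The file names are invented for the example; `str.upper` and `str.lower` stand in for folding each character with toupper or tolower before comparing.

```python
names = ["apple", "_temp", "Zebra"]

# Plain code-point comparison, like a case-sensitive strcmp.
print(sorted(names))

# Case-insensitive by folding to upper case: '_' (95) now sorts after 'Z' (90).
print(sorted(names, key=str.upper))

# Case-insensitive by folding to lower case: '_' (95) sorts before 'a' (97).
print(sorted(names, key=str.lower))
```

The characters between 'Z' and 'a' in ASCII (such as `[`, `\`, `]`, `^`, `_` and the backtick) end up after all letters when folding to upper case and before all letters when folding to lower case, so names starting with them move from one end of the list to the other.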
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: Yes, it's easier because it's a simple rule. If the browsers use a complicated logic (which may change with a release) how would you describe this for Rockbox? "It does it like BrowserX"? The next question would be then "and how does BrowserX do it"? I'd rather have an "absolute" than a "relative" definition. We already have an absolute: ASCII sort. Our "natural" sorting is entirely a mishmash of rules as to how numbers should be treated, and other characters. For example, is "1.001" one point zero zero one, or disk one track one, or one thousand and one? If we make up our own non-standard way, yes, we can describe it. We can add a few paragraphs in the manual detailing how people can expect their files to be sorted, since no other program does it like we do. Or we can use a "standard" way, describe it in the manual anyway, and have most people *not* need to look it up in the manual because the list is the same as they usually see. I don't see why "we can describe it" is a reason to use our own method - we can describe methods we get the code for from elsewhere too.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 23:02, Paul Louden wrote: Al Le wrote: Well, we're currently using an "existing" algorithm. One that *may* be used in other FLOSS (I don't know, and haven't investigated). To me, the two sides of the argument are basically "do we want to use it as-is, such that our sorted lists look the same as lists in other applications, or do we want to define our own rules for 'natural' list sorting?" I'd opt for the latter, because it's easy to understand. "It sorts in this strange way we've made up" is easier to understand than "It sorts like Nautilus, FileBrowserX and FileBrowserY which you may already use?" Yes, it's easier because it's a simple rule. If the browsers use a complicated logic (which may change with a release) how would you describe this for Rockbox? "It does it like BrowserX"? The next question would be then "and how does BrowserX do it"? I'd rather have an "absolute" than a "relative" definition.
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Al Le wrote: Well, we're currently using an "existing" algorithm. One that *may* be used in other FLOSS (I don't know, and haven't investigated). To me, the two sides of the argument are basically "do we want to use it as-is, such that our sorted lists look the same as lists in other applications, or do we want to define our own rules for 'natural' list sorting?" I'd opt for the latter, because it's easy to understand. "It sorts in this strange way we've made up" is easier to understand than "It sorts like Nautilus, FileBrowserX and FileBrowserY which you may already use?"
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: On Wed, Mar 18, 2009 at 10:22 PM, Paul Louden wrote: I agree that it could be unintentional, but disagree that "numeric sorting" matters in this case - if you have a mix of random songs, why does 03 need to be between 2 and 4? Meanwhile, if someone intentionally prefixes something with a 0, they intend for it to be first, so it should be. This sounds like a case of "let's make the unimportant case work one way, while choosing to break the case where people make intentional changes."

You have a point, but if we assume people add leading zeros *intentionally*, can't we assume those people number their files correctly in the first place, and thus have no need for this "natural" sorting anyway? I'm pretty confident that there are users careless enough to name files "3", "05", "7" and expect them to get sorted as "3", "05", "7". If we consider leading zeros as intentional, do we need this strnatcmp at all? If we do not skip leading zeros, are we treating digits as numbers at all? I wouldn't say so. Besides, couldn't that also create a sorting like "10", "13", "02", "04", as 0 is a character here and thus sorted separately? Such a sorting would be wrong if one always names his files using leading zeros, especially if numbers are always sorted first.

You really believe a person, while naming files, would number them "3, 05, 7"? Why would they add the 0 onto the 5 one if they're already typing single-digit numbers? Can you give me a realistic case where someone wants their files in the order "3, 05, 7" and has named them themselves? We were fixing, previously, the case where people had chosen names of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and not known they *must* use leading zeros to prevent bad sorting. But I don't think it's a fair assumption to go as far as saying "when they add a character, they didn't mean to type it."

Does "alter the sorting" require you to use digits? At least I usually prepend a character if I want something to get sorted at the top or bottom, or indexes like "a", "b" after the leading number.

"A" doesn't come before "1", so using a1 to come before 1 doesn't work. 01 could, except we would be preventing it. I don't see why we should force people not to use zeros.

Good point. Though if we only address the naming issue we kinda create our own sorting, don't we?

I don't know what you mean by "only address the naming issue."
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On 18.03.2009 22:22, Paul Louden wrote: Dominik Riebeling wrote: A situation where a folder can contain files starting with "02" and "4" at the same time is something that could happen and still not be intentional (just think of copying files from various albums to a mix folder).

I agree that it could be unintentional, but disagree that "numeric sorting" matters in this case - if you have a mix of random songs, why does 03 need to be between 2 and 4? Meanwhile, if someone intentionally prefixes something with a 0, they intend for it to be first

I think we can't tell for sure what's intentional and what's not. All we have is a bunch of files, and it's not our task to infer how they came about. Guessing what the intention was takes intelligence of a higher degree than the natsort!

Either treat digits as numbers or don't treat them as numbers at all.

I'm absolutely with you, Dominik. This way the thing the algorithm does can be captured in a few words, which we accurately did in the setting names ("Interpret numbers as ..."). Special treatment of leading zeroes, spaces and dots is too much for a usual human being to understand.

Also, which comes first: 001 or 01?

If strnatcmp says two strings are equal, then strcmp is called, which always delivers a perfectly predictable result.

Well, we're currently using an "existing" algorithm. One that *may* be used in other FLOSS (I don't know, and haven't investigated). To me, the two sides of the argument are basically "do we want to use it as-is, such that our sorted lists look the same as lists in other applications, or do we want to define our own rules for 'natural' list sorting?"

I'd opt for the latter, because it's easy to understand.
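The strcmp fallback mentioned in this message can be sketched as a two-stage comparator. This is a Python illustration only, under the assumption that the natural comparison ignores leading zeros; `natcmp`, `strcmp` and `compare_names` are stand-in names, not the Rockbox functions.

```python
import re
from functools import cmp_to_key

def numeric_chunks(name):
    # Digit runs become integers, so "001", "01" and "1" all yield the value 1.
    parts = re.split(r"(\d+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

def natcmp(a, b):
    """Stand-in for a natural compare that ignores leading zeros."""
    ka, kb = numeric_chunks(a), numeric_chunks(b)
    return (ka > kb) - (ka < kb)

def strcmp(a, b):
    """Plain character-by-character comparison."""
    return (a > b) - (a < b)

def compare_names(a, b):
    # Only when the natural compare reports a tie does the plain compare decide.
    result = natcmp(a, b)
    return result if result != 0 else strcmp(a, b)

ordered = sorted(["1", "01", "001"], key=cmp_to_key(compare_names))
print(ordered)  # ['001', '01', '1']
```

Whatever order the names arrive in, the result is the same, which is the "perfectly predictable" property claimed for the tie-break.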
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
On Wed, Mar 18, 2009 at 10:22 PM, Paul Louden wrote:
> I agree that it could be unintentional, but disagree that "numeric sorting"
> matters in this case - if you have a mix of random songs, why does 03 need
> to be between 2 and 4? Meanwhile, if someone intentionally prefixes
> something with a 0, they intend for it to be first, so it should be. This
> sounds like a case of "let's make the unimportant case work one way, while
> choosing to break the case where people make intentional changes."

You have a point, but if we assume people add leading zeros *intentionally*, can't we assume those people number their files correctly in the first place, and thus have no need for this "natural" sorting anyway? I'm pretty confident that there are users careless enough to name files "3", "05", "7" and expect them to get sorted as "3", "05", "7". If we consider leading zeros as intentional, do we need this strnatcmp at all? If we do not skip leading zeros, are we treating digits as numbers at all? I wouldn't say so. Besides, couldn't that also create a sorting like "10", "13", "02", "04", as 0 is a character here and thus sorted separately? Such a sorting would be wrong if one always names his files using leading zeros, especially if numbers are always sorted first.

> ignore leading zeros. I don't think we should ever assume any part of a
> filename is unintentional. I think assuming numbers are written as a human

In that case, why do we need to add additional "brain" to the naming the user chose?

> normally does is fine (1, 2, 3, 10, 11, 12) but if someone chooses to add
> something to alter sorting we should still respect it. You don't

Does "alter the sorting" require you to use digits? At least I usually prepend a character if I want something to get sorted at the top or bottom, or indexes like "a", "b" after the leading number.

> accidentally add a 0, and if there are random zeros in a mix folder the
> order of playback almost certainly isn't meant to be 2, 03, 4, but rather
> "whatever order" if they just chose to haphazardly mix them.

Ok, that was a bad example :)

> Also, which comes first: 001 or 01? If we're going to recognize that 001 has
> one more zero than 01, why don't we recognize that 00number has more zeros
> than 0number, even if the two numbers are different?

Well, in that case (as both strings will evaluate to the number 1) a strcmp would be in order to break the tie. It's a corner case, as both numbers are identical (and 00 isn't "worth" more than "0", is it?). Thus I don't think this is much of an issue as long as it is deterministic.

> I expect people will number disks 1.01 - 1.12 rather than 1.1, 1.2, 1.10,
> 1.12. This is only my personal assumption, but if that's the case, our
> current method works for it.

I'd expect the same too, but I'm among the people that don't need strnatcmp anyway as I properly name my files ;-)

> Maybe we should just see if various FLOSS file browsers have a common
> "natural" sort, and use it, so that our files are likely to show up in the
> host's browser the same order as they show up in ours?

Good point. Though if we only address the naming issue we kinda create our own sorting, don't we?

- Dominik
Re: how is strnatcmp aka "Interpret numbers while sorting" supposed to sort?
Dominik Riebeling wrote: A situation where a folder can contain files starting with "02" and "4" at the same time is something that could happen and still not be intentional (just think of copying files from various albums to a mix folder).

I agree that it could be unintentional, but disagree that "numeric sorting" matters in this case - if you have a mix of random songs, why does 03 need to be between 2 and 4? Meanwhile, if someone intentionally prefixes something with a 0, they intend for it to be first, so it should be. This sounds like a case of "let's make the unimportant case work one way, while choosing to break the case where people make intentional changes."

Either treat digits as numbers or don't treat them as numbers at all.
- Spaces shouldn't get collapsed. A space is a space, and "interpret numbers" doesn't say anything about spaces. At least at some point during the lifetime of this setting spaces were collapsed. Nothing that is a number ...

This is originally based on a natural sorting algorithm, which does a lot more than numbers, it seems. My understanding was that the original intent was simply to fix 1, 10, 2, 3, 4 into 1, 2, 3, 4, 10.

I don't see why this *should* ignore leading zeros. I don't think we should ever assume any part of a filename is unintentional. I think assuming numbers are written as a human normally does is fine (1, 2, 3, 10, 11, 12) but if someone chooses to add something to alter sorting we should still respect it. You don't accidentally add a 0, and if there are random zeros in a mix folder the order of playback almost certainly isn't meant to be 2, 03, 4, but rather "whatever order" if they just chose to haphazardly mix them. Also, which comes first: 001 or 01? If we're going to recognize that 001 has one more zero than 01, why don't we recognize that 00number has more zeros than 0number, even if the two numbers are different?
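One possible reading of the position argued above (numbers written the normal way sort by value, but a deliberately added leading zero pushes a name towards the front) can be sketched like this. This is purely a hypothetical scheme for illustration: it is neither the current Rockbox behaviour nor the original strnatcmp, and `zero_respecting_key` is an invented name.

```python
import re

def zero_respecting_key(name):
    """Hypothetical key: more leading zeros sort first, then the numeric value."""
    parts = re.split(r"(\d+)", name)
    key = []
    for i, part in enumerate(parts):
        if i % 2:  # odd positions are the digit runs
            zeros = len(part) - len(part.lstrip("0"))
            key.append((-zeros, int(part)))
        else:
            key.append(part)
    return key

print(sorted(["10", "2", "1", "3"], key=zero_respecting_key))  # ['1', '2', '3', '10']
print(sorted(["4", "2", "03"], key=zero_respecting_key))       # ['03', '2', '4']
print(sorted(["01", "002", "001"], key=zero_respecting_key))   # ['001', '002', '01']
```

Under this scheme "00number" always precedes "0number" regardless of the numbers themselves, which is the consistency question raised at the end of the paragraph above.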
This still leaves some open issues I'm not sure how to deal with:
- how are floating-point numbers to be treated? "1.001" is smaller than "1.01" when treated as numbers, so on the one hand I'd expect them to sort that way. On the other hand, recognizing the dot as decimal separator is broken as well -- not all languages use it as decimal separator (German, for example, uses the comma). Stopping the number-treating at dots is also kinda broken -- how should a naming like "discnumber.tracknumber" be handled, i.e. "1.2", "1.10" -- which one has to be sorted first? The best solution here might be to treat all numbers as single numbers, regardless of whether they might be floating point numbers -- I guess it's more common to have a "1.3" numbering to mark discnumber.track than a floating point number "1.003".

I expect people will number disks 1.01 - 1.12 rather than 1.1, 1.2, 1.10, 1.12. This is only my personal assumption, but if that's the case, our current method works for it.

I'm pretty sure I've missed some of my points right now :) What do people think about this sorting thing?

Well, we're currently using an "existing" algorithm. One that *may* be used in other FLOSS (I don't know, and haven't investigated). To me, the two sides of the argument are basically "do we want to use it as-is, such that our sorted lists look the same as lists in other applications, or do we want to define our own rules for 'natural' list sorting?" Of course, this is dependent upon research I haven't done (specifically, do any other applications use this sort algorithm). Maybe we should just see if various FLOSS file browsers have a common "natural" sort, and use it, so that our files are likely to show up in the host's browser in the same order as they show up in ours?
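The floating-point concern raised above can be made concrete with a short Python sketch. It shows what happens if names are read as decimal numbers instead of as separate digit runs; the track names are made up, and Python's `float` is used only as a convenient stand-in for "interpret as a decimal".

```python
tracks = ["1.2", "1.10", "1.1"]  # hypothetical disc.track names

# Read as decimals, "1.10" and "1.1" are the same value, and both sort before "1.2".
as_decimals = sorted(tracks, key=float)
print(as_decimals)  # ['1.10', '1.1', '1.2']

# A comma used as decimal separator is not parsed at all by this approach.
try:
    float("1,10")
    comma_parsed = True
except ValueError:
    comma_parsed = False
print(comma_parsed)  # False
```

So under a decimal reading, disc 1 track 10 would land before disc 1 track 2, which is the reason given above for treating each run of digits as its own number.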