Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Linus Nielsen Feltzing

Paul Louden wrote:
No, this isn't. This is having intuitive handling of numbers as 
normally written by people. People don't normally precede numbers with 
a 0 unless there's a specific reason to.


I'd think that many files will have names with leading zeros, especially 
if they are copied from a player that doesn't support natural sorting, 
where the user will have added leading zeros to force a correct sorting.


Also, you seem to forget the very reason that we implement natural 
sorting in the first place, which is to sort numbers in a natural way, 
so the user finds numbered files where he expects them to be, without 
having to change the file names.


Further, natural sorting strives to sort numbers in a way that humans 
*expect* them to be sorted. Leading zeros are insignificant when 
treating numbers; that is a mathematical rule that the vast majority of 
people know. I dare say that people in general expect the browser to 
ignore leading zeros.


If it sorts 007 after 6, I fail to see how it would be surprising to the 
user in any way. It is after all a well-known mathematical rule, and a 
rule that the major file browsers follow. If we claim to sort numbers, 
we should do so, and not change the fundamental rules of mathematics.


Linus



Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Paul Louden

Linus Nielsen Feltzing wrote:


If it sorts 007 after 6, I fail to see how it would be surprising to 
the user in any way. It is after all a well-known mathematical rule, 
and a rule that the major file browsers follow. If we claim to sort 
numbers, we should do so, and not change the fundamental rules of 
mathematics.

Just as a counterpoint to this: people don't normally put 0s before a 
number. I would expect a lot of people to think of 007 as "00" and "7" 
rather than as 7 with leading zeros that are not part of the number. An 
informal survey of all of my friends online right now (none of whom are 
computer scientists and many of whom are artists or fairly nontechnical 
people) has told me that they expect that 04 would come before 2 
because of the zero. It was presented this way: "If you had a list 2, 3, 
4, 5, and you were to add 04 to it, where would you put it?" So I don't 
think my question was presented in a leading way.


The problem is, now you're arguing mathematical rules. We've already 
established people don't think in mathematical rules. I doubt people see 
04 and think "four". They think "oh-four". The zero is not an 
insignificant and ignored digit in the way people speak, read, or think 
the number. Except in math. But we're talking normal people here.


Instead of us trying to think about them, if we're going to base this on 
normal people let's do a poll. At least this way we're not 
extrapolating our opinion on them based on *mathematics*, something few 
people think in.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Mike Holden
Dominik Riebeling wrote:
 well, treating a number as such includes stripping leading zeros from
 it, at least from my understanding. It won't do any harm on properly
 named files, and I don't see a reason why a user would want to prefix
 with 0 just to change sorting.

Maybe leading zeros should only be stripped if another digit follows them?

I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I
have created (as opposed to original artist albums), and the leading
zerozero is deliberately there to sort them at the top.

At the moment using natural sorting, 00RockFaves.m3u is sorted among the
R entries, totally defeating my intention in choosing that naming (and
not natural to my view!).

01Rock should sort before 02Rock, agreed, but should 01Rock sort before A
or before S?

 In any case, it would be interesting to see how windows does the
 sorting, as the most users will be used to that way of doing it.

Maybe many, but we shouldn't assume that is the case. I personally have no
idea how Windows does it, and I wouldn't necessarily agree that just
because MS does it that that is the _right_ way to do it.

-- 
Mike Holden

http://www.by-ang.com - the place to shop for all manner of hand crafted
items, including Jewellery, Greetings Cards and Gifts





Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Mike Holden wrote:


Maybe leading zeros should only be stripped if another digit follows them?

I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I
have created (as opposed to original artist albums), and the leading
zerozero is deliberately there to sort them at the top.

At the moment using natural sorting, 00RockFaves.m3u is sorted among the
R entries, totally defeating my intention in choosing that naming (and
not natural to my view!).

01Rock should sort before 02Rock, agreed, but should 01Rock sort before A
or before S?

  


Sounds to me like you're better off using ASCII sort.

In any case, it would be interesting to see how windows does the
sorting, as the most users will be used to that way of doing it.



Maybe many, but we shouldn't assume that is the case. I personally have no
idea how Windows does it, and I wouldn't necessarily agree that just
because MS does it that that is the _right_ way to do it.
  


Windows does it in the same way as Nautilus and other major file 
browsers. It ignores leading zeros.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Linus Nielsen Feltzing wrote:

Mike Holden wrote:
Maybe leading zeros should only be stripped if another digit follows 
them?


I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I
have created (as opposed to original artist albums), and the leading
zerozero is deliberately there to sort them at the top.


That's an interesting observation. I believe leading zeroes are 
treated like whitespace in the current code, but in this case I think 
that the final zero should be kept.


Linus

That's not trivial, and adds complexity. You basically need to look at 
the current char, the next one, and one more for this, instead of just 
the current char.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Thomas Martitz wrote:

Dominik Riebeling schrieb:

Maybe I've missed such a consensus -- in this case someone please
point me to the right direction and ignore this mail :)


After this discussion and the ones in IRC, it seems to me that the 
majority is in favor of ignoring leading zeros. This would also match 
with Nautilus' and Windows Explorer's sorting.


And we can do that. Given that the usual browsers do it that way, it's 
also what the user expects, so it can't be bad. FS#10031 needs 
changing the algorithm anyway.


So, should we do that? It at least seems to be the opinion of most 
people.


Ok, I've now implemented ignoring the very leading zeros, and fixed 
FS#10031, in my local repo. I think it could be committed. It seems 
consensus has been reached.


Alternatively we can also think about ignoring chars like . and _ (and 
possibly more) in the beginning of a file name (e.g. .rockbox is sorted 
under r). Just an idea. It doesn't really add complexity, but would 
definitely do more than the setting advertises. But, this is also 
something windows/nautilus/more do.
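
A minimal sketch of the prefix-skipping idea, assuming that only '.' and '_' count as ignorable leading characters. The helper name skip_prefix_chars is hypothetical and is not taken from the Rockbox sources; a comparator would call it on both names before comparing them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return a pointer past any leading '.' or '_'
 * characters, so that ".rockbox" is compared as "rockbox".
 * A name made up only of such characters is returned unchanged, so it
 * never turns into an empty sort key. */
const char *skip_prefix_chars(const char *name)
{
    assert(name != NULL);

    const char *p = name + strspn(name, "._");
    return *p != '\0' ? p : name;
}
```

Whether more characters belong in the skip set is exactly the open question in the mail above, so the set is kept in one string literal that is easy to change.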


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Bryan VanDyke
Dominik Riebeling wrote:
 Hi,
 
 I started wondering how the value "as whole numbers" for the setting
 "Interpret numbers while sorting" is intended to work. Currently it
 seems to get changed in svn quite often. However, I haven't seen a
 consensus how this feature is supposed to work (read: sort) gathered,
 especially before committing. A recent discussion is here:
 http://www.rockbox.org/irc/log-20090317#17:53:35
 
 Maybe I've missed such a consensus -- in this case someone please
 point me to the right direction and ignore this mail :) Changing the
 behaviour of this setting frequently is a rather bad thing IMO. We
 need to specify how we want it to work and implement it that way.
 Doing this kind of discussion in svn is a bad thing and can only
 lead to confusion among users. We didn't do this for the study mode
 feature either, even if there was a consensus to change it.
 
 Now, how should this feature sort? From my point of view, I'd expect it to
 - treat digits as numbers. A value of 00 equals to zero and thus
 gets sorted before the number 1, regardless if that is 1, 01 or
 001. Completely skipping the zero (as it's only leading zeros) is as
 broken as to not strip leading zeros -- 003 should equal to 3 and
 01 to 1, thus the latter sorting before the former. A situation
 where a folder can contain files starting with 02 and 4 the same
 time is something that could happen and still not being intentional
 (just think of copying files from various albums to a mix folder).
 Either treat digits as number or don't treat them as numbers at all.
 - Spaces shouldn't get collapsed. A space is a space, and interpret
 numbers doesn't tell anything about spaces. At least at some point
 during the lifetime of this setting spaces were collapsed. Nothing
 that is a number ...
 
 This still leaves some open issues I'm not sure how to deal about:
 - how are floating-point numbers to be treated? 1.001 is smaller as
 1.01 when treating as numbers, so on the one hand I'd expect them to
 sort that way. On the other hand, recognizing the dot as decimal
 separator is broken as well -- not all languages use it as decimal
 separator (like german using the comma). Stopping the number-treating
 at dots is also kinda broken -- how should a naming be handled as
 discnumber.tracknumber, i.e. like 1.2, 1.10 -- which one has to
 be sorted first? The best solution here might be to treat all numbers
 as single numbers, regardless if they might be floating point numbers
 -- I guess it's more common to have a 1.3 numbering to mark
 discnumber.track instead of a floating point number 1.003.
 
 I'm pretty sure I've missed some of my points right now :) What do
 people think about this sorting thing?
 
 
  - Dominik
 


I think it should just be the simplest and easiest to understand.

Any consecutive run of digits [0-9] is treated as its value for
sorting purposes.

This means any non-digit is treated like a separator, which would
include punctuation, spaces, etc. This also avoids trying to figure out
what the person meant by using a period. Was it a separator, equivalent
to US comma, region setting, real number, etc? That's just a road nobody
is going to agree on.

Same thing if a person is using punctuation, leading zeros, etc to
control the sort order. There's no way to read the person's mind on what
they intended. In all likelihood they're going to use the ASCII sort
anyway.

The various implementations that have been used by RB have tried to eat
spaces, so
a 1
a001
a  01
are all equal to a1. I say throw that out too.


1. Numbers sort before non-numbers.
   - Leading zeros are stripped. A leading zero on a zero is still zero.
   - 000 becomes 0.
   - Some of the code that has been used has trouble with this.
2. Lesser number before greater.
   - 1,2,3,4 etc
3. Anything else strcmp.
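
The three rules above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names skip_zeros and natcmp_sketch are hypothetical, this is not the strnatcmp code in SVN, and digit runs are compared by length and then digit by digit so that long numbers cannot overflow an integer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: skip leading zeros of a digit run, keeping the
 * last zero when the run is all zeros, so "000" is read as "0". */
static const char *skip_zeros(const char *s)
{
    while (s[0] == '0' && isdigit((unsigned char)s[1]))
        s++;
    return s;
}

/* Hypothetical comparator following the three rules listed above. */
int natcmp_sketch(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0') {
        int da = isdigit((unsigned char)*a) != 0;
        int db = isdigit((unsigned char)*b) != 0;

        if (da && db) {
            /* Rule 2: compare the two digit runs by value. */
            size_t la = 0, lb = 0;
            a = skip_zeros(a);
            b = skip_zeros(b);
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;
            if (la != lb)
                return la < lb ? -1 : 1; /* more digits: larger value */
            for (size_t i = 0; i < la; i++) {
                if (a[i] != b[i])
                    return a[i] < b[i] ? -1 : 1;
            }
            a += la;
            b += lb;
        } else if (da != db) {
            /* Rule 1: a number sorts before a non-number. */
            return da ? -1 : 1;
        } else {
            /* Rule 3: plain byte comparison, as strcmp would do. */
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    /* A string that ran out first sorts first; equal strings give 0. */
    if (*a == *b)
        return 0;
    return *a == '\0' ? -1 : 1;
}
```

With this sketch 6 sorts before 007, 000 compares equal to 0, and 1.2 sorts before 1.10 because the dot simply ends one digit run and starts the next. Spaces are not eaten, so "a 1" and "a1" are different.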


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Bryan VanDyke
Thomas Martitz wrote:
 Linus Nielsen Feltzing wrote:
 Mike Holden wrote:
 Maybe leading zeros should only be stripped if another digit follows
 them?

 I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I
 have created (as opposed to original artist albums), and the leading
 zerozero is deliberately there to sort them at the top.

 That's an interesting observation. I believe leading zeroes are
 treated like whitespace in the current code, but in this case I think
 that the final zero should be kept.

 Linus
 That's not trivial, and adds complexity.  You basically need to look at
 the current, the next, and one more for this, instead of just the
 current char.
 

Actually it's not that bad.

Pseudo code:

get current
get next
while (current != null && next != null && current == '0' && next is a
number)
{
    current = next
    next = get next
}




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Mike Holden
Thomas Martitz wrote:
 After this discussion and the ones in IRC, it seems to me that the
 majority is in favor of ignoring leading zeros. This would also match
 with Nautilus' and Windows Explorer's sorting.

 And we can do that. Given that the usual browsers do it that way, it's
 also what the user expects, so it can't be bad. FS#10031 needs changing
 the algorithm anyway.

 So, should we do that? It at least seems to be the opinion of most people.


Just had a quick look in My Computer on an XP box to see how this does it.

1. 001 sorts before 01 and 01 sorts before 1, giving the conclusion that
more leading zeroes sort before fewer leading zeroes, where the underlying
number is the same (i.e. 000x < 00x for any number x).

2. 1 sorts before 2, 2 before 10, 10 before 11 etc, as expected giving
numeric sorting.

3. 001 sorts before 10 and 010, again giving expected sorting for the
numeric part.

4. Introducing letters into the equation, we can see that 00A sorts before
00aa, 001 and 1. This satisfies my expectation that leading zeroes before
letters should sort first in the list, and not be sorted among the letter
part only.

All of these individual items line up to give a file listing that doesn't
produce any surprises for me, so I would be happy with this set of rules.
-- 
Mike Holden





Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Linus Nielsen Feltzing

Bryan VanDyke wrote:

1. Numbers sort before Non-numbers.
- Leading zeros are striped. A leading zero on a zero is still zero.
- 000 becomes 0.
- Some of the code that has been used has trouble with this.
2. Lesser number before greater.
- 1,2,3,4 etc
3. Anything else strcmp.


Sounds simple and sane, and seems to be the way Windows Explorer works 
as well.


Linus



Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Mike Holden wrote:

Thomas Martitz wrote:
  

After this discussion and the ones in IRC, it seems to me that the
majority is in favor of ignoring leading zeros. This would also match
with Nautilus' and Windows Explorer's sorting.

And we can do that. Given that the usual browsers do it that way, it's
also what the user expects, so it can't be bad. FS#10031 needs changing
the algorithm anyway.

So, should we do that? It at least seems to be the opinion of most people.




Just had a quick look in My Computer on an XP box to see how this does it.

1. 001 sorts before 01 and 01 sorts before 1, giving the conclusion that
more leading zeroes sort before fewer leading zeroes, where the underlying
number is the same (i.e. 000x < 00x for any number x).

2. 1 sorts before 2, 2 before 10, 10 before 11 etc, as expected giving
numeric sorting.

3. 001 sorts before 10 and 010, again giving expected sorting for the
numeric part.

4. Introducing letters into the equation, we can see that 00A sorts before
00aa, 001 and 1. This satisfies my expectation that leading zeroes before
letters should sort first in the list, and not be sorted among the letter
part only.

All of these individual items line up to give a file listing that doesn't
produce any surprises for me, so I would be happy with this set of rules.
  
This is what we'll be doing too. Comparing 001 and 1 will yield 001 < 1, 
because if strnatcmp considers the two equal, strcmp is asked.
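
The tie-break described here can be sketched as a two-stage comparator. Everything below is an assumption for illustration: nat_value_cmp is a hypothetical stand-in for strnatcmp (it reads digit runs with strtoul, so very long runs would saturate), and name_cmp and sort_names are made-up names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for strnatcmp: digit runs compare by value
 * (leading zeros ignored), everything else compares byte by byte. */
static int nat_value_cmp(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            char *ea, *eb;
            unsigned long va = strtoul(a, &ea, 10);
            unsigned long vb = strtoul(b, &eb, 10);
            if (va != vb)
                return va < vb ? -1 : 1;
            a = ea;
            b = eb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    return (*a != '\0') - (*b != '\0');
}

/* Natural order first; plain strcmp only breaks ties such as 001 vs 1. */
int name_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    int r = nat_value_cmp(a, b);
    return r != 0 ? r : strcmp(a, b);
}

/* qsort adapter for an array of C strings. */
static int qsort_cmp(const void *pa, const void *pb)
{
    return name_cmp(*(const char *const *)pa, *(const char *const *)pb);
}

void sort_names(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, qsort_cmp);
}
```

Because '0' is smaller than '1' in ASCII, the strcmp fallback puts the name with more leading zeros first, which matches point 1 of the XP observations quoted above.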


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Linus Nielsen Feltzing wrote:

Bryan VanDyke wrote:

1. Numbers sort before Non-numbers.
- Leading zeros are striped. A leading zero on a zero is still zero.
- 000 becomes 0.
- Some of the code that has been used has trouble with this.
2. Lesser number before greater.
- 1,2,3,4 etc
3. Anything else strcmp.


Sounds simple and sane, and seems to be the way Windows Explorer works 
as well.


Linus

Well, this is what SVN does currently. And if we want 02 after 1, 
just a (relatively) small modification is needed, without messing up 
decimal numbers like 1.02 and 1.1.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Bryan VanDyke wrote:

Thomas Martitz wrote:
  

Linus Nielsen Feltzing wrote:


Mike Holden wrote:
  

Maybe leading zeros should only be stripped if another digit follows
them?

I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists that I
have created (as opposed to original artist albums), and the leading
zerozero is deliberately there to sort them at the top.


That's an interesting observation. I believe leading zeroes are
treated like whitespace in the current code, but in this case I think
that the final zero should be kept.

Linus
  

That's not trivial, and adds complexity.  You basically need to look at
the current, the next, and one more for this, instead of just the
current char.




Actually it not that bad.

Pseudo code:

get current
get next
while (current != null && next != null && current == '0' && next is a
number)
{
current = next
next = get next
}


  
Now imagine this for every char in a string, and for every string in a 
file list (with some 100 files). It's three times (or even more) the 
complexity of just:

while (is_zero(a))
   a = next;

We're on embedded, and thus slow, systems. Yours would surely work well on 
a desktop app, but for mp3 players we need fast and small code. The gain 
has to justify the code, and I don't think it does in this example.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Thomas Martitz wrote:

Mike Holden wrote:

Thomas Martitz wrote:
 

After this discussion and the ones in IRC, it seems to me that the
majority is in favor of ignoring leading zeros. This would also match
with Nautilus' and Windows Explorer's sorting.

And we can do that. Given that the usual browsers do it that way, it's
also what the user expects, so it can't be bad. FS#10031 needs changing
the algorithm anyway.

So, should we do that? It at least seems to be the opinion of most 
people.





Just had a quick look in My Computer on an XP box to see how this does it.

1. 001 sorts before 01 and 01 sorts before 1, giving the conclusion that
more leading zeroes sort before fewer leading zeroes, where the underlying
number is the same (i.e. 000x < 00x for any number x).

2. 1 sorts before 2, 2 before 10, 10 before 11 etc, as expected giving
numeric sorting.

3. 001 sorts before 10 and 010, again giving expected sorting for the
numeric part.

4. Introducing letters into the equation, we can see that 00A sorts before
00aa, 001 and 1. This satisfies my expectation that leading zeroes before
letters should sort first in the list, and not be sorted among the letter
part only.

All of these individual items line up to give a file listing that doesn't
produce any surprises for me, so I would be happy with this set of rules.
  
This is what we'll be doing too. Comparing 001 and 1 will yield 001 < 1, 
because if strnatcmp considers the two equal, strcmp is asked.

Err, I guess point 4) isn't covered with my modification.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Bryan VanDyke
Thomas Martitz wrote:
 Bryan VanDyke wrote:
 Thomas Martitz wrote:
  
 Linus Nielsen Feltzing wrote:

 Mike Holden wrote:
  
 Maybe leading zeros should only be stripped if another digit follows
 them?

 I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists
 that I
 have created (as opposed to original artist albums), and the leading
 zerozero is deliberately there to sort them at the top.
 
 That's an interesting observation. I believe leading zeroes are
 treated like whitespace in the current code, but in this case I think
 that the final zero should be kept.

 Linus
   
 That's not trivial, and adds complexity.  You basically need to look at
 the current, the next, and one more for this, instead of just the
 current char.

 

 Actually it not that bad.

 Pseudo code:

 get current
 get next
 while (current != null && next != null && current == '0' && next is a
 number)
 {
 current = next
 next = get next
 }


   
 Now imagine this for every char in a string, and for every string in a
 file list (with some 100 files). It's three-times (or even more) more
 complexity than just.
 while (is_zero(a))
a = next;
 
 We're on embedded, and thus slow systems. Your would surely work well on
 a desktop app, but for mp3-players we need fast and small code. The gain
 has to justify the code, and I don't think it does it in this example.
 

What about something like this, taking into consideration that the isspace
function/comparison was removed? And isdigit is supposed to give zero
on non-digit values.

/* skip over leading zeros */
while ('0' == ca && nat_isdigit(ca_next))
{
    ca = to_int(a[++ai]);
    ca_next = to_int(a[ai+1]);
}




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Bryan VanDyke wrote:

Thomas Martitz wrote:
  

Bryan VanDyke wrote:


Thomas Martitz wrote:
 
  

Linus Nielsen Feltzing wrote:
   


Mike Holden wrote:
 
  

Maybe leading zeros should only be stripped if another digit follows
them?

I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists
that I
have created (as opposed to original artist albums), and the leading
zerozero is deliberately there to sort them at the top.



That's an interesting observation. I believe leading zeroes are
treated like whitespace in the current code, but in this case I think
that the final zero should be kept.

Linus
  
  

That's not trivial, and adds complexity.  You basically need to look at
the current, the next, and one more for this, instead of just the
current char.




Actually it not that bad.

Pseudo code:

get current
get next
while (current != null && next != null && current == '0' && next is a
number)
{
current = next
next = get next
}


  
  

Now imagine this for every char in a string, and for every string in a
file list (with some 100 files). It's three-times (or even more) more
complexity than just.
while (is_zero(a))
   a = next;

We're on embedded, and thus slow systems. Your would surely work well on
a desktop app, but for mp3-players we need fast and small code. The gain
has to justify the code, and I don't think it does it in this example.




What about something like this, taking into consideration that the isspace
function/comparison was removed? And isdigit is supposed to give zero
on non-digit values.

/* skip over leading zeros */
while ('0' == ca && nat_isdigit(ca_next))
{
    ca = to_int(a[++ai]);
    ca_next = to_int(a[ai+1]);
}


  

I've found a simpler solution for this. Trying the code raises the 
following problem:


00 < 0b < 01 < 1

Zeros before the final zero are ignored, and the final zero before 
characters is not ignored. But the leading zeros of numbers are ignored 
(so that 01 is 1). Obviously 0 sorts before 1.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Mike Holden
Thomas Martitz wrote:
 Sounds to me that you're better off using ascii sort.

Well that is what I currently use, but there's no reason why natural
sorting shouldn't be appropriate if it works in the right way!

 Windows does it in the same way as nautilus and other major file
 browsers. It ignores leading zeros.

Well it doesn't completely ignore them, they have some significance (see
my other email a short while ago).
-- 
Mike Holden






Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Bryan VanDyke
Thomas Martitz wrote:
 Bryan VanDyke wrote:
 Thomas Martitz wrote:
  
 Bryan VanDyke wrote:

 Thomas Martitz wrote:
  
  
 Linus Nielsen Feltzing wrote:
   
 Mike Holden wrote:
   
 Maybe leading zeros should only be stripped if another digit follows
 them?

 I use names like 00RockFaves.m3u, 00ClassicRock.m3u for playlists
 that I
 have created (as opposed to original artist albums), and the leading
 zerozero is deliberately there to sort them at the top.
 
 That's an interesting observation. I believe leading zeroes are
 treated like whitespace in the current code, but in this case I think
 that the final zero should be kept.

 Linus
 
 That's not trivial, and adds complexity.  You basically need to
 look at
 the current, the next, and one more for this, instead of just the
 current char.

 
 Actually it not that bad.

 Pseudo code:

 get current
 get next
 while (current != null && next != null && current == '0' && next is a
 number)
 {
 current = next
 next = get next
 }


 
 Now imagine this for every char in a string, and for every string in a
 file list (with some 100 files). It's three-times (or even more) more
 complexity than just.
 while (is_zero(a))
a = next;

 We're on embedded, and thus slow systems. Your would surely work well on
 a desktop app, but for mp3-players we need fast and small code. The gain
 has to justify the code, and I don't think it does it in this example.

 

 What about something like this, taking into consideration that the isspace
 function/comparison was removed? And isdigit is supposed to give zero
 on non-digit values.

 /* skip over leading zeros */
 while ('0' == ca && nat_isdigit(ca_next))
 {
     ca = to_int(a[++ai]);
     ca_next = to_int(a[ai+1]);
 }


   
 I've found a simpler solution for this. Trying the code raises the
 following problem:
 
 00 < 0b < 01 < 1

That looks right. Zero is a valid number. A leading zero before a zero is
still zero.

00 -> 0
0b -> 0b
01 -> 1
1  -> 1

01 == 1
strcmp -> 01 < 1

right?



 
 Zeros before except the final zeros are ignored, and the final zero
 before characters is not ignored. But the leading zeros of numbers are
 (so that 01 is 1). Obviously 0 sorts before 1.
 



Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Thomas Martitz wrote:
I've found a simpler solution for this. Trying the code raises the 
following problem:


00 < 0b < 01 < 1

Zeros before except the final zeros are ignored, and the final zero 
before characters is not ignored. But the leading zeros of numbers are 
(so that 01 is 1). Obviously 0 sorts before 1.



Nautilus has this problem too. I don't know what windows does in this case.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Mike Holden
Thomas Martitz wrote:
 Thomas Martitz wrote:
 I've found a simpler solution for this. Trying the code raises the
 following problem:

 00 < 0b < 01 < 1

 Zeros before except the final zeros are ignored, and the final zero
 before characters is not ignored. But the leading zeros of numbers are
 (so that 01 is 1). Obviously 0 sorts before 1.


 Nautilus has this problem too. I don't know what windows does in this
 case.


I thought we'd already established that those 4 files are in the right order?

Windows orders them as below, which is the same as above:
00
0b
01
1
-- 
Mike Holden






Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Mike Holden wrote:

Thomas Martitz wrote:
  

Thomas Martitz wrote:


I've found a simpler solution for this. Trying the code raises the
following problem:

00 < 0b < 01 < 1

Zeros before except the final zeros are ignored, and the final zero
before characters is not ignored. But the leading zeros of numbers are
(so that 01 is 1). Obviously 0 sorts before 1.
  

Nautilus has this problem too. I don't know what windows does in this
case.




I thought we'd already established that those 4 files are in the right order?

Windows orders them as below, which is the same as above:
00
0b
01
1
  
I didn't establish anything, the mail about nautilus was sent before I 
received yours. But if Nautilus and Windows sort this way, (and we want 
to mimic it), then it's right, indeed.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Dominik Riebeling
On Thu, Mar 19, 2009 at 4:34 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
 I've found a simpler solution for this. Trying the code raises the
 following problem:

 00 < 0b < 01 < 1
[...]
 Nautilus has this problem too. I don't know what windows does in this case.

I don't see any problem here. You just need to distinguish between
strings and values on sorting:
a. 00 - value 0
b. 0b - value 0, followed by string b
c. 01 - value 1
d. 1 - value 1

so while strcmp() is the tie-breaker between c. and d., sorting
a. and b. is also rather simple: you sort by the leading numbers
first. This makes a. and b. come before the others. Then, as a. and b.
form a "starting with zero" group, you have to sort that group again, as
there is a tie between the numbers. Thus b. comes after a. That's how
ASCII sorting would do it (and also how Windows Explorer does it).

You can't simply sort by leading numbers and ignore that the string
has other characters in it too.
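
As a rough illustration of the value/string split described above, here is a minimal sketch that only looks at a leading number. The struct lead_key and the functions split_name and lead_cmp are hypothetical names; a full natural sort would repeat the same step for every digit run, not just the first.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sort key: a leading number plus the text after it. */
struct lead_key {
    int has_number;       /* 1 if the name starts with a digit */
    unsigned long value;  /* numeric value of the leading digit run */
    const char *rest;     /* text following the digit run */
};

static struct lead_key split_name(const char *name)
{
    struct lead_key k = { 0, 0, name };

    if (isdigit((unsigned char)*name)) {
        char *end;
        k.has_number = 1;
        k.value = strtoul(name, &end, 10); /* "0b" gives value 0, rest "b" */
        k.rest = end;
    }
    return k;
}

/* Sort by leading value first, then by the remaining string, and let
 * strcmp on the full names break ties such as "01" versus "1". */
int lead_cmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    struct lead_key ka = split_name(a), kb = split_name(b);
    int r;

    if (ka.has_number != kb.has_number)
        return ka.has_number ? -1 : 1;
    if (ka.has_number && ka.value != kb.value)
        return ka.value < kb.value ? -1 : 1;
    r = strcmp(ka.rest, kb.rest);
    return r != 0 ? r : strcmp(a, b);
}
```

For the four names in question this gives 00, 0b, 01, 1: the two zero-valued names come first, the empty remainder of 00 sorts before "b", and strcmp decides between 01 and 1.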


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Thomas Martitz wrote:


Ok, I've implemented ignoring very leading zeros now, and fixed 
FS#10031, in my local repo. It could be committed, I think. It seems 
the consensus is reached.


Alternatively we can also think about ignoring chars like . and _ (and 
possibly more) in the beginning of a file name (e.g. .rockbox is 
sorted under r). Just an idea. It doesn't really add complexity, but 
would definitely do more than the setting advertises. But, this is 
also something windows/nautilus/more do.


I've uploaded the patch to FS#10030.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Dominik Riebeling
On Thu, Mar 19, 2009 at 4:04 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
 Now imagine this for every char in a string, and for every string in a file
 list (with some 100 files). It's three-times (or even more) more complexity
 than just.
 while (is_zero(a))
   a = next;

This natural sorting is more complex than ASCII sorting anyway. And
what's the added complexity by simply going back by one to get the
last 0? That's only an added check, and everything else only if you
hit a digit.

if(is_digit(a)) {
    /* is_digit() had a hit */
    while(is_zero(a))
        a = next;
    /* if the current one is not a digit anymore we've just skipped the
       value 0 and need to take that back to not remove that value */
    if(!is_digit(a))
        /* no need to check if there is a prev value -- we skipped at
           least one 0. If not we still have a digit. */
        a = prev;
}

 We're on embedded, and thus slow systems. Your would surely work well on a
 desktop app, but for mp3-players we need fast and small code. The gain has
 to justify the code, and I don't think it does it in this example.

We're talking about high-level functionality here. There's nothing
timing-critical, and even on the Archos players I'm confident doing
this properly wouldn't cause a serious slowdown compared to the
current state, but feel free to measure it and present numbers. We can
play mp3 files at 45MHz (at least on coldfire, don't have all the
numbers at hand right now) which is much more calculating-intensive
than doing a few additional comparisons on a list of maybe some 100
files.
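
One way to get such numbers, sketched under assumptions: wrap the comparison function in a counter and see how often the sort calls it for a given directory. `sort_and_count` and `counting_strcmp` are hypothetical names, `qsort()` stands in for whatever sort the browser really uses, and `strcmp()` stands in for the comparison under test.

```c
#include <stdlib.h>
#include <string.h>

/* Number of comparator calls since the counter was last reset. */
static unsigned long compare_calls;

/* qsort() comparator that counts how often it is invoked. */
static int counting_strcmp(const void *a, const void *b)
{
    compare_calls++;
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort `names` in place and return how many comparisons were needed. */
unsigned long sort_and_count(const char **names, size_t n)
{
    compare_calls = 0;
    qsort(names, n, sizeof *names, counting_strcmp);
    return compare_calls;
}
```

A comparison sort typically needs on the order of n * log2(n) comparator calls, so a few hundred for about 100 names. Multiplying that count by the measured cost of one comparison would give a rough upper bound for the extra time per directory.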

You're basically saying we shouldn't fix the functionality because
it's too expensive runtime-wise. If it's really too expensive to have
a functionality working properly (which I doubt) we shouldn't ship the
functionality at all.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Dominik Riebeling
On Thu, Mar 19, 2009 at 9:21 AM, Paul Louden paulthen...@gmail.com wrote:
 I've stated my position several times: I think we should decide whether we
 want to mimic the file browsers or not. If we do, I think we should mimic
 all their sorting quirks that we can, rather than suggest we're like them
 but with our own choices as to where to go our own way.

Well, I think we still have two options here:
1. completely mimic the file browser. In this case I agree with you
that we should mimic all quirks the browser used as reference has.
2. only mimic the browser with regard to numbers (or any other subset).
I think this is a viable alternative, though not all might agree.
Doing it this way we're not like the reference browser but simply
doing a (the most commonly noticed?) subset. If we choose to mimic a
browser completely we immediately come across the question of which
browser to use as reference. I'm quite sure Explorer / Konqueror /
Nautilus behave differently with regard to prefixes like space, dot and
underscore. Which is a reason why I'd go for mimicking the part of this
natural sorting that's common among them.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Dominik Riebeling wrote:

On Thu, Mar 19, 2009 at 4:04 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
  

Now imagine this for every char in a string, and for every string in a file
list (with some 100 files). It's three-times (or even more) more complexity
than just.
while (is_zero(a))
  a = next;



This natural sorting is more complex than ASCII sorting anyway. And
what's the added complexity by simply going back by one to get the
last 0? That's only an added check, and everything else only if you
hit a digit.

if(is_digit(a)) {
    /* is_digit() had a hit */
    while(is_zero(a))
        a = next;
    /* if current one is not a digit anymore we've just skipped the
       value 0 and need to take that back to not remove that value */
    if(!is_digit(a)) /* no need to check if there is a prev value --
                        we skipped at least one 0. If not we still have a digit. */
        a = prev;
}

  

That's how I did it now (see FS#10030).

We're on embedded, and thus slow systems. Yours would surely work well on a
desktop app, but for mp3-players we need fast and small code. The gain has
to justify the code, and I don't think it does it in this example.



We're talking about high-level functionality here. There's nothing
timing-critical, and even on the Archos players I'm confident doing
this properly wouldn't cause a serious slowdown compared to the
current state, but feel free to measure it and present numbers. We can
play mp3 files at 45MHz (at least on coldfire, don't have all the
numbers at hand right now) which is much more calculating-intensive
than doing a few additional comparisons on a list of maybe some 100
files.

You're basically saying we shouldn't fix the functionality because
it's too expensive runtime-wise. If it's really too expensive to have
a functionality working properly (which I doubt) we shouldn't ship the
functionality at all.


 - Dominik
  


The problem with his proposal was that it looked 3 times at every char. 
That's hardly optimal.


The original algorithm looks only once through each char. And the 
version you proposed does it too (except in the case where it goes back 
1 char). That's why it's not much more expensive than normal strcmp.


And I think that a file-listing is relatively timing critical. I 
wouldn't want to have a noticeable delay just due to sorting on each 
folder I enter.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread codemonkey

Are you guys aware that there's a quasi-standard regarding this in
the GNU libraries?  See the following excerpt from Fedora info ls
and man strverscmp.

~ray

PS: I've found that ls -v works well for sorting MP3s with track
numbering, etc.  I don't know if it handles all of the cases described in
this thread though.  Maybe GNU's implementation is worth borrowing for
rockbox?

--

$ info ls

(...excerpt...)

10.1.4 More details about version sort
--------------------------------------

The version sort takes into account the fact that file names frequently
include indices or version numbers.  Standard sorting functions usually
do not produce the ordering that people expect because comparisons are
made on a character-by-character basis.  The version sort addresses
this problem, and is especially useful when browsing directories that
contain many files with indices/version numbers in their names:

 $ ls -1            $ ls -1v
 foo.zml-1.gz       foo.zml-1.gz
 foo.zml-100.gz     foo.zml-2.gz
 foo.zml-12.gz      foo.zml-6.gz
 foo.zml-13.gz      foo.zml-12.gz
 foo.zml-2.gz       foo.zml-13.gz
 foo.zml-25.gz      foo.zml-25.gz
 foo.zml-6.gz       foo.zml-100.gz

   Note also that numeric parts with leading zeros are considered as
fractional one:

 $ ls -1            $ ls -1v
 abc-1.007.tgz      abc-1.007.tgz
 abc-1.012b.tgz     abc-1.01a.tgz
 abc-1.01a.tgz      abc-1.012b.tgz

   This functionality is implemented using the `strverscmp' function.

--

$ man strverscmp

STRVERSCMP(3)          Linux Programmer’s Manual          STRVERSCMP(3)

NAME
   strverscmp - compare two version strings

SYNOPSIS
   #define _GNU_SOURCE
   #include <string.h>

   int strverscmp(const char *s1, const char *s2);

DESCRIPTION
   Often one has files jan1, jan2, ..., jan9, jan10, ... and it feels
   wrong when ls(1) orders them jan1, jan10, ..., jan2, ..., jan9. In
   order to rectify this, GNU introduced the -v option to ls(1), which
   is implemented using versionsort(3), which again uses strverscmp().

   Thus, the task of strverscmp() is to compare two strings and find
   the right order, while strcmp(3) only finds the lexicographic order.
   This function does not use the locale category LC_COLLATE, so is
   meant mostly for situations where the strings are expected to be in
   ASCII.

   What this function does is the following. If both strings are
   equal, return 0. Otherwise find the position between two bytes with
   the property that before it both strings are equal, while directly
   after it there is a difference. Find the largest consecutive digit
   strings containing (or starting at, or ending at) this position. If
   one or both of these is empty, then return what strcmp(3) would have
   returned (numerical ordering of byte values). Otherwise, compare
   both digit strings numerically, where digit strings with one or more
   leading zeroes are interpreted as if they have a decimal point in
   front (so that in particular digit strings with more leading zeroes
   come before digit strings with fewer leading zeroes). Thus, the
   ordering is 000, 00, 01, 010, 09, 0, 1, 9, 10.

RETURN VALUE
   The strverscmp() function returns an integer less than, equal to, or
   greater than zero if s1 is found, respectively, to be earlier than,
   equal to, or later than s2.

CONFORMING TO
   This function is a GNU extension.

SEE ALSO
   rename(1), strcasecmp(3), strcmp(3), strcoll(3),
   feature_test_macros(7)

GNU                           2001-12-19                  STRVERSCMP(3)
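
For anyone who wants to try the function on a desktop, here is a small sketch that plugs strverscmp() into qsort(). It assumes a glibc system (the function is a GNU extension, as the man page says); `versioncmp_qsort` and `version_sort` are names made up for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for strverscmp() in <string.h> */
#endif
#include <stdlib.h>
#include <string.h>

/* qsort() adapter around the glibc extension strverscmp(). */
int versioncmp_qsort(const void *a, const void *b)
{
    return strverscmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort an array of C strings in strverscmp() order, in place. */
void version_sort(const char **names, size_t n)
{
    qsort(names, n, sizeof *names, versioncmp_qsort);
}
```

Sorting jan10, jan2, jan1, jan9 this way gives jan1, jan2, jan9, jan10, matching the man page's motivation. Names with leading zeros follow the "fractional" rule quoted above, which may or may not be what a music browser wants.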



Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

codemonkey wrote:

Are you guys aware that there's a quasi-standard regarding this in
the GNU libraries?  See the following excerpt from Fedora info ls
and man strverscmp.

Sounds exactly like strnatcmp. It behaves the same for the two examples.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Bryan VanDyke
codemonkey wrote:
 Are you guys aware that there's a quasi-standard regarding this in
 the GNU libraries?  See the following excerpt from Fedora info ls
 and man strverscmp.
 

Seems very close. My understanding is natural sort would interpret as:
000, 00, 0, 01, 1, 09, 9, 010, 10.
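
Read as a rule, that order amounts to: compare digit runs by their numeric value, and when two runs have the same value, put the one with more leading zeros first. A minimal sketch under that reading follows. The tie rule is only inferred from the list above, `natcmp_sketch` and its helpers are hypothetical names, and this is not the strnatcmp code in Rockbox.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Length of the digit run starting at s (0 if s is not at a digit). */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Compare two names; digit runs are compared by value, with the run that
 * has more leading zeros first when the values are equal (assumed rule). */
int natcmp_sketch(const char *a, const char *b)
{
    while (*a && *b) {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            size_t la = digit_run(a), lb = digit_run(b);
            size_t za = 0, zb = 0;      /* leading zeros, keeping one digit */
            while (za + 1 < la && a[za] == '0') za++;
            while (zb + 1 < lb && b[zb] == '0') zb++;
            size_t sa = la - za, sb = lb - zb;  /* significant digits */
            if (sa != sb)               /* more digits means larger value */
                return sa < sb ? -1 : 1;
            int r = memcmp(a + za, b + zb, sa);
            if (r != 0)
                return r < 0 ? -1 : 1;
            if (za != zb)               /* same value: more zeros first */
                return za > zb ? -1 : 1;
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a) return 1;                   /* a is longer, sorts after b */
    if (*b) return -1;
    return 0;
}

/* qsort() adapter for arrays of C strings. */
int natcmp_qsort(const void *a, const void *b)
{
    return natcmp_sketch(*(const char *const *)a, *(const char *const *)b);
}
```

Sorting the nine strings from the man page with `natcmp_qsort` reproduces the order given above, whereas strverscmp() would put 01, 010 and 09 before 0.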




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Dominik Riebeling
On Thu, Mar 19, 2009 at 6:47 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
 The problem with his proposal was, that it looked 3 times at every char.
 That's hardly optimal.

Which doesn't imply that it can't be done better. An inefficient
solution is therefore no argument against fixing a feature.

 And I think that a file-listing is relatively timing critical. I wouldn't
 want to have a noticeable delay just due to sorting on each folder I enter.

Then you need to define what time-critical means. For me, this is
interrupting some real-time process (playback, communication with a
chip on a bus, data corruption etc). A file browser should be fast,
but it's definitely not time-critical -- nothing bad will happen if it
takes slightly longer. Except maybe the user getting annoyed. But I'd
rather call that time-relevant. It's not critical.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Thomas Martitz

Thomas Martitz wrote:


I've uploaded the patch to FS#10030.
Any final comments? If not, I'm considering committing it before the release. I'm 
not sure whether it should be backported too, though.


Please also read the recent comments on the task.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Paul Louden

Thomas Martitz wrote:

Thomas Martitz wrote:


I've uploaded the patch to FS#10030.
Any final comments? If not, I'm considering committing it before the release. 
I'm not sure whether it should be backported too, though.


Please also read the recent comments on the task.
In my opinion, we should disable the option and revert to basic ASCII 
sort for this release, and make natural sorting a feature of the next 
one. We shouldn't change the default sort method in a release version 
until we have a more or less final algorithm, and it certainly seems 
like even outside of my own objections, there are still several opinions 
on how this should go.


People have got by with ASCII before, they can wait with it 3 more 
months (or use current builds) until we've settled our algorithm issue.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Al Le

On 19.03.2009 00:13, Paul Louden wrote:

Al Le wrote:


My personal position is also that if a user adds a 0 before a number, 
they expect it to change something, rather than being ignored. I think, 
on average, more 0s (in lists meant to be sorted) will be intentional 
than accidental.


Paul, I think we can agree that there are different cases. There are 
cases where a leading zero is intentional and there are cases where it's 
just there (because you used a wrong setting in the ripping software or 
because you copied the file from somewhere else). The problem is that a 
single natural sort won't fit all. Maybe we should have two natural 
sort procedures? One would ignore the leading zeroes, i.e. just consider 
numbers as in mathematics (it would put 007 after 6) and the other 
wouldn't (it would put 007 before 6).
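
To make the two behaviours concrete, here is a deliberately tiny sketch for names that consist of a number only. `cmp_number_names` is a hypothetical function, and the "zeros are significant" mode is approximated by a plain byte comparison, which is an assumption rather than how any particular natural sort does it.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Compare two purely numeric names in one of the two proposed modes. */
int cmp_number_names(const char *a, const char *b, bool ignore_leading_zeros)
{
    if (ignore_leading_zeros) {
        /* Mathematical view: only the value counts, so 007 is 7. */
        unsigned long va = strtoul(a, NULL, 10);
        unsigned long vb = strtoul(b, NULL, 10);
        return (va > vb) - (va < vb);
    }
    /* Zeros are significant: '0' sorts before '6', so 007 comes first. */
    return strcmp(a, b);
}
```

With the flag set, 007 compares greater than 6 (it is sorted after it); with the flag cleared, 007 compares less than 6.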


The major file browsers (since they are produced by techies :-) operate on just 
numbers, without special treatment of the leading zeroes.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Paul Louden

Al Le wrote:


Paul, I think we can agree that there are different cases. There are 
cases where a leading zero is intentional and there are cases where 
it's just there (because you used a wrong setting in the ripping 
software or because you copied the file from somewhere else). The 
problem is that a single natural sort won't fit all. Maybe we should 
have two natural sort procedures? One would ignore the leading zeroes, 
i.e. just consider numbers as in mathematics (it would put 007 after 
6) and the other wouldn't (it would put 007 before 6).
Neither of your described cases would result in a mix of leading zeros 
and no leading zeros though. If you set the wrong setting, all your 
files would have leading zeros and still sort fine. So what's the problem?


This is what I don't get - nobody's described a real case where 
acknowledging leading zeros causes a *bad* sort except the one mix 
folder case where the user chooses to rename some, but not all, of his 
files.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-19 Thread Marianne Arnold

Thomas Martitz wrote:

Alternatively we can also think about ignoring chars like . and _ (and 
possibly more) in the beginning of a file name (e.g. .rockbox is sorted 
under r). Just an idea. It doesn't really add complexity, but would 
definitely do more than the setting advertises. But, this is also 
something windows/nautilus/more do.




Huh? Windows Explorer at least *does not* ignore _ and does not sort _something among the s but puts it at the top before a, even before numbers. The .rockbox folder is also sorted above the Music folder in my simdisk directory... 


It also respects the number of spaces and *does not* collapse them to one. The resulting 
order in a small test:
!  Test
! A
If it ignored the second space, it would sort !  Test last. 
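
The effect can be reproduced with a plain byte comparison, which happens to agree with the order observed for these two names (Explorer's real collation is more involved, so this is only an illustration). `collapse_spaces` is a hypothetical helper that squeezes runs of spaces into one.

```c
#include <stddef.h>
#include <string.h>

/* Copy src to dst (at most dstsize bytes including the terminator),
 * replacing every run of spaces by a single space. */
void collapse_spaces(char *dst, size_t dstsize, const char *src)
{
    size_t n = 0;
    int prev_space = 0;

    if (dstsize == 0)
        return;

    for (; *src && n + 1 < dstsize; src++) {
        int is_space = (*src == ' ');
        if (is_space && prev_space)
            continue;           /* drop the repeated space */
        dst[n++] = *src;
        prev_space = is_space;
    }
    dst[n] = '\0';
}
```

Comparing the raw names, the second space (0x20) in "!  Test" is smaller than the 'A' in "! A", so "!  Test" sorts first. After collapsing, "! Test" is compared against "! A" at 'T' versus 'A' and ends up last.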


In reply to an earlier mail here, I am one who occasionally (ab)uses this function of 
sorting to put temporary data at the top, somewhere outside the rest of the list. This is 
not specific to Rockbox because I don't use it for my music collection but for other 
files on my computer. I like the described way for file/directory names starting with 
special characters and think they should be treated like this in Rockbox, too (as they 
are currently?). And as you said yourself, adding this will do more than 
advertised; the same thing applies to spaces as well (as Dominik pointed 
out), so the setting either has a wrong name or should be fixed in this regard.

My preference would also be to treat leading zeros as intentional; just looking at a list in 
Explorer, 01, 02, 3, 04 seemed weird to me - guess it's the same effect Paul described 
and reported about his friends. About the mathematical rule: that's true without a doubt, but if I 
read 04 + 03 = 07, I would suspect some weird reasoning behind it (something intentional 
that I just don't know about). One just doesn't write a leading zero if (s)he doesn't have to.

To summarise: first a strong wish that more than one space is treated as such and no ignoring of 
even more special chars - also to comply with the setting name. Perhaps don't ignore 
leading zeros, although I could understand the reason all major file browsers do, so should 
we; so far I found Paul's example in favour of not ignoring them more realistic though.


Regards, Marianne.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:

A situation
where a folder can contain files starting with 02 and 4 at the same
time is something that could happen and still not be intentional
(just think of copying files from various albums to a mix folder).
  
I agree that it could be unintentional, but disagree that numeric 
sorting matters in this case - if you have a mix of random songs, why 
does 03 need to be between 2 and 4? Meanwhile, if someone intentionally 
prefixes something with a 0, they intend for it to be first, so it 
should be. This sounds like a case of  let's make the unimportant case 
work one way, while choosing to break the case where people make 
intentional changes.

Either treat digits as number or don't treat them as numbers at all.
- Spaces shouldn't get collapsed. A space is a space, and interpret
numbers doesn't tell anything about spaces. At least at some point
during the lifetime of this setting spaces were collapsed. Nothing
that is a number ...
  
This is originally based on a natural sorting algorithm, which does a 
lot more than numbers it seems. My understanding was the original intent 
was to simply fix 1, 10, 2, 3, 4 into 1, 2, 3, 4, 10. I don't see why 
this *should* ignore leading zeros. I don't think we should ever assume 
any part of a filename is unintentional. I think assuming numbers are 
written as a human normally does is fine (1, 2, 3, 10, 11, 12) but if 
someone chooses to add something to alter sorting we should still 
respect it. You don't accidentally add a 0, and if there are random 
zeros in a mix folder the order of playback almost certainly isn't meant 
to be 2, 03, 4, but rather whatever order if they just chose to 
haphazardly mix them.


Also, which comes first: 001 or 01? If we're going to recognize that 001 
has one more zero than 01, why don't we recognize that 00number has more 
zeros than 0number, even if the two numbers are different?

This still leaves some open issues I'm not sure how to deal with:
- how are floating-point numbers to be treated? 1.001 is smaller than
1.01 when treating as numbers, so on the one hand I'd expect them to
sort that way. On the other hand, recognizing the dot as decimal
separator is broken as well -- not all languages use it as decimal
separator (like German using the comma). Stopping the number-treating
at dots is also kinda broken -- how should a naming be handled as
discnumber.tracknumber, i.e. like 1.2, 1.10 -- which one has to
be sorted first? The best solution here might be to treat all numbers
as single numbers, regardless if they might be floating point numbers
-- I guess it's more common to have a 1.3 numbering to mark
discnumber.track instead of a floating point number 1.003.
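
The two readings of a name like 1.10 can be put side by side in a short sketch. Both functions are hypothetical illustrations, handle only a bare "number.number" string, and the floating-point variant relies on strtod() in the default "C" locale, where the dot is the decimal separator.

```c
#include <stdlib.h>

/* Reading 1: the whole string is one floating-point number. */
int cmp_as_float(const char *a, const char *b)
{
    double x = strtod(a, NULL);
    double y = strtod(b, NULL);
    return (x > y) - (x < y);
}

/* Reading 2: "disc.track", i.e. two separate integers. */
int cmp_as_fields(const char *a, const char *b)
{
    char *ea, *eb;
    unsigned long da = strtoul(a, &ea, 10);
    unsigned long db = strtoul(b, &eb, 10);

    if (da != db)
        return da < db ? -1 : 1;

    if (*ea == '.') ea++;       /* step over the separator */
    if (*eb == '.') eb++;

    unsigned long ta = strtoul(ea, NULL, 10);
    unsigned long tb = strtoul(eb, NULL, 10);
    return (ta > tb) - (ta < tb);
}
```

As floats, 1.2 sorts after 1.10; as disc.track fields, 1.2 sorts before 1.10. Conversely, 1.001 and 1.01 differ as floats but both mean "disc 1, track 1" as fields, so a tie-breaker would be needed there.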

  
I expect people will number disks 1.01 - 1.12 rather than 1.1, 1.2, 
1.10, 1.12. This is only my personal assumption, but if that's the case, 
our current method works for it.

I'm pretty sure I've missed some of my points right now :) What do
people think about this sorting thing?

  
Well, we're currently using an existing algorithm. One that *may* be 
used in other FLOSS (I don't know, and haven't investigated). To me, the 
two sides of the argument are basically do we want to use it as-is, 
such that our sorted lists look the same as lists in other applications, 
or do we want to define our own rules for 'natural' list sorting? Of 
course, this is dependent upon research I haven't done (specifically, do 
any other applications use this sort algorithm).


Maybe we should just see if various FLOSS file browsers have a common 
natural sort, and use it, so that our files are likely to show up in 
the host's browser the same order as they show up in ours?


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 10:22 PM, Paul Louden paulthen...@gmail.com wrote:
 I agree that it could be unintentional, but disagree that numeric sorting
 matters in this case - if you have a mix of random songs, why does 03 need
 to be between 2 and 4? Meanwhile, if someone intentionally prefixes
 something with a 0, they intend for it to be first, so it should be. This
 sounds like a case of  let's make the unimportant case work one way, while
 choosing to break the case where people make intentional changes.

You have a point, but if we assume people to add leading zeros
*intentionally* can't we assume those people to number their files
correctly in the first place, thus not having any need for this
natural sorting anyway? I'm pretty confident that there are users
careless enough to name files 3, 05, 7 and expecting them to get
sorted as 3, 05, 7. If we consider leading zeros as intentional
do we need this strnatcmp at all? If we don't skip leading zeros are we
treating digits as numbers at all? I wouldn't say so.
Besides, couldn't that also create a sorting like 10, 13, 02,
04 as 0 is a character here and thus sorted separately? Such a
sorting would be wrong if one names his files always using leading
zeros, especially if numbers are always sorted first.

 ignore leading zeros. I don't think we should ever assume any part of a
 filename is unintentional. I think assuming numbers are written as a human

in that case why do we need to add additional brain to the naming
the user chose?

 normally does is fine (1, 2, 3, 10, 11, 12) but if someone chooses to add
 something to alter sorting we should still respect it. You don't

Does altering the sorting require you to use digits? At least I usually
prepend a character if I want something to get sorted at the top or
bottom or indexes like a, b after the leading number.

 accidentally add a 0, and if there are random zeros in a mix folder the
 order of playback almost certainly isn't meant to be 2, 03, 4, but rather
 whatever order if they just chose to haphazardly mix them.

Ok, that was a bad example :)

 Also, which comes first: 001 or 01? If we're going to recognize that 001 has
 one more zero than 01, why don't we recognize that 00number has more zeros
 than 0number, even if the two numbers are different?

well, in that case (as both strings will evaluate to the number 1) a
strcmp would be in place to break the tie. It's a corner-case as both
numbers are identical (and 00 isn't worth more than 0, is it?).
Thus I don't think this is much of an issue as long as it is
deterministic.

 I expect people will number disks 1.01 - 1.12 rather than 1.1, 1.2, 1.10,
 1.12. This is only my personal assumption, but if that's the case, our
 current method works for it.

I'd expect the same too, but I'm among the people that don't need
strnatcmp anyway as I properly name my files ;-)

 Maybe we should just see if various FLOSS file browsers have a common
 natural sort, and use it, so that our files are likely to show up in the
 host's browser the same order as they show up in ours?

Good point. Though if we only address the naming issue we kinda create
our own sorting, don't we?


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 18.03.2009 22:22, Paul Louden wrote:

Dominik Riebeling wrote:

A situation
where a folder can contain files starting with 02 and 4 at the same
time is something that could happen and still not be intentional
(just think of copying files from various albums to a mix folder).
  
I agree that it could be unintentional, but disagree that numeric 
sorting matters in this case - if you have a mix of random songs, why 
does 03 need to be between 2 and 4? Meanwhile, if someone intentionally 
prefixes something with a 0, they intend for it to be first


I think we can't tell for sure what's intentional and what's not. All we 
have is a bunch of files, and it's not our task to infer how it came 
to be. Guessing what the intention was requires an intelligence of a higher 
degree than the natsort!




Either treat digits as number or don't treat them as numbers at all.


I'm absolutely with you, Dominik. This way the thing the algorithm does 
can be captured in a few words, which we accurately did in the setting 
names (Interpret numbers as ...). Special treatment of leading zeroes, 
spaces and dots is too much for a usual human being to understand.



Also, which comes first: 001 or 01?


If strnatcmp tells two strings are equal then strcmp is called which 
always delivers a perfectly predictable result.
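
That fallback is a one-liner around whatever comparison is used first. In the sketch below, `value_only_cmp` is just a stand-in for the real strnatcmp (it only looks at the leading number), and `cmp_with_tiebreak` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

typedef int (*name_cmp_fn)(const char *, const char *);

/* Stand-in primary comparison: compares only the leading decimal value,
 * so "001" and "01" are reported as equal. */
int value_only_cmp(const char *a, const char *b)
{
    unsigned long va = strtoul(a, NULL, 10);
    unsigned long vb = strtoul(b, NULL, 10);
    return (va > vb) - (va < vb);
}

/* Use the primary comparison; fall back to strcmp() when it reports a tie,
 * so that distinct names never compare as equal. */
int cmp_with_tiebreak(name_cmp_fn primary, const char *a, const char *b)
{
    int r = primary(a, b);
    return r != 0 ? r : strcmp(a, b);
}
```

For 001 versus 01 the primary comparison ties, and strcmp() then puts 001 first because its second character '0' is smaller than '1'. The result is deterministic no matter in which order the two names are passed in.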


Well, we're currently using an existing algorithm. One that *may* be 
used in other FLOSS (I don't know, and haven't investigated). To me, the 
two sides of the argument are basically do we want to use it as-is, 
such that our sorted lists look the same as lists in other applications, 
or do we want to define our own rules for 'natural' list sorting?


I'd opt for the latter, because it's easy to understand.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:

On Wed, Mar 18, 2009 at 10:22 PM, Paul Louden paulthen...@gmail.com wrote:
  

I agree that it could be unintentional, but disagree that numeric sorting
matters in this case - if you have a mix of random songs, why does 03 need
to be between 2 and 4? Meanwhile, if someone intentionally prefixes
something with a 0, they intend for it to be first, so it should be. This
sounds like a case of  let's make the unimportant case work one way, while
choosing to break the case where people make intentional changes.



You have a point, but if we assume people to add leading zeros
*intentionally* can't we assume those people to number their files
correctly in the first place, thus not having any need for this
natural sorting anyway? I'm pretty confident that there are users
careless enough to name files 3, 05, 7 and expecting them to get
sorted as 3, 05, 7. If we consider leading zeros as intentional
do we need this strnatcmp at all? If we don't skip leading zeros are we
treating digits as numbers at all? I wouldn't say so.
Besides, couldn't that also create a sorting like 10, 13, 02,
04 as 0 is a character here and thus sorted separately? Such a
sorting would be wrong if one names his files always using leading
zeros, especially if numbers are always sorted first.

  
You really believe a person, while naming files, would number them 3, 
05, 7? Why would they add the 0 onto the 5 one if they're already 
typing single-digit numbers?


Can you give me a realistic case where someone wants their files in the 
order 3, 05, 7 and has named them themselves?


We were fixing, previously, the case where people had chosen names of 1, 
2, 3, 4, 5, 6, 7, 8, 9, 10 and not known they *must* use leading zeros 
to prevent bad sorting. But I don't think it's a fair assumption to go 
as far as saying when they add a character, they didn't mean to type it.
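
For reference, the case that natural sorting was introduced to fix can be reproduced in a few lines. This is a minimal Python sketch (not Rockbox code): the plain string sort stands in for the ASCII sort, and `key=int` is an assumed stand-in for interpreting the names as numbers.

```python
# Names as a user might type them, without leading zeros.
names = [str(n) for n in range(1, 11)]

# Plain character-by-character (ASCII) order: "10" lands right after "1".
ascii_order = sorted(names)

# Treating each name as a number restores the order the user meant.
numeric_order = sorted(names, key=int)

print(ascii_order)
print(numeric_order)
```

None of these names has a leading zero, so the sketch says nothing either way about how typed zeros should be handled.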

Does altering the sorting require you to use digits? At least I usually
prepend a character if I want something to get sorted at the top or
bottom, or use indexes like a, b after the leading number.

  
"a" doesn't come before "1", so using "a1" to come before "1" doesn't work. "01" 
could, except we would be preventing it. I don't see why we should force 
people not to use zeros.

Good point. Though if we only address the naming issue we kinda create
our own sorting, don't we?


  

I don't know what you mean by "only address the naming issue".


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Al Le wrote:


Well, we're currently using an existing algorithm. One that *may* 
be used in other FLOSS (I don't know, and haven't investigated). To 
me, the two sides of the argument are basically do we want to use it 
as-is, such that our sorted lists look the same as lists in other 
applications, or do we want to define our own rules for 'natural' 
list sorting?


I'd opt for the latter, because it's easy to understand.
"It sorts in this strange way we've made up" is easier to understand 
than "It sorts like Nautilus, FileBrowserX and FileBrowserY which you 
may already use"?




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 18.03.2009 23:02, Paul Louden wrote:

Al Le wrote:


Well, we're currently using an existing algorithm. One that *may* 
be used in other FLOSS (I don't know, and haven't investigated). To 
me, the two sides of the argument are basically do we want to use it 
as-is, such that our sorted lists look the same as lists in other 
applications, or do we want to define our own rules for 'natural' 
list sorting?


I'd opt for the latter, because it's easy to understand.
"It sorts in this strange way we've made up" is easier to understand 
than "It sorts like Nautilus, FileBrowserX and FileBrowserY which you 
may already use"?


Yes, it's easier because it's a simple rule. If the browsers use a 
complicated logic (which may change with a release), how would you 
describe this for Rockbox? "It does it like BrowserX"? The next question 
would then be "and how does BrowserX do it?" I'd rather have an 
absolute than a relative definition.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Al Le wrote:


Yes, it's easier because it's a simple rule. If the browsers use a 
complicated logic (which may change with a release), how would you 
describe this for Rockbox? "It does it like BrowserX"? The next 
question would then be "and how does BrowserX do it?" I'd rather have 
an absolute than a relative definition.

We already have an absolute: ASCII sort.

Our natural sorting is entirely a mishmash of rules as to how numbers 
should be treated, and other characters.


For example, is 1.001 "one point zero zero one", or "disk one track one", 
or "one thousand and one"?


If we make up our own non-standard way, yes, we can describe it. We can 
write a few paragraphs in the manual detailing how people can expect their 
files to be sorted, since no other program does it like we do. Or we can 
use a standard way, describe it in the manual anyway, and have most 
people *not* need to look it up in the manual because the list is the 
same as they usually see.


I don't see why "we can describe it" is a reason to use our own method - 
we can describe methods we get the code for from elsewhere too.




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Am 18.03.2009 22:22, schrieb Paul Louden:

Dominik Riebeling wrote:

A situation
where a folder can contain files starting with 02 and 4 at the same
time is something that could happen and still not be intentional
(just think of copying files from various albums to a mix folder).
I agree that it could be unintentional, but disagree that numeric 
sorting matters in this case - if you have a mix of random songs, why 
does 03 need to be between 2 and 4? Meanwhile, if someone 
intentionally prefixes something with a 0, they intend for it to be 
first, so it should be. This sounds like a case of "let's make the 
unimportant case work one way", while choosing to break the case where 
people make intentional changes.


I agree with Llorean, this case is purely personal preferences I think.
03 and 2 in the same folder, could be accidental (from mixed albums), or
intentional. We sort it, but I'm not sure if there's really *one*
correct way for this case.

People can rely on omitting leading zeros now since we can sort it
correctly numerically. That makes me think that any leading zero may
very well be intended.

Either treat digits as number or don't treat them as numbers at all.
- Spaces shouldn't get collapsed. A space is a space, and interpret
numbers doesn't tell anything about spaces. At least at some point
during the lifetime of this setting spaces were collapsed. Nothing
that is a number ...

I don't see what's wrong with ignoring spaces. It's obvious that spaces
aren't real part of the names when it comes to sorting (as in 1 and 2
spaces should be sorted differently).
Why would anyone want to sort by spaces anyway? This doesn't make any
sense to me.
But yes, the option doesn't tell about that. Should we change the
description, or how it's working?

This still leaves some open issues I'm not sure how to deal with:
- how are floating-point numbers to be treated? 1.001 is smaller than
1.01 when treated as numbers, so on the one hand I'd expect them to
sort that way. On the other hand, recognizing the dot as decimal
separator is broken as well -- not all languages use it as decimal
separator (like German using the comma). Stopping the number-treating
at dots is also kinda broken -- how should a naming be handled as
discnumber.tracknumber, i.e. like 1.2, 1.10 -- which one has to
be sorted first? The best solution here might be to treat all numbers
as single numbers, regardless if they might be floating point numbers
-- I guess it's more common to have a 1.3 numbering to mark
discnumber.track instead of a floating point number 1.003.

I expect people will number disks 1.01 - 1.12 rather than 1.1, 1.2, 
1.10, 1.12. This is only my personal assumption, but if that's the 
case, our current method works for it.


Decimal numbers and discnumber.tracknumber work with the current svn.
1.1 is sorted after 1.01, and 1.10 is sorted after 1.1 (or 1.2).
And it doesn't treat the dot specially as a separator, but any non-digit,
so it will work for commas too.
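
The ordering described above can be sketched in Python. This is only an illustration, not the Rockbox source: it assumes the rule used in Martin Pool's original strnatcmp, where a digit run that starts with a zero is compared left-aligned (like the fractional part of a number) and any other digit run is compared by magnitude. The chunk-based structure and the names `natcmp`, `_chunks` and `_cmp` are hypothetical.

```python
import re
from functools import cmp_to_key


def _chunks(name):
    # Split into alternating runs of ASCII digits and non-digits.
    return re.findall(r"[0-9]+|[^0-9]+", name)


def _cmp(a, b):
    return (a > b) - (a < b)


def natcmp(a, b):
    ca, cb = _chunks(a), _chunks(b)
    for x, y in zip(ca, cb):
        if "0" <= x[0] <= "9" and "0" <= y[0] <= "9":
            if x[0] == "0" or y[0] == "0":
                # A zero-led run is compared left-aligned, digit by digit.
                result = _cmp(x, y)
            else:
                # Otherwise the longer run is the larger number.
                result = _cmp((len(x), x), (len(y), y))
        else:
            # Any non-digit run is compared as plain characters.
            result = _cmp(x, y)
        if result:
            return result
    # All shared chunks are equal: the name with fewer chunks sorts first.
    return _cmp(len(ca), len(cb))


names = ["1.10", "1.1", "1.2", "1.01", "1.001", "1.002"]
ordered = sorted(names, key=cmp_to_key(natcmp))
print(ordered)
```

Because the dot is handled as an ordinary non-digit run, the same comparison applies unchanged to names that use a comma.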


I'm pretty sure I've missed some of my points right now :) What do
people think about this sorting thing?

Well, we're currently using an existing algorithm. One that *may* be 
used in other FLOSS (I don't know, and haven't investigated). To me, 
the two sides of the argument are basically do we want to use it 
as-is, such that our sorted lists look the same as lists in other 
applications, or do we want to define our own rules for 'natural' list 
sorting? Of course, this is dependent upon research I haven't done 
(specifically, do any other applications use this sort algorithm).


Maybe we should just see if various FLOSS file browsers have a common 
natural sort, and use it, so that our files are likely to show up in 
the host's browser the same order as they show up in ours?


If you search for logs, we had a discussion yesterday starting here:
http://www.rockbox.org/irc/log-20090317#17:53:35 and today starting
here: http://www.rockbox.org/irc/log-20090318#19:25:26
Both are Flyspray-bugreport induced, and I can't remember another
discussion other than those before the initial commit.

I think the only remaining problem is FS#10031, which would be a
relatively easy fix (it sorts filenames starting with chars between 'Z'
and 'a' differently than normal strcmp, regardless of numbers in the
name, because it uses toupper instead of tolower for case-insensitive
sorting), but it would again require leaving the path of using the
original algorithm unchanged.
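
The effect of the folding direction can be shown with a small Python sketch (illustrative only; `str.upper` and `str.lower` stand in for C's toupper and tolower, and the file names are made up). The characters between 'Z' and 'a' in ASCII are `[`, `\`, `]`, `^`, `_` and the backtick, so folding letters to upper case moves every letter in front of them, while folding to lower case keeps them ahead of the letters, as a plain comparison of lower-case names would.

```python
names = ["abc", "_intro", "Zed"]

# Plain byte order, as strcmp would give: 'Z' (90) < '_' (95) < 'a' (97).
plain = sorted(names)

# Case-insensitive via upper-casing: all letters now sort before '_'.
folded_upper = sorted(names, key=str.upper)

# Case-insensitive via lower-casing: '_' still sorts before the letters.
folded_lower = sorted(names, key=str.lower)

print(plain, folded_upper, folded_lower)
```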


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:01 PM, Paul Louden paulthen...@gmail.com wrote:
 You really believe a person, while naming files, would number them 3, 05,
 7? Why would they add the 0 onto the 5 one if they're already typing
 single-digit numbers?

Just consider this scenario: someone is creating a mix folder by
copying various files to it. He wants a given order. Now some files
are named with 0 prefixes, others not. He decides to not use 0
prefixes, possibly because of laziness, but leaves them on the files
he doesn't need to change at all -- if 05 is to become track 5 he
doesn't need to change anything. As for track 09 which should become
3 he'd use 3. Too far-fetched?

 We were fixing, previously, the case where people had chosen names of 1, 2,
 3, 4, 5, 6, 7, 8, 9, 10 and not known they *must* use leading zeros to
 prevent bad sorting. But I don't think it's a fair assumption to go as far
 as saying when they add a character, they didn't mean to type it.

well, treating a number as such includes stripping leading zeros from
it, at least from my understanding. It won't do any harm on properly
named files, and I don't see a reason why a user would want to prefix
with 0 just to change sorting. We're restricting digit-postfixes to
numbers this way (which most people will consider less problematic but
we still restrict the users: 01, 02, 020, 03 won't work
anymore while it does when sorting ASCII. The user will still think of
2-digit numbers).
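
The restriction mentioned in the parenthesis can be made concrete with a short Python sketch (illustrative only, not Rockbox code; `key=int` is an assumed stand-in for a sort that strips leading zeros):

```python
names = ["01", "02", "020", "03"]

# Character order keeps "020" between "02" and "03".
ascii_order = sorted(names)

# Numeric order reads "020" as twenty and moves it to the end.
numeric_order = sorted(names, key=int)

print(ascii_order)
print(numeric_order)
```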

In any case, it would be interesting to see how Windows does the
sorting, as most users will be used to that way of doing it.

 "a" doesn't come before "1", so using "a1" to come before "1" doesn't work. "01" could,
 except we would be preventing it. I don't see why we should force people not
 to use zeros.

Sorry, got my thoughts mixed up. I use letter postfixes to get it
sorted after that specific number, i.e. 01, 03a, 03b, 04, but prefixes
to get it sorted at the top or bottom of the list (a1 would come after
numbers, _1 usually before it -- unless the sorting treats _ as space
and ignores spaces).


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le



On 18.03.2009 23:12, Paul Louden wrote:

I'd rather have an absolute than a relative definition.

We already have an absolute: ASCII sort.


Yes, but it doesn't treat the case 1, 2, ..., 10 in the way many users 
would expect (or like). Hence the natsort.


Our natural sorting is entirely a mishmash of rules as to how numbers 
should be treated, and other characters.


It wouldn't be such a mishmash if we implemented just that simple rule 
(which has been given a very good name): interpret numbers as numbers. Any 
non-digit character (including a dot and a comma) is just a character, i.e. 
we only consider integer numbers.



For example, is 1.001 one point zero zero one,


Not in natsort



or disk one track one,


Not this in natsort. In natsort it would be the number 1, a dot, the 
number 1. The interpretation of the numbers is beyond the scope.
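
One way to read this rule is as a sort key in which every run of digits becomes an integer and everything else stays a run of characters. The following Python sketch is only an illustration under that reading; the name `integer_key` and the tuple layout are hypothetical, and placing numbers before text at the same position is an arbitrary choice of the sketch.

```python
import re


def integer_key(name):
    """Digit runs become integers; every other character stays a character."""
    key = []
    for chunk in re.findall(r"[0-9]+|[^0-9]+", name):
        if "0" <= chunk[0] <= "9":
            key.append((0, int(chunk), ""))   # a number
        else:
            key.append((1, 0, chunk))         # plain characters
    return key


print(integer_key("1.001"))
ordered = sorted(["10", "9", "007", "6"], key=integer_key)
print(ordered)
```

Under this reading 1.001 and 1.1 produce the same key (the number 1, a dot, the number 1), so some further tie-break would be needed to give them a fixed order.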



or one thousand and one?


No, since that assumes a country-specific separator.

If we make up our own non-standard way, yes, we can describe it. We can 
write a few paragraphs in the manual detailing how people can expect their 
files to be sorted, since no other program does it like we do.


Actually we wouldn't need a very long description if the rule were 
simple enough.


I don't see why "we can describe it" is a reason to use our own method - 
we can describe methods we get the code for from elsewhere too.


But, as you pointed out above, such a complicated logic requires a long 
description with many examples illustrating all the quirks. Which makes 
such a description pointless since nobody would grasp it.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Dominik Riebeling schrieb:

Maybe I've missed such a consensus -- in this case someone please
point me to the right direction and ignore this mail :)


After this discussion and the ones in IRC, it seems to me that the 
majority is in favor of ignoring leading zeros. This would also match 
with Nautilus' and Windows Explorer's sorting.


And we can do that. Given that the usual browsers do it that way, it's 
also what the user expects, so it can't be bad. FS#10031 needs changing 
the algorithm anyway.


So, should we do that? It at least seems to be the opinion of most people.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Al Le schrieb:



For example, is 1.001 one point zero zero one,


Not this in natsort. In natsort it would be the number 1, a dot, 
the number 1. The interpretation of the numbers is beyond the scope.
Natsort will sort this before 1.002 and before 1.01. It does this, 
because it does not ignore leading zeros.



or one thousand and one?


No, since that assumes a country-specific separator.


Nobody assumes a separator. This is just a dot, a non-digit character. 
Any non-digit character will act as a separator.





Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:


Just consider this scenario: someone is creating a mix folder by
copying various files to it. He wants a given order. Now some files
are named with 0 prefixes, others not. He decides to not use 0
prefixes, possibly because of laziness, but leaves them on the files
he doesn't need to change at all -- if 05 is to become track 5 he
doesn't need to change anything. As for track 09 which should become
3 he'd use 3. Too far-fetched?

  
So he numbers the mix manually for most of the songs, but in the case of 
some songs that are already the right track number, he doesn't renumber 
them? This seems like a rather special case to justify throwing out user 
data.


well, treating a number as such includes stripping leading zeros from
it, at least from my understanding. 

Where in my example did leading zeros show up that require stripping, then?


It won't do any harm on properly
named files, and I don't see a reason why a user would want to prefix
with 0 just to change sorting.
Because I expect to see the folder "007 - James Bond" before the folder 
"5th Element" even if I have natural sorting on for my tracks, among 
other things. It's not "7 - James Bond", it's "Double Oh 7". They're 
significant numbers, intentional and not accidental.




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Jonathan Gordon
2009/3/18 Dominik Riebeling dominik.riebel...@gmail.com:
 Just consider this scenario: someone is creating a mix folder by
 copying various files to it. He wants a given order. Now some files
 are named with 0 prefixes, others not. He decides to not use 0
 prefixes, possibly because of laziness, but leaves them on the files
 he doesn't need to change at all -- if 05 is to become track 5 he
 doesn't need to change anything. As for track 09 which should become
 3 he'd use 3. Too far-fetched?


It doesn't make sense to me to have a sorted mix folder... so I agree
with Paul here, I would think that 9/10 times you play a mix folder
you have it on random.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 18.03.2009 23:20, Thomas Martitz wrote:

Right. We cannot surely predict whether it is intentional or accidental.


Here I'm with you


Which is why we decided to go with the original implementation and say
the author of the code made it like that.


But here not anymore. I think the verbal description should be first, 
and then the implementation of it. You say we take an implementation 
(as the author did it) and try to describe it. I say we define a 
simple rule (but which sorts the names as users expect it) and implement 
it. If the original algorithm would have to be modified then we modify it.


To that simple rule (treating a sequence of numbers as a number) I'd 
probably add the rule that many subsequent spaces are folded to one. 
E.g. A space space B would be equal to A space B modulo natsort. 
strcmp would be used to resolve the case.
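
The space-folding idea with a strcmp tie-break could look like the following Python sketch. It is only an illustration of the proposal, not existing Rockbox behaviour; the name `fold_spaces_key` is hypothetical, and the raw name as second key element plays the role of the strcmp fallback.

```python
import re


def fold_spaces_key(name):
    # Collapse every run of two or more spaces into a single space.
    folded = re.sub(r" {2,}", " ", name)
    # The unmodified name breaks ties between names that fold to the same text.
    return (folded, name)


names = ["A  B", "A B", "A A"]
ordered = sorted(names, key=fold_spaces_key)
print(ordered)
```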



And, we can only special case very leading zeros.


They wouldn't have to be treated specially, since the rule is general 
enough to handle them.




Going back to what was before yesterday


Again: the idea is primary (and stable), the implementation is secondary



Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:18 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
 People can rely on omitting leading zeros now since we can sort it
 correctly numerically. That makes me think that any leading zero may
 very well be intended.

People can rely on the way it was implemented so because of that you
consider leading zeros intentional? As in it was implemented that way
so you can consider leading zeros intentional? What kind of reasoning
is *that*?

 I don't see what's wrong with ignoring spaces. It's obvious that spaces
 aren't real part of the names when it comes to sorting (as in 1 and 2
 spaces should be sorted differently).

The setting doesn't tell anything about spaces. It talks about numbers.
Thus it has to deal with numbers, not spaces. Everything else is
misleading, wrong and confusing.

 Why would anyone want to sort by spaces anyway? This doesn't make any
 sense to me.

It doesn't need to make sense to you but I'm sure you'll find someone
out there that prefers this. That's definitely no good reason for
hiding a space-eating feature in number-aware sort.

 Decimal numbers and discnumber.tracknumber works with the current svn.

This discussion isn't about the way it works with current svn. It's
about how this feature is *supposed* to work and how people *expect*
it to work.

 If you search for logs, we had a discussion yesterday starting here:
 http://www.rockbox.org/irc/log-20090317#17:53:35 and today starting
 here: http://www.rockbox.org/irc/log-20090318#19:25:26

If you'd read my initial mail you'd have noticed that I linked the first log
myself. Still I don't see a consensus on how this *exactly* should work.

 Both are Flyspray-bugreport induced, and I can't remember another
 discussion other than those before the initial commit.

Well, someone committing such a feature could have thought about the
possibility of others having a different view and expectation of such a
feature. Those FS entries must have had a reason, mustn't they?


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Al Le wrote:
I don't see why "we can describe it" is a reason to use our own 
method - we can describe methods we get the code for from elsewhere too.


But, as you pointed out above, such a complicated logic requires a 
long description with many examples illustrating all the quirks. Which 
makes such a description pointless since nobody would grasp it.


I disagree. With sorting you can give examples. Saying "nobody would 
grasp it" is overly broad when you don't even know how long the set of 
rules is yet.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Al Le schrieb:

On 18.03.2009 23:20, Thomas Martitz wrote:

Right. We cannot surely predict whether it is intentional or accidental.


Here I'm with you


Which is why we decided to go with the original implementation and say
the author of the code made it like that.


But here not anymore. I think the verbal description should be first, 
and then the implementation of it. You say we take an implementation 
(as the author did it) and try to describe it. I say we define a 
simple rule (but which sorts the names as users expect it) and 
implement it. If the original algorithm would have to be modified then 
we modify it.

Yes, I can live with that.


To that simple rule (treating a sequence of numbers as a number) I'd 
probably add the rule that many subsequent spaces are folded to one. 
E.g. A space space B would be equal to A space B modulo natsort. 
strcmp would be used to resolve the case.



And, we can only special case very leading zeros.


They wouldn't have to be treated specially, since the rule is general 
enough to handle them.


Only if you want to break decimal numbers or discnumber.tracknumber (or 
any other numbers which have a constant prefix in the strings to be 
compared).
This is what we had, and it turned out to be flawed. We cannot ignore 
leading zeros of numbers within the string, only at the very beginning.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:20 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
 There's no special treatment of dots at all. How do you come to that
 idea? Zeros are not specially treated either. Currently only spaces are a
 special case, as they are ignored.

Sorry? Are you reading the thread? This is a discussion about how it
*should* work, and thus also about how it *should not* work. Not about
how the current implementation works.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Al Le wrote:


But here not anymore. I think the verbal description should be first, 
and then the implementation of it. You say we take an implementation 
(as the author did it) and try to describe it. I say we define a 
simple rule (but which sorts the names as users expect it) and 
implement it. If the original algorithm would have to be modified then 
we modify it.


To that simple rule (treating a sequence of numbers as a number) I'd 
probably add the rule that many subsequent spaces are folded to one. 
E.g. A space space B would be equal to A space B modulo natsort. 
strcmp would be used to resolve the case.
This seems another arbitrary "I think it should be done this way" 
addition. I thought you wanted a simple rule?


How about "Don't require leading zeros", described as "Numbers after 
leading zeros will be interpreted as whole numbers, rather than a series 
of digits"? A simple rule, and one that lets people know that zeros in 
the middle of strings won't randomly be ignored (which they will be in 
currently proposed systems).


It's a simple rule, can be described in one sentence, includes an option 
name that's descriptive, and doesn't ignore user provided parts of the 
filenames.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Dominik Riebeling schrieb:

On Wed, Mar 18, 2009 at 11:20 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
  

There's no special treatment of dots at all. How do you come to that
idea? Zeros are not specially treated either. Currently only spaces are a
special case, as they are ignored.



Sorry? Are you reading the thread? This is a discussion about how it
*should* work, and thus also about how it *should not* work. Not about
how the current implementation works.


 - Dominik
  
He said what it should not do, and I told him that it already doesn't do 
it as of now (except for the spaces).


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 18.03.2009 23:43, Thomas Martitz wrote:

And, we can only special case very leading zeros.


They wouldn't have to be treated specially, since the rule is general 
enough to handle them.


Only if you want to break decimal numbers or discnumber.tracknumber (or 
any other numbers which have a constant prefix in the strings to be 
compared).



I can't understand what you mean. What break?


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Dominik Riebeling schrieb:

On Wed, Mar 18, 2009 at 11:18 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
  

People can rely on omitting leading zeros now since we can sort it
correctly numerically. That makes me think that any leading zero may
very well be intended.



People can rely on the way it was implemented so because of that you
consider leading zeros intentional? As in it was implemented that way
so you can consider leading zeros intentional? What kind of reasoning
is *that*?
  


I'm telling that they don't need leading zeros for proper numerical 
sorting anymore. I don't see bad reasoning in that.
  

I don't see what's wrong with ignoring spaces. It's obvious that spaces
aren't real part of the names when it comes to sorting (as in 1 and 2
spaces should be sorted differently).



The setting doesn't tell anything about spaces. It talks about numbers.
Thus it has to deal with numbers, not spaces. Everything else is
misleading, wrong and confusing.

  

Why would anyone want to sort by spaces anyway? This doesn't make any
sense to me.



It doesn't need to make sense to you but I'm sure you'll find someone
out there that prefers this. That's definitely no good reason for
hiding a space-eating feature in number-aware sort.
  


Hence I asked you whether we should change the description or the way it 
sorts. Just answer that instead of getting angry at me.
  

Decimal numbers and discnumber.tracknumber works with the current svn.



This discussion isn't about the way it works with current svn. It's
about how this feature is *supposed* to work and how people *expect*
it to work.

  


And I'm not allowed to compare with how it currently works? And I'm not 
allowed to say "Hey, this feature you want, it does this already"? 
Really?




Well, someone committing such a feature could have thought about the
possibility of others having a different view and expectation of such a
feature. Those FS entries must have had a reason, mustn't they?


 - Dominik
  


Don't be ignorant please. We've had an *awful* lot of discussion before. 
Do you really think I forgot about those? There are always pros and 
cons. That's no reasoning to let something rot or something.


And please calm down and stay friendly. No reason for getting at me.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Al Le schrieb:

On 18.03.2009 23:43, Thomas Martitz wrote:

And, we can only special case very leading zeros.


They wouldn't have to be treated specially, since the rule is 
general enough to handle them.


Only if you want to break decimal numbers or discnumber.tracknumber 
(or any other numbers which have a constant prefix in the strings to 
be compared).



I can't understand what you mean. What break?


Your general rule was in SVN a few days ago. Look at FS#10029 what it 
caused.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:38 PM, Jonathan Gordon jdgo...@gmail.com wrote:
 It doesn't make sense to me to have a sorted mix folder... so I agree
 with Paul here, I would think that 9/10 times you play a mix folder
 you have it on random.

If someone sorts his mix folder he most likely wants it to get played
in a specific order, wouldn't he?
Think of some mix folder that has different styles of music and walks
through them -- that's a different thing than simply throwing them in
in random order. A mix folder can very well have a wanted track order.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 18.03.2009 23:44, Paul Louden wrote:

Al Le wrote:


To that simple rule (treating a sequence of numbers as a number) I'd 
probably add the rule that many subsequent spaces are folded to one. 
E.g. A space space B would be equal to A space B modulo natsort. 
strcmp would be used to resolve the case.
This seems another arbitrary "I think it should be done this way" 
addition. I thought you wanted a simple rule?


I agree here. It's a bit arbitrary but would fit the "natural". But I 
wouldn't insist on it since the setting is only about numbers.



How about "Don't require leading zeros", described as "Numbers after 
leading zeros will be interpreted as whole numbers, rather than a series 
of digits"? A simple rule, and one that lets people know that zeros in 
the middle of strings won't randomly be ignored (which they will be in 
currently proposed systems).


It's a simple rule, can be described in one sentence, includes an option 
name that's descriptive, and doesn't ignore user provided parts of the 
filenames.


Yes, it's a simple rule, and from this point of view I could very well 
live with it. But it puts 04 before 3 which wasn't the intention of 
the natsort in the beginning (if I understand correctly). For example, 
Nautilus puts 03 before 4.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:

If someone sorts his mix folder he most likely wants it to get played
in a specific order, wouldn't he?
Think of some mix folder that has different styles of music and walks
through them -- that's a different thing than simply throwing them in
in random order. A mix folder can very well have a wanted track order.
  
But you're suggesting it has a wanted track order where, for some reason 
or other, they haven't actually named the songs themselves?


I mean, step one is copy them into the folder. Step two is name them 
so they're in the right order. Right? (And this is assuming they create 
the mix folder on their PC, rather than just inserting the songs into a 
playlist). I don't think it's particularly likely that, while renaming 
songs, they'll just choose to skip ones that are named differently and 
hope they'll be in the right place.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 18.03.2009 23:53, Thomas Martitz wrote:

I can't understand what you mean. What break?


Your general rule was in SVN a few days ago. Look at FS#10029 what it 
caused.


No, if the rule were implemented correctly, the files would be sorted 
correctly as well. It must have been a flaw in the implementation, not 
in the idea.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Al Le wrote:


Yes, it's a simple rule, and from this point of view I could very well 
live with it. But it puts 04 before 3 which wasn't the intention 
of the natsort in the beginning (if I understand correctly). For 
example, Nautilus puts 03 before 4.


In my understanding, the intention of natsort was to fix 1, 10, 2, 3, 
4, 5, 6. It can still do this while respecting intentional leading 
zeros, and my described simple rule still fixes that problem just fine.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:52 PM, Thomas Martitz
thomas.mart...@fhtw-berlin.de wrote:
 I'm telling that they don't need leading zeros for proper numerical sorting
 anymore. I don't see bad reasoning in that.

So, because leading zeros aren't required anymore leading zeros
immediately become intentional? This is broken reasoning to me. Just
because leading zeros aren't needed anymore there's no change if
leading zeros are intentional or not or our knowledge if they are or
not.

 Hence I asked you whether we should change the description or the way it
 sorts. Just answer that instead of getting angry at me.

Errr? I was asking about how this feature is *intended* to work and
how people *expect* it to work. Not how it's currently done. The
current implementation might differ from the way it's intended to
work.

 And I'm not allowed to compare with how it currently works? And I'm not
 allowed to say "Hey, this feature you want, it does this already"?

It doesn't help the case a tiny bit to present the current state if we
are talking about the intention. It doesn't help the case if you only
start to get defensive just because someone disagrees with the way
your feature works.

 Don't be ignorant please. We've had an *awful* lot of discussion before. Do
 you really think I forgot about those? There are always pros and cons.

Well, then it seems there wasn't enough discussion. Just let me point
to http://www.rockbox.org/irc/log-20090318#20:50:13 in the light of
this thread.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Al Le

On 19.03.2009 00:00, Thomas Martitz wrote:
And doing it correctly means special casing leading zeros at the very 
beginning.


No, doing it correctly (in my view) means interpreting any sequence of 
digits as a number. That would automatically handle the leading zeros.
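
To make that rule concrete, here is a rough Python sketch (not the actual Rockbox strnatcmp code; the name natural_key is made up for this illustration). Every run of digits is converted to an integer, so leading zeros drop out on their own:

```python
import re

def natural_key(name):
    """Split a name into text and digit runs; digit runs compare as integers."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, re.split puts the digit runs at odd indices.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

names = ["10.mp3", "1.mp3", "04.mp3", "3.mp3", "2.mp3"]
print(sorted(names))                   # plain string (ASCII) order
print(sorted(names, key=natural_key))  # digit runs compared by value
```

With this key, 04 lands between 3 and 10, and 007 compares equal to 7, which is exactly the behaviour being argued about in this thread.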


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:38 PM, Al Le al...@gmx.de wrote:
 But here not anymore. I think the verbal description should be first, and
 then the implementation of it. You say we take an implementation (as the

I completely agree here. If done the other way round we get endless
discussions and changes of the behaviour again and again.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Al Le wrote:


In my understanding, the intention of natsort is to change the rules 
of how strings (file names) are sorted. As a side effect, it also 
fixes the problem with 1, 10, 2. But it's not only about that. It's 
more general.


What it does is more general. What it was intended to _fix_ was that 
specific type of case. That's what prompted initial discussion of it, 
and that case was the focus on which every proposal (that I remember) 
for improving or changing the sorting was based.


Basically, the logic went: users expect a computer to know that the 
series of numbers 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 goes in that order, and 
seeing them in ASCII order is unexpected.


My personal position is also that if a user adds a 0 before a number, 
they expect it to change something, rather than being ignored. I think, 
on average, more 0s (in lists meant to be sorted) will be intentional 
than accidental. If you want the list sorted you either name the 
files, or use a set of files named already to be sorted. I think it's 
exceptionally rare that you'll have a list of files that a user has 
created and intended to be sorted that have 3, 04, 5 in them and mean it 
in that order. Meanwhile, it's exceedingly _rare_ in my opinion that 
people would intend 1, 10, 2, 3, 4 as their sorting order. And in that 
case we're not throwing out any data they added, just trying to read 
what is written, rather than treat it as a string of unique characters. 
I think "004" being treated as "00", then "4" is the same as "4a" being 
treated as "4", then "a" rather than the string "4a". Otherwise we may as 
well say numbers need a space after them to denote they aren't part of 
strings or something. For example, l337-speak named files currently may 
be sorted extremely awkwardly. B007Y for example. We should probably 
assume zeros are intentional there (in my opinion).


I think it's just more consistent if we don't throw out any characters.
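
The exact comparison rule is not spelled out in this message, so the following Python sketch is only one possible reading of "don't throw out any characters". The name zero_aware_key and the choice to sort more leading zeros earlier are assumptions made for illustration. Leading zeros are kept as their own piece of the key, while the remaining digits are still compared by value:

```python
import re

def zero_aware_key(name):
    """Hypothetical key: leading zeros are kept as their own piece, not dropped."""
    key = []
    for i, part in enumerate(re.split(r"(\d+)", name)):
        if i % 2:  # odd indices hold the digit runs
            digits = part.lstrip("0") or "0"   # keep one digit for an all-zero run
            zeros = len(part) - len(digits)    # number of leading zeros
            # Assumption: more leading zeros sort earlier, then the value decides.
            key.append((-zeros, int(digits)))
        else:
            key.append(part.lower())
    return key

names = ["5th Element", "007 - James Bond", "10", "1", "2", "04", "3"]
print(sorted(names, key=zero_aware_key))
```

Under these assumptions 1, 10, 2 still comes out as 1, 2, 10, while 007 and 04 sort ahead of the unpadded numbers instead of being read as 7 and 4.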


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:57 PM, Paul Louden paulthen...@gmail.com wrote:
 playlist). I don't think it's particularly likely that, while renaming
 songs, they'll just choose to skip ones that are named differently and hope
 they'll be in the right place.

If Windows Explorer / Nautilus / whatever browser the user is using
sorts it correctly as in 3, 05, 7? Then he hasn't skipped it
but our way of sorting is wrong.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:

If Windows Explorer / Nautilus / whatever browser the user is using
sorts it correctly as in 3, 05, 7? Then he hasn't skipped it
but our way of sorting is wrong.
  
So in this one single case of the person creating an organized mix 
folder, renaming them on his PC, choosing to skip over songs that appear 
in the right place, etc, we have a case where this sort is favourable.


Meanwhile, we're willing to break intentional numbering because of this 
rare case? Do you really think it's going to occur often enough to plan for?


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Wed, Mar 18, 2009 at 11:37 PM, Paul Louden paulthen...@gmail.com wrote:
 So he numbers the mix manually for most of the songs, but in the case of
 some songs that are already the right track number, he doesn't renumber
 them? This seems like a rather special case to justify throwing out user
 data.

That might be a special case, but

 Because I expect to see the folder 007 - James Bond before the folder 5th
 Element even if I have natural sorting on for my tracks, among other
 things. It's not 7 - James Bond it's Double Oh 7. They're significant
 numbers, intentional and not accidental.

James Bond is also a special case. As on the GoldenQuotes wiki page:

LinusN Llorean: he is not named 007 to be sorted in a specific order
LinusN it's because he has a license to kill

Pretty much says all :)


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:


James Bond is also a special case. As on the GoldenQuotes wiki page:

LinusN Llorean: he is not named 007 to be sorted in a specific order
LinusN it's because he has a license to kill

Pretty much says all :)

  
So, if they're both special cases, which is more common? People who 
might have 007 movies or l337-speak filenames or might prefix a 0 to a 
number to get it first (which is a concept even many of my non-technical 
friends have been trained to expect to work just because of ASCII sort 
in enough things, so they'll throw on a zero and IF that doesn't work 
try something else, but the file rename in Rockbox is something we 
shouldn't expect people to be willing to use multiple times per file), 
or people who create mix folders on their PC that just happen to be in 
right-enough order that they don't have to rename their occasional mixed 
numbered filename?


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Dominik Riebeling
On Thu, Mar 19, 2009 at 12:18 AM, Paul Louden paulthen...@gmail.com wrote:
 Meanwhile, we're willing to break intentional numbering because of this rare
 case? Do you really think it's going to occur often enough to plan for?

My personal opinion is that users will do this, and they will do it
often enough to justify it -- people who want a specific sorting will
use ASCII sorting anyway. Well, at least the majority of them.

Besides, from the user's point of view I'd prefer to be in line with
the major OSes. Not doing so will cause confusion among users. I don't
think Windows treats leading zeros as intentional, and that is still
the major OS.


 - Dominik


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Jonathan Gordon
2009/3/18 Dominik Riebeling dominik.riebel...@gmail.com:
 On Wed, Mar 18, 2009 at 11:38 PM, Jonathan Gordon jdgo...@gmail.com wrote:
 it doesn't make sense to me to have a sorted mix folder... so I agree
 with Paul here, I would think that 9/10 times you play a mix folder
 you have it on random

 If someone sorts his mix folder he most likely wants it to get played
 in a specific order, wouldn't he?
 Think of some mix folder that has different styles of music and walks
 through them -- that's a different thing than simply throwing them in
 in random order. A mix folder can very well have a wanted track order.


  - Dominik


why on earth would anyone put different styles into a single mix
folder? that really makes no sense.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Dominik Riebeling wrote:

Besides, from the user's point of view I'd prefer to be in line with
the major OSes. Not doing so will cause confusion among users. I don't
think Windows treats leading zeros as intentional, and that is still
the major OS.
  
It sounds like we're going to be ignoring plenty of things windows does 
anyway. This change should, at least, be unnoticed by most users except 
those who actually want the behaviour. Except in your rather awkward 
theoretical case. I'm still not entirely sure why people would 
physically reorganize their files rather than just creating a playlist.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Dominik Riebeling schrieb:

And I'm not allowed to compare with how it currently works? And I'm not
allowed to say "Hey, this feature you want, it does this already"?



It doesn't help the case a tiny bit to present the current state if we
are talking about the intention. It doesn't help the case if you only
start to get defensive just because someone disagrees with the way
your feature works.
  


That's the point. I didn't disagree, and neither did you, because it 
already works like that (in the case of the decimal numbers).


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Dominik Riebeling schrieb:


Besides, from the user's point of view I'd prefer to be in line with
the major OSes. Not doing so will cause confusion among users. I don't
think Windows treats leading zeros as intentional, and that is still
the major OS.


 - Dominik
  



I agree. We shouldn't force a sorting which is contradictory to the 
sorting of all major file browsers.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Thomas Martitz wrote:



I agree. We shouldn't force a sorting which is contradictory to the 
sorting of all major file browsers.
And yet every proposal so far is to pick and choose which aspects of 
major file browser sorting we like, and throw out the rest. If we're 
going to use them to justify changes, we should strive to actually mimic 
them. Having part of their functionality, and not all of it, will lead 
to expected behaviour that's missing.


Meanwhile, if we just make ours simple and explicit, people won't expect 
other aspects of it that are missing.




Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Thomas Martitz

Am 19.03.2009 01:52, schrieb Paul Louden:

Thomas Martitz wrote:



I agree. We shouldn't force a sorting which is contradictory to the 
sorting of all major file browsers.
And yet every proposal so far is to pick and choose which aspects of 
major file browser sorting we like, and throw out the rest. If we're 
going to use them to justify changes, we should strive to actually 
mimic them. Having part of their functionality, and not all of it, 
will lead to expected behaviour that's missing.

Which we would be doing by ignoring leading zeros. It's not about
mimicking, but rather about being consistent with what the vast majority of
people know from their PC browser.



Meanwhile, if we just make ours simple and explicit, people won't 
expect other aspects of it that are missing.




That's totally flawed reasoning in this case. Use ascii-sort if you want
simple and explicit. And this is basically "Having part of their
functionality, and not all of it, will lead to expected behaviour that's
missing" at its best.

Oh, and they surely will expect what they know from Windows 
Explorer or Nautilus, regardless of which algorithm we use.


Re: how is strnatcmp aka Interpret numbers while sorting supposed to sort?

2009-03-18 Thread Paul Louden

Thomas Martitz wrote:


That's totally flawed reasoning in this case. Use ascii-sort if you want
simple and explicit. And this is basically "Having part of their
functionality, and not all of it, will lead to expected behaviour that's
missing" at its best.
No, this isn't. This is having intuitive handling of numbers as 
normally written by people. People don't normally precede numbers with 
a 0 unless there's a specific reason to.


Oh, and they surely will expect what they know from Windows 
Explorer or Nautilus, regardless of which algorithm we use.
So we can either make it clear it's different and our own way or we 
can try to make it similar with the differences more subtle and thus 
more likely to be surprising (in a bad way). Which one's more fair to 
users - one where they know it's different, or one where they don't?