Alex Shinn wrote:
> 2010/6/8 Pádraig Brady <p...@draigbrady.com>:
>> On 07/06/10 06:19, Alex Shinn wrote:
>>>
>>> Ideally join should be able to handle files sorted in any order
>>> that sort provides, but as a bare minimum it should at least
>>> be able to join files sorted on numeric fields.
>>
>> Well if there were no aliases in the numbers, you could always
>> sort the output numerically after the join if it was important.
>
> By first sorting lexicographically, you mean?
> In the use case I had, the data was already sorted
> numerically. So whenever I want to join two files,
> currently I have to do:
>
> sort file1 > file1.tmp
> sort file2 > file2.tmp
> join file1.tmp file2.tmp | sort -n > out
> rm -f file1.tmp file2.tmp
>
> instead of just
>
> join -n file1 file2 > out
>
> In the small tools philosophy you want to avoid adding
> redundancy, but in this case join isn't doing the same
> thing as sort, it's just working with it better. Not to mention
> the fact that sort is an expensive operation to have to
> perform multiple times, not just an extra O(n) filter
> to throw in the middle of a pipeline.
>
>> However if you wanted to join "01" and "1" then your patch is required.
>> Are numeric aliases common enough to warrant this? I think so.
>
> Leading zeros may not be so common, but don't forget
> "1.0" and "1" or "1e2" and "100" and "100.0", etc.
>
>> I'd use -g, --general-numeric to correspond with `sort`.
>
> Yes, that's probably better.
There may be a fly in the ointment. When comparing floating-point numbers, how would join measure equality? Should it consider 1.000000000000001e2 to be equal to 100.0? What if the maximum available precision does not allow us to distinguish those two values? And what about -0 and 0? (With IEEE 754, they compare equal.)