Re: you are not going to be able to sort this by the fifth field.

2010-03-05 Thread jidanni
EB> Except that you can specify overlapping keys.  I find the idea of multiple
EB> separate lines of underscores, one per key, much easier to follow in
OK, any --debug=... is better than nothing.




Re: you are not going to be able to sort this by the fifth field.

2010-03-04 Thread Eric Blake
According to jida...@jidanni.org on 3/4/2010 7:11 PM:
> Thanks. I see I neglected the -b.
> On the info page in the `--field-separator=SEPARATOR' discussion, do
> mention the effects of -b on ' foo' etc.
> PB> $ LC_CTYPE=C sort --debug -sb -k5,5 < taichung_county_atm.htm
> (Use .txt, not .htms in examples.)
> Anyway, your --debug stuff would be clearer with just pipes added
> inline:
> $ echo 'a   b c'|sort --debug=show_fields
> a|   b| c
> or something like that.

Except that you can specify overlapping keys.  I find the idea of multiple
separate lines of underscores, one per key, much easier to follow in
understanding how each line is broken down into fields and keys, than I
would in trying to parse inline notations that change the line's contents.
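
To make the two notations concrete, here is a rough sketch. The sed command only emulates the inline-pipe idea (the `--debug=show_fields` spelling in the quoted text is a proposal, not an existing option), and the key specs `-k1,2 -k2,2` are made up for illustration. It assumes a GNU sort that ships `--debug`, so that call is allowed to fail quietly.

```shell
line='a   b c'

# Emulate the proposed inline notation: put '|' at each boundary between a
# non-blank character and the blank that follows it (sort's default field split).
marked=$(printf '%s\n' "$line" | sed 's/\([^[:blank:]]\)\([[:blank:]]\)/\1|\2/g')
printf '%s\n' "$marked"    # a|   b| c

# Overlapping keys: -k1,2 spans fields 1-2 and -k2,2 covers field 2 again.
# Inline markers can show field boundaries only; they have no obvious way to
# show two keys that share characters. --debug annotates keys on extra lines.
printf '%s\n' "$line" | LC_ALL=C sort --debug -k1,2 -k2,2 2>&1 || true
```

The inline form changes the line's contents, which is the objection raised above; the annotation lines leave the data untouched.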

-- 
Eric Blake   ebl...@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org





Re: you are not going to be able to sort this by the fifth field.

2010-03-04 Thread jidanni
Thanks. I see I neglected the -b.
On the info page in the `--field-separator=SEPARATOR' discussion, do
mention the effects of -b on ' foo' etc.
PB> $ LC_CTYPE=C sort --debug -sb -k5,5 < taichung_county_atm.htm
(Use .txt, not .htms in examples.)
Anyway, your --debug stuff would be clearer with just pipes added
inline:
$ echo 'a   b c'|sort --debug=show_fields
a|   b| c
or something like that.
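
For anyone reading along, a minimal sketch of what -b changes, using made-up two-line input rather than the ATM table. Without -t, each field keeps the blanks in front of it, so the key for -k2,2 starts with those blanks unless -b strips them. LC_ALL=C is my assumption here, chosen so the comparison is plain byte order.

```shell
# Two hypothetical lines: "x", two spaces, "b"  and  "x", one space, "a".
data=$(printf 'x  b\nx a\n')

# Without -b the keys are "  b" and " a"; a space sorts before 'a',
# so the line with two blanks comes first.
plain=$(printf '%s\n' "$data" | LC_ALL=C sort -k2,2)

# With -b the leading blanks are skipped and the keys are "b" and "a".
blank=$(printf '%s\n' "$data" | LC_ALL=C sort -b -k2,2)

printf 'without -b:\n%s\nwith -b:\n%s\n' "$plain" "$blank"
```

The two runs order the same input differently, which is the effect the info page could spell out for ' foo'.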




Re: you are not going to be able to sort this by the fifth field.

2010-03-04 Thread Pádraig Brady

On 04/03/10 19:59, jida...@jidanni.org wrote:

> Try as you might, there is no way you are going to sort by this field,
> $ LC_CTYPE=zh_TW.UTF-8 w3m -dump \
>   http://www.tcb-bank.com.tw/tcb/servicesloc/atm_location/taichung_county_atm.htm |
>   perl -anlwe 'print $F[4] if exists $F[4]'|LC_CTYPE=C sort
> without ripping it out of the table first using perl. Go ahead, try -t ... -k ...,...
> You won't be able to order that field in the same way one can after
> ripping it out of the table.


This seems to work for me: LC_CTYPE=C sort -sb -k5,5
I confirmed it by extracting the field, as the perl above does, _after_ the sort using:
sed 's/^ *//; s/  */ /g; s/\r$//' | cut -d ' ' -s -f5 | sed '/^$/d'
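
A self-contained sketch of that check, for illustration only. The sample lines are invented stand-ins for the w3m dump (ragged blanks, one CRLF ending, one line with fewer than five fields). LC_ALL=C is my assumption to get byte-order collation, `s/  */ /g` is the portable basic-regex spelling of "one or more spaces", and the carriage return is held in a variable so the sed expression does not depend on `\r` support.

```shell
# Hypothetical sample data, not taken from the real table.
sample=$(printf '  r1 a  b c zeta\nr2  a b  c alpha\r\nshort line\n r3 a b c  mike\n')
cr=$(printf '\r')

# Sort in place on the fifth blank-separated field, ignoring leading blanks.
sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -sb -k5,5)

# Extract the fifth field after sorting: trim leading spaces, squeeze runs of
# spaces, drop a trailing CR, cut field 5, and delete the empty lines that
# short rows leave behind.
fifth=$(printf '%s\n' "$sorted" |
  sed "s/^ *//; s/  */ /g; s/${cr}\$//" | cut -d ' ' -s -f5 | sed '/^$/d')

printf '%s\n' "$fifth"
```

If the fifth column comes out in order here, the in-place sort and the extract-then-sort approach agree on this sample; it says nothing about the real table.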


> sort (GNU coreutils) 8.4
> P.S., perhaps add a --debug-fields mode which adds field boundary | pipe
> symbols into the output.


Yes I agree it's very difficult to know exactly what's going on
with the field processing in sort. I actually proposed and
mostly implemented a --debug option. Here are some examples:

$ LC_CTYPE=C sort --debug -sb -k5,5 < taichung_county_atm.htm

** no match for key **

  


you are not going to be able to sort this by the fifth field.

2010-03-04 Thread jidanni
Try as you might, there is no way you are going to sort by this field,
$ LC_CTYPE=zh_TW.UTF-8 w3m -dump \
  http://www.tcb-bank.com.tw/tcb/servicesloc/atm_location/taichung_county_atm.htm |
  perl -anlwe 'print $F[4] if exists $F[4]'|LC_CTYPE=C sort
without ripping it out of the table first using perl. Go ahead, try -t ... -k ...,...
You won't be able to order that field in the same way one can after
ripping it out of the table.
sort (GNU coreutils) 8.4
P.S., perhaps add a --debug-fields mode which adds field boundary | pipe
symbols into the output.