Re: [gentoo-user] Re: [OT] Question about duplicate lines in file

2006-06-12 Thread Neil Bothwick
On Mon, 12 Jun 2006 17:52:21 -0500, Teresa and Dale wrote:

> >sort -u -k1,1 /etc/hosts >/etc/hosts.new
> >
> >avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
> >(space delimited) and -u removes lines where the sort field is the same.

> Well that removed a few, all of them to be exact.  The file was blank. 
> O_O  LOL  I'm learning though.

What's the format of the file? If it's a standard /etc/hosts layout, this
will remove duplicates based on IP address, but if you have another
field first, you need to change the key.
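For example, if the hostname came first and the IP second, keying on the
second field would do it. A sketch with made-up sample lines rather than a
real hosts file:

```shell
# Made-up sample: hostname in field 1, IP in field 2
printf 'localhost 127.0.0.1\nlocalhost 127.0.0.1\nmyhost 192.168.0.2\n' |
    sort -u -k2,2   # dedupe on the second (IP) field
# localhost 127.0.0.1
# myhost 192.168.0.2
```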


-- 
Neil Bothwick

"Bother," said Pooh, as he connected at 300 bps.




Re: [gentoo-user] Re: [OT] Question about duplicate lines in file

2006-06-12 Thread Teresa and Dale
Neil Bothwick wrote:

>On Mon, 12 Jun 2006 20:39:20 +0200, Alan McKinnon wrote:
>
>>If /etc/hosts has these lines:
>>127.0.0.1 localhost
>>127.0.0.1  localhost
>>uniq will see these as different even though they are actually the 
>>same entry. So he needs something like tr to squash spaces. This will 
>>do it (as root):
>>
>>cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new
>>
>>
>
>sort -u -k1,1 /etc/hosts >/etc/hosts.new
>
>avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
>(space delimited) and -u removes lines where the sort field is the same.

Well that removed a few, all of them to be exact.  The file was blank. 
O_O  LOL  I'm learning though.

Dale
:-)
-- 
gentoo-user@gentoo.org mailing list



Re: [gentoo-user] Re: [OT] Question about duplicate lines in file

2006-06-12 Thread Neil Bothwick
On Mon, 12 Jun 2006 20:39:20 +0200, Alan McKinnon wrote:

> If /etc/hosts has these lines:
> 127.0.0.1 localhost
> 127.0.0.1  localhost
> uniq will see these as different even though they are actually the 
> same entry. So he needs something like tr to squash spaces. This will 
> do it (as root):
> 
> cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new

sort -u -k1,1 /etc/hosts >/etc/hosts.new

avoids the need to use cat, uniq or tr. -k1,1 sorts on the first field
(space delimited) and -u removes lines where the sort field is the same.
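A quick way to see the effect without touching /etc/hosts, using a couple
of made-up entries:

```shell
# Two entries share the IP (field 1), one with extra spaces
printf '127.0.0.1 localhost\n127.0.0.1  localhost\n192.168.0.2 myhost\n' |
    sort -u -k1,1   # only one 127.0.0.1 line survives
```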


-- 
Neil Bothwick

Please rotate your phone 90 degrees and try again.




Re: [gentoo-user] Re: [OT] Question about duplicate lines in file

2006-06-12 Thread Alan McKinnon
On Monday 12 June 2006 19:55, Christer Ekholm wrote:
> Teresa and Dale <[EMAIL PROTECTED]> writes:
> > Thanks, read the man page, it was short so it didn't take long. 
> > I tried this:
> >
> > uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
> >
> > It doesn't look like it did anything but copy the same thing
> > over. There are only 2 lines missing.  Do spaces count?  Some
> > put in a lot of spaces between the localhost and the web address.
> >  Maybe that has an effect??
>
> The problem with uniq is that, according to the manpage, it will only
>
>   "Discard all but one of successive identical lines"
>
> You need to have a sorted file for uniq to do what you want, or
> sort it with the -u option
>
>   sort -u hosts > hostsort
>
> If you don't want to ruin your original order you have to do
> something else. This is one way of doing it with perl.
>
>   perl -ne 'print unless exists $h{$_}; $h{$_} = 1' hosts >
> hostsort


Almost there :-)

If /etc/hosts has these lines:
127.0.0.1 localhost
127.0.0.1  localhost
uniq will see these as different even though they are actually the 
same entry. So he needs something like tr to squash spaces. This will 
do it (as root):

cat /etc/hosts | tr -s ' ' | sort | uniq -i > /etc/hosts.new

If the new file is OK, use it to overwrite /etc/hosts

Explanation so Dale knows what I'm asking him to do:
cat sends the file to tr
tr finds all cases of two or more consecutive spaces and replaces them 
with one space
sort does a sort
uniq finds consecutive lines that are the same and throws away the 
extra ones. The -i is there just in case two entries differ in case 
only (as FQDNs are strictly speaking case insensitive). As mentioned 
by others, uniq only matches consecutive dupes, so the list must be 
sorted first
> /etc/hosts.new writes the final output to the named disk file
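The whole pipeline can be tried on stdin first, with a made-up duplicate
pair, before pointing it at the real /etc/hosts:

```shell
# Same entry twice, differing only in spacing
printf '127.0.0.1 localhost\n127.0.0.1  localhost\n' |
    tr -s ' ' | sort | uniq -i
# 127.0.0.1 localhost
```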

Cheers,
alan

p.s. Those 15,000 entries in your hosts file are, um, a lot :-)


-- 
If only me, you and dead people understand hex, 
how many people understand hex?

Alan McKinnon
alan at linuxholdings dot co dot za
+27 82, double three seven, one nine three five
-- 
gentoo-user@gentoo.org mailing list



[gentoo-user] Re: [OT] Question about duplicate lines in file

2006-06-12 Thread Christer Ekholm
Teresa and Dale <[EMAIL PROTECTED]> writes:

>
>
> Thanks, read the man page, it was short so it didn't take long.  I tried
> this:
>
> uniq -u /home/dale/Desktop/hosts /home/dale/Desktop/hostsort
>
> It doesn't look like it did anything but copy the same thing over. 
> There are only 2 lines missing.  Do spaces count?  Some put in a lot
> of spaces between the localhost and the web address.  Maybe that has an
> effect??

The problem with uniq is that, according to the manpage, it will only

  "Discard all but one of successive identical lines"

You need to have a sorted file for uniq to do what you want, or sort
it with the -u option

  sort -u hosts > hostsort
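A tiny demonstration of why uniq alone is not enough on unsorted input:

```shell
printf 'b\na\nb\n' | uniq      # prints b, a, b - the dupes aren't adjacent
printf 'b\na\nb\n' | sort -u   # prints a, b
```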

If you don't want to ruin your original order you have to do something
else. This is one way of doing it with perl.

  perl -ne 'print unless exists $h{$_}; $h{$_} = 1' hosts > hostsort
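The same order-preserving filter is often written as an awk one-liner,
equivalent in effect to the perl version (both assume the file's unique
lines fit in memory):

```shell
# Print each line only the first time it is seen, keeping original order
printf 'b\na\nb\n' | awk '!seen[$0]++'
# b
# a
```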

--
 Christer

-- 
gentoo-user@gentoo.org mailing list