Re: sorter script [was: Frustrated newbie question]

2003-12-10 Thread drieux
On Dec 9, 2003, at 4:20 PM, R. Joseph Newton wrote:
[..]
> To me hashes are like sausage to a carnivore--I love the end product, but
> have no desire to look too closely at the process.
[..]

First, the last point:
the Schwartzian Transform I included
was from "perldoc -q sort", as a way of
noting that there are some interesting things
that can be done, if done well, in memory.
On top of the usual 'portability' set
of issues, there is also the problem that
some 'sort' utilities will 'fault' to an
I/O event, writing out intermediate-stage
results to disk so as to save on memory usage.
The PPT version of sort does not take a
-T option to let one fan out those
intermediate stages to another directory, nor does it offer
a -ykmem option... caveat emptor.
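To make the 'fault to an I/O event' point concrete, here is a toy external merge sort in Perl: it sorts fixed-size chunks in memory, spills each sorted run to a temporary file, and then merges the runs. This is only an illustrative sketch under stated assumptions (`external_sort`, `next_line` and the `$chunk_size` parameter are invented for the example, and it collects the merged output in memory); it is not how GNU, Solaris, or PPT sort are actually implemented.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read one line from a run file without its newline (undef at EOF).
sub next_line {
    my ($fh) = @_;
    my $line = readline($fh);
    chomp $line if defined $line;
    return $line;
}

# Toy external merge sort (hypothetical helper, not any real sort(1)).
# Phase 1 sorts $chunk_size lines at a time in memory and spills each
# sorted "run" to a temporary file; phase 2 merges the runs.
sub external_sort {
    my ($lines, $chunk_size) = @_;
    die "chunk size must be positive\n" unless $chunk_size > 0;

    my @runs;    # read/write handles on the temporary run files
    for (my $i = 0; $i < @$lines; $i += $chunk_size) {
        my $end = $i + $chunk_size - 1;
        $end = $#$lines if $end > $#$lines;
        my @chunk = @{$lines}[$i .. $end];
        @chunk = sort @chunk;
        my ($fh) = tempfile(UNLINK => 1);    # removed at program exit
        print {$fh} "$_\n" for @chunk;
        seek($fh, 0, 0) or die "seek failed: $!";    # rewind to read it back
        push @runs, $fh;
    }

    # Phase 2: k-way merge, always taking the smallest current head line.
    my @heads = map { next_line($_) } @runs;
    my @out;
    while (1) {
        my $min;
        for my $k (0 .. $#heads) {
            next unless defined $heads[$k];
            $min = $k if !defined $min || $heads[$k] lt $heads[$min];
        }
        last unless defined $min;    # every run is exhausted
        push @out, $heads[$min];
        $heads[$min] = next_line($runs[$min]);
    }
    close $_ for @runs;
    return @out;
}

# Demo with a deliberately tiny chunk size so several runs get merged.
my @demo = qw(pear apple fig banana cherry date apple);
print "$_\n" for external_sort(\@demo, 3);
```

A real utility would size its chunks from available memory and stream the merged output instead of collecting it; the -T style option mentioned above is about where those temporary run files live.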
Warning: Serious GEEKING...

I like your idea of 'shape()' - in the abstract -
as a way of 'visualizing' the problem, since that
does point back to the 'design elements' of a
well done 'database solution'. { For me, many
of the 'implementation details' of DataBaseFoo
should be handled, like hashes, as things we do
but prefer not to talk about in polite society. }
The problem with 'aesthetics' - my 'ugly sort'
kvetch - is that at times a bit of brute force
is 'good enough', and can be better than the
most 'elegant' looking of algorithms.
One case in particular that left scars was
a technically 'elegant' SQL query a co-worker
had constructed because it seemed to be simpler. The problem was
that it meant a series of "interesting" join statements
that bogged the database engine down. { Some of that
was due to a bad implementation in the db engine. }
The alternative was the 'less elegant' "brute force"
approach of two 'get foo' queries that would dump
more bits on the wire but put less weight on the db engine,
with the post-processing done on 'cheaper'
support servers.
So while a given algorithm may map out nicely as
O() in the logical analysis, depending
upon what else is in play it may have an impact
that is disproportionate.
ciao
drieux
---

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
 



Re: sorter script [was: Frustrated newbie question]

2003-12-09 Thread R. Joseph Newton
drieux wrote:

> Ironically, uh, duh, given Tassilo's recent thumping
> of me for whining about academia - there are some
> ugly 'sorting algorithms' that have to be 'ugly'
> to be 'general enough' that are, well, ugly.

Hmmm.  I don't know about that.  Most sorting algorithms I have seen are
quite beautiful conceptually.  Even the rather primitive bubble sort can
look pretty viewed in its single dimension.  Each of the recursive
algorithms has its own style of elegance.  The mechanics of dealing with
various types or shapes of data can usually be abstracted with a function
call:

my @sorted = sort { shape_to_fit($a, $b) } @source;

I do see something like this when it comes to storage structure, though.  I
love Balanced Tree structures from a conceptual perspective, for instance,
because I see a real elegance in them.  Still I cannot ignore the difference
between their O(log n) times and the O(1) time characteristic of the hash.
To me hashes are like sausage to a carnivore--I love the end product, but
have no desire to look too closely at the process.
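As a rough illustration of the two lookup costs being compared here, the sketch below checks membership two ways: a binary search over a sorted array (standing in for a balanced tree, since both halve the search space at each step, O(log n)), and a Perl hash, whose `exists` test is O(1) on average. The helper `in_sorted` and the sample names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# O(log n): binary search over a sorted array, standing in for a
# balanced-tree lookup (both halve the candidates at every step).
sub in_sorted {
    my ($sorted, $want) = @_;
    my ($lo, $hi) = (0, $#$sorted);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my $c   = $sorted->[$mid] cmp $want;
        return 1 if $c == 0;
        if   ($c < 0) { $lo = $mid + 1 }
        else          { $hi = $mid - 1 }
    }
    return 0;
}

my @names   = sort qw(carol alice dave bob);
my %is_name = map { $_ => 1 } @names;    # O(1) average-case exists()

my $via_search = in_sorted(\@names, 'bob') ? 'found' : 'missing';
my $via_hash   = exists $is_name{bob}      ? 'found' : 'missing';
print "binary search: $via_search, hash: $via_hash\n";
```

Both report `found`; the difference only shows up as the list grows, and the hash pays for its speed with memory and with the loss of any ordering.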

> You might want to get your hands on Knuth's
> The Art of Computer Programming and crawl the searching
> and sorting sections if you are really interested in some
> serious analysis of good ways and bad ways to
> think about solving sorting problems.
>
> may I recommend that you peek at
>
> perldoc -q sort
>
> and then look at
>
> @sorted = map  { $_->[0] }
>           sort { $a->[1] cmp $b->[1] }
>           map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
>
> and think about what that MIGHT mean were it to
> have been plugged into the process.
>
> ciao
> drieux

I'm sorta stumped, I must admit.  Looks like you have material with a single
numeric prefix that is extraneous to your need, and a single token that you
are seeking, per line--and that you want it all to SHOUT.  I'd suggest
abstracting most of this into your shape_to_fit() when looking at it in the
context of the sort.  Of course, a specific name for the shape_to_fit
function would aid comprehension greatly.
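One way to read this suggestion is a named comparator that hides the key extraction. In this sketch, `uc_token` and `by_uc_token` are hypothetical names and the sample `@data` is invented; the regex is the one from the quoted transform.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical key extractor: the first non-space token after a run of
# digits, upper-cased; lines that do not match get an empty key.
sub uc_token {
    my ($line) = @_;
    my ($token) = $line =~ /\d+\s*(\S+)/;
    return defined $token ? uc $token : '';
}

# Named comparator in the spirit of shape_to_fit(); sort sets the
# package variables $a and $b before each call.
sub by_uc_token { uc_token($a) cmp uc_token($b) }

my @data   = ('3 pear', '10 Apple', '7 fig');
my @sorted = sort by_uc_token @data;
print "$_\n" for @sorted;
```

The catch, and the reason the Schwartzian Transform exists, is that `by_uc_token` re-extracts both keys on every comparison, so the regex runs on the order of n log n times instead of once per line.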

Joseph






Re: sorter script [was: Frustrated newbie question]

2003-12-09 Thread drieux
On Dec 9, 2003, at 7:45 AM, Bryan Harris wrote:
[..]
> Heck, why don't they just rewrite sort in perl
> if it's that much faster?


since you asked...

8-)

ciao
drieux
---




Re: sorter script [was: Frustrated newbie question]

2003-12-09 Thread Bryan Harris


> Bryan Harris wrote:
>> 
>>>> Sometimes perl isn't quite the right tool for the job...
>>>>
>>>> % man sort
>>>> % man uniq
>>> 
>>> If you code it correctly (unlike the program at the URL above) then a
>>> perl version will be more efficient and faster than using sort and uniq.
>> 
>> Please explain...
>> 
>> That's the last conclusion I thought anyone would be able to reach.
> 
> How about a little demo.  The times posted are the fastest from ten runs
> of the same programs.

[stuff cut out]

> The "sort | uniq" version has to run two processes and pass the whole
> file through the pipe from one process to the next.  The "sort -u"
> version has to sort the whole file first and then outputs only the
> unique values.  The perl version uses a hash to store the unique values
> first and then outputs the sorted values.  Depending on the number of
> duplicate values, the perl version will usually be faster as it has to
> sort a smaller list.

I see!  I just don't understand...  I thought perl's memory management, code
interpretation, overhead in creating hashes and just in running would've
taken far longer than sort.  Heck, why don't they just rewrite sort in perl
if it's that much faster?

- B







Re: sorter script [was: Frustrated newbie question]

2003-12-09 Thread John W. Krahn
Bryan Harris wrote:
> 
> >> Sometimes perl isn't quite the right tool for the job...
> >>
> >> % man sort
> >> % man uniq
> >
> > If you code it correctly (unlike the program at the URL above) then a
> > perl version will be more efficient and faster than using sort and uniq.
> 
> Please explain...
> 
> That's the last conclusion I thought anyone would be able to reach.

How about a little demo.  The times posted are the fastest from ten runs
of the same programs.

$ perl -le'print int(rand(10_000)+50_000) for 1 .. 1_000_000' > random.txt
$ time sort random.txt | uniq > sorted.shell

real    0m38.799s
user    0m34.880s
sys     0m2.920s
$ time sort -u random.txt > sorted.shell

real    0m23.452s
user    0m22.520s
sys     0m0.720s
$ time perl -lne'$h{$_}=()}{print for sort keys%h' random.txt > sorted.perl

real    0m18.450s
user    0m17.880s
sys     0m0.450s
$ diff -s sorted.shell sorted.perl
Files sorted.shell and sorted.perl are identical


The "sort | uniq" version has to run two processes and pass the whole
file through the pipe from one process to the next.  The "sort -u"
version has to sort the whole file first and then output only the
unique values.  The perl version uses a hash to store the unique values
first and then outputs the sorted values.  Depending on the number of
duplicate values, the perl version will usually be faster as it has to
sort a smaller list.
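For readers who find the one-liner cryptic (the `}{` closes the implicit `while (<>)` loop that `-n` wraps around the code, so the `print` runs once at the end), here is a long-hand sketch of the same hash-then-sort idea. The sub name `add_lines` is made up for the example, and unlike the one-liner it only reads files named on the command line, not STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Long-hand version of the one-liner: every distinct line becomes a hash
# key, so duplicates collapse while the input streams past.
sub add_lines {
    my ($seen, $fh) = @_;
    while (my $line = <$fh>) {
        chomp $line;            # what the one-liner's -l switch does on input
        $seen->{$line} = 1;
    }
    return;
}

my %seen;
for my $file (@ARGV) {          # files named on the command line
    open my $in, '<', $file or die "Cannot open $file: $!";
    add_lines(\%seen, $in);
    close $in;
}

# Only the distinct keys are sorted, not every input line.
print "$_\n" for sort keys %seen;
```

Saved under a hypothetical name such as uniqsort.pl, it would be run as `perl uniqsort.pl random.txt > sorted.perl`. Memory grows with the number of distinct lines rather than total lines, which is why heavy duplication favors this approach.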



John
-- 
use Perl;
program
fulfillment





Re: sorter script [was: Frustrated newbie question]

2003-12-09 Thread drieux
On Dec 8, 2003, at 9:30 PM, Bryan Harris wrote:
[..]
>>> Sometimes perl isn't quite the right tool for the job...
>>>
>>> % man sort
>>> % man uniq
>> If you code it correctly (unlike the program at the URL above) then a
>> perl version will be more efficient and faster than using sort and uniq.
> Please explain...
>
> That's the last conclusion I thought anyone would be able to reach.
[..]
[..]

For many simple things 'sort -u' will suffice,
which I presume was your argument. The problem, of
course, is getting what John asserts as 'code it correctly'.
Oye - I finally used Stuart Clemons' suggestion
{ thank you Stuart! }
to peek at the code. OYE! I will defer to John
as to which piece has him going OYE! more than
the funkadelic of the URL's
	my @sorted = sort ;

and then a bunch of file IO as well...

But back to your side of the question: as you may
have noticed from using 'sort', even if you are running
'sort -u' there are a bunch of problems that can come up
as the volume of data grows, leading to the
creation of a bunch of temporary cache files.
Ironically, uh, duh, given Tassilo's recent thumping
of me for whining about academia - there are some
ugly 'sorting algorithms' that have to be 'ugly'
to be 'general enough' that are, well, ugly.
You might want to get your hands on Knuth's
The Art of Computer Programming and crawl the searching
and sorting sections if you are really interested in some
serious analysis of good ways and bad ways to
think about solving sorting problems.
May I recommend that you peek at

	perldoc -q sort

and then look at

  @sorted = map  { $_->[0] }
            sort { $a->[1] cmp $b->[1] }
            map  { [ $_, uc( (/\d+\s*(\S+)/)[0]) ] } @data;
and think about what that MIGHT mean were it to
have been plugged into the process.
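Here is a runnable version of that transform with an invented four-line `@data` (the thread never shows the real input). Read the pipeline bottom to top: each line is paired with its key once, the pairs are sorted on the cached key, and the original lines are unwrapped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample input: a number, optional spaces, then a name.
my @data = ('3 pear', '10 Apple', '7 fig', '12 banana');

my @sorted =
    map  { $_->[0] }                            # 3. unwrap the original line
    sort { $a->[1] cmp $b->[1] }                # 2. compare the cached keys
    map  { [ $_, uc( (/\d+\s*(\S+)/)[0] ) ] }   # 1. pair each line with its key
    @data;

print "$_\n" for @sorted;
```

Computing each key once, rather than inside every comparison, is the whole point of the transform; it pays off when key extraction (here a regex plus `uc`) is expensive relative to `cmp`.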
ciao
drieux
---




Re: sorter script [was: Frustrated newbie question]

2003-12-08 Thread Bryan Harris


>>>> My next Perl task after I get my list of one name per line, is to sort the
>>>> list and eliminate duplicate names.
>>>
>>> I have used the following script to sort and remove duplicate entries in
>>> flat text files.
>>> 
>>> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm
>> 
>> Sometimes perl isn't quite the right tool for the job...
>> 
>> % man sort
>> % man uniq
> 
> If you code it correctly (unlike the program at the URL above) then a
> perl version will be more efficient and faster than using sort and uniq.

Please explain...

That's the last conclusion I thought anyone would be able to reach.

- B







Re: sorter script [was: Frustrated newbie question]

2003-12-08 Thread John W. Krahn
Bryan Harris wrote:
> 
> >> My next Perl task after I get my list of one name per line, is to sort the
> >> list and eliminate duplicate names.
> >
> > I have used the following script to sort and remove duplicate entries in flat
> > text files.
> >
> > http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm
> 
> Sometimes perl isn't quite the right tool for the job...
> 
> % man sort
> % man uniq

If you code it correctly (unlike the program at the URL above) then a
perl version will be more efficient and faster than using sort and uniq.


John
-- 
use Perl;
program
fulfillment





good example of top-posting problem [Was RE: sorter script [was: Frustrated newbie question]]

2003-12-08 Thread Kevin Pfeiffer
Hi Tom,

Tom Kinzer top-posted:

> *Consciously* making the decision that your script will no longer be
> portable...
> 
> my $2Cents;
> 
> -Tom Kinzer

> -----Original Message-----
> From: Bryan Harris [mailto:[EMAIL PROTECTED]
> Sent: Sunday, December 07, 2003 5:32 PM
> To: Beginners Perl
> Subject: Re: sorter script [was: Frustrated newbie question]
> 
> 
> 
> 
>>> My next Perl task after I get my list of one name per line, is to sort the
>>> list and eliminate duplicate names.
>>
>> I have used the following script to sort and remove duplicate entries in flat
>> text files.
>> text files.
>>
>> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm
> 
> 
> Sometimes perl isn't quite the right tool for the job...
> 
> % man sort
> % man uniq
> 
> - B

This is a good example of the problem with top-posting; particularly in this
list: I have no idea to which part of which poster your comment refers.

Sorry about complaining (I try to ignore this behaviour), but it does cause
problems in readability and comprehension (especially here). Not to mention
the problem of deciding where to position a response in such a mixed-up
message.

-K


-- 
Kevin Pfeiffer





RE: sorter script [was: Frustrated newbie question]

2003-12-07 Thread Tom Kinzer
*Consciously* making the decision that your script will no longer be
portable...

my $2Cents;

-Tom Kinzer

-----Original Message-----
From: Bryan Harris [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 07, 2003 5:32 PM
To: Beginners Perl
Subject: Re: sorter script [was: Frustrated newbie question]




>> My next Perl task after I get my list of one name per line, is to sort the
>> list and eliminate duplicate names.
>
> I have used the following script to sort and remove duplicate entries in flat
> text files.
> text files.
>
> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm


Sometimes perl isn't quite the right tool for the job...

% man sort
% man uniq

- B







Re: sorter script [was: Frustrated newbie question]

2003-12-07 Thread stuart_clemons
Hi drieux:

The link that failed for you worked for me.  That link led to this link,
which had the code.

http://products.daviddurose.com/cgi-bin/download.cgi?script=sort


----- Message from drieux <[EMAIL PROTECTED]> on Sun, 7 Dec 2003 09:04:51 -0800 -----
To: Perl Perl <[EMAIL PROTECTED]>
Subject: Re: sorter script [was: Frustrated newbie question]

On Dec 7, 2003, at 3:31 AM, John W. Krahn wrote:
[..]
>>
>> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm
>
> Why?

more importantly,

How?

I have tried a couple of times to download it
and get nothing. the retreat to the handy dandy

GET -d -u -U -s -S -e http://www.downloaddatabase.com/databasesoftware/download-db-sorter-script.htm

is not giving me any insight into what is going on...


ciao
drieux

Re: sorter script [was: Frustrated newbie question]

2003-12-07 Thread Bryan Harris


>> My next Perl task after I get my list of one name per line, is to sort the
>> list and eliminate duplicate names.
> 
> I have used the following script to sort and remove duplicate entries in flat
> text files.
> 
> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm


Sometimes perl isn't quite the right tool for the job...

% man sort
% man uniq

- B







Re: sorter script [was: Frustrated newbie question]

2003-12-07 Thread drieux
On Dec 7, 2003, at 3:31 AM, John W. Krahn wrote:
[..]
>> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm
> Why?
more importantly,

How?

I have tried a couple of times to download it
and get nothing. the retreat to the handy dandy
GET -d -u -U -s -S -e http://www.downloaddatabase.com/databasesoftware/download-db-sorter-script.htm

is not giving me any insight into what is going on...

ciao
drieux
---




Re: sorter script [was: Frustrated newbie question]

2003-12-07 Thread John W. Krahn
Saskia Van Der Elst wrote:
> 
> On Friday 05 December 2003 10:53, [EMAIL PROTECTED] wrote:
> > My next Perl task after I get my list of one name per line, is to sort the
> > list and eliminate duplicate names.
> 
> I have used the following script to sort and remove duplicate entries in flat
> text files.
> 
> http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm

Why?


John
-- 
use Perl;
program
fulfillment





Re: sorter script [was: Frustrated newbie question]

2003-12-05 Thread Saskia van der Elst
On Friday 05 December 2003 10:53, [EMAIL PROTECTED] wrote:
> My next Perl task after I get my list of one name per line, is to sort the
> list and eliminate duplicate names.

I have used the following script to sort and remove duplicate entries in flat 
text files.

http://www.downloaddatabase.com/databasesoftware/db-sorter-script.htm

Saskia
