On Mon, Jun 15, 2009 at 10:00 PM, David Fetter <da...@fetter.org> wrote:

> On Mon, Jun 15, 2009 at 11:34:38AM -0400, Tom Lane wrote:
> > David Fetter <da...@fetter.org> writes:
> > > * It's going to a lot of trouble to allow for the possibility of both
> > >   unordered results and of duplicate lines.  If we disallow duplicate
> > >   lines in unordered result sets, we can get a big speed gain by using
> > >   hash-based comparisons.
> >
> > Why not just sort the lines and compare?
>
> Good point :)
>

Please find attached the updated script and test cases. Changes since last
submission:

.) The script now uses tabs for indentation.

.) The script almost passes perlcritic.com at the 'stern' level.

.) Corrected some RE matches so that a ? mark matches only at the beginning
of a line (^ anchor).

.) Employ a hybrid approach to support REs in unordered sets and to get
better performance:
    If there's no RE line in an unordered group of lines, then sort both
arrays and compare them. If there _is_ an RE line in the group, then do the
O(n^2) processing to eliminate common lines, and then report the missing
lines.

TODO:

.) Use Tie::File to make the code a little cleaner.

I agree that choosing the hybrid approach for unordered set comparison
makes the script too deeply indented, and maybe a little hard on the eyes,
but it's pretty simple, and I have tried to delineate the major sections
with comments.

Best regards,
-- 
Let's call it Postgres

EnterpriseDB      http://www.enterprisedb.com

gurjeet[.sin...@enterprisedb.com
singh.gurj...@{ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device

Attachment: neurodiff.pl

Attachment: expected.out

Attachment: result.out

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
