On Mon, Jun 15, 2009 at 10:00 PM, David Fetter <da...@fetter.org> wrote:
> On Mon, Jun 15, 2009 at 11:34:38AM -0400, Tom Lane wrote:
> > David Fetter <da...@fetter.org> writes:
> > > * It's going to a lot of trouble to allow for the possibility of both
> > > unordered results and of duplicate lines. If we disallow duplicate
> > > lines in unordered result sets, we can get a big speed gain by using
> > > hash-based comparisons.
> >
> > Why not just sort the lines and compare?
>
> Good point :)
>

Please find attached the updated script and test cases.

Changes since last submission:

.) Script uses tabs for indentation.
.) Script almost passes perlcritic.com at 'stern' level.
.) Corrected some RE matches so that a '?' mark matches only at the
   beginning of a line (^ anchor).
.) Employ a hybrid approach to support REs in an unordered set and to get
   better performance:
   If there is no RE line in an unordered group of lines, sort both
   arrays and then compare them.
   If there _is_ an RE line in an unordered group of lines, do the O(n^2)
   processing to eliminate common lines, and then report the missing lines.

TODO:

.) Use Tie::File to make the code a little cleaner.

I agree that the hybrid approach to Unordered Set comparison makes the
script too deeply indented, and maybe a little hard on the eyes, but it is
pretty simple, and I have tried to delineate the major sections with proper
comments.

Best regards,
-- 
Lets call it Postgres

EnterpriseDB http://www.enterprisedb.com

gurjeet[.sin...@enterprisedb.com
singh.gurj...@{ gmail | hotmail | indiatimes | yahoo }.com
Mail sent from my BlackLaptop device
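For readers without the attachment, here is a minimal sketch of the hybrid
comparison described above. It is written in Python rather than the Perl of
the attached script, and it is only an illustration: the function names, the
return shape, and the convention that an expected line starting with '?'
carries a regular expression for the rest of the line are assumptions, not
taken from neurodiff.pl.

```python
import re

RE_PREFIX = "?"  # assumption: an expected line starting with '?' is a regex


def is_re_line(line):
    """True if the expected line should be treated as a regular expression."""
    return line.startswith(RE_PREFIX)


def compare_unordered(expected, actual):
    """Compare one unordered group; return (missing, unexpected) lines."""
    if not any(is_re_line(e) for e in expected):
        # Fast path: no RE lines, so sort both lists and walk them in step.
        exp, act = sorted(expected), sorted(actual)
        missing, unexpected = [], []
        i = j = 0
        while i < len(exp) and j < len(act):
            if exp[i] == act[j]:
                i += 1
                j += 1
            elif exp[i] < act[j]:
                missing.append(exp[i])
                i += 1
            else:
                unexpected.append(act[j])
                j += 1
        missing.extend(exp[i:])
        unexpected.extend(act[j:])
        return missing, unexpected

    # Slow path: O(n^2) elimination of common lines.
    # Literal lines are tried first so that a regex does not consume a
    # result line that a literal needs. This is a greedy match, not an
    # optimal assignment between regexes and result lines.
    remaining = list(actual)
    missing = []
    for e in sorted(expected, key=is_re_line):
        for idx, a in enumerate(remaining):
            if is_re_line(e):
                matched = re.fullmatch(e[len(RE_PREFIX):], a) is not None
            else:
                matched = e == a
            if matched:
                del remaining[idx]
                break
        else:
            missing.append(e)
    return missing, remaining
```

Both paths tolerate duplicate lines, since each expected line cancels at most
one result line. Whatever is left on either side is what would be reported.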
Attachments: neurodiff.pl, expected.out, result.out
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers