On 5/30/07, Laxminarayan G Kamath A <[EMAIL PROTECTED]> wrote: snip
Is there any way of optimising it further?
snip
Premature optimization is the root of all evil. Have you profiled the code yet? If not, here is some documentation that will point you in the right direction:

http://www.perl.com/pub/a/2004/06/25/profiling.html
http://search.cpan.org/~nwclark/perl-5.8.8/utils/dprofpp.PL

But while I am looking, let's see what is going on.
snip
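Once a profiler has pointed at a hot spot, the core Benchmark module is a quick way to compare candidate rewrites of it. A minimal sketch, assuming the hot spot is counting double quotes in each line: it compares the list-assignment idiom used in the parsing code with tr///, which counts characters without building a list of matches. The sample line and the sub names count_match and count_tr are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line: 30 fields, every third one a quoted field.
my $line = join(',', map { $_ % 3 ? 'nnnn' : '"q,q"' } 1 .. 30) . "\n";

# Count double quotes by collecting all matches in list context.
sub count_match {
    my ($s) = @_;
    my $n = () = $s =~ /"/g;
    return $n;
}

# Count double quotes with tr///, which returns the number of matching characters.
sub count_tr {
    my ($s) = @_;
    return $s =~ tr/"//;
}

# Make sure both versions agree before comparing their speed.
die "counts differ\n" unless count_match($line) == count_tr($line);

# Run each candidate for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    regex    => sub { count_match($line) },
    translit => sub { count_tr($line) },
});
```

cmpthese prints each candidate's rate and the percentage difference between them; measure with data that looks like your real input, since the ranking can change with line length and quote density.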
1. One line need not be one record. Records may contain multi-line fields.
2. A sigh of relief, but: only multi-line fields are wrapped in double quotes.
3. Commas appear both inside and outside the fields. The ones inside the fields must not be treated as separators; again, fields with commas are wrapped in double quotes.
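Rules 2 and 3 mean that, once a complete record has been assembled, each field is either a double-quoted run (which may contain commas and newlines) or a run of characters with no commas or quotes. A minimal sketch of splitting one such record with a \G-anchored regex, as an alternative to splitting on every comma and gluing the pieces back together. split_record is a hypothetical helper, and it assumes quoted fields never contain a literal double quote (no "" escaping), since the rules above don't mention one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one complete record into fields. A field is
# either a double-quoted run (commas and newlines allowed inside) or a run
# without commas or quotes, and it must be followed by a comma or by the end
# of the record. Assumes no embedded double quotes inside quoted fields.
sub split_record {
    my ($record) = @_;
    my @fields;
    # \G makes each match start where the previous one stopped.
    while ($record =~ /\G("[^"]*"|[^,"]*)(,|\z)/g) {
        my ($field, $sep) = ($1, $2);
        $field =~ s/\A"(.*)"\z/$1/s;    # drop the surrounding quotes, if any
        push @fields, $field;
        last if $sep eq '';             # end of record reached
    }
    return @fields;
}

my $sample = qq{plain,"has, comma","multi\nline",last};
print join('|', split_record($sample)), "\n";
```

Unlike split /,/, which discards trailing empty fields, this returns a final empty field when a record ends with a comma, as the generated test records do.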
snip

The following code seems to speed up the parsing by nearly two orders of magnitude (2.214 seconds for the old code versus 0.036 seconds for this code on 100 records, roughly 60 times faster). Also, there seems to be a bug in your original code. I set up a test file with 100 records of 30 fields each, so every record should have 30 fields, but your code reported:

  fields  records
      33        1
      34        1
      36        3
      37        5
      38       10
      39        9
      40       12
      41       17
      42       15
      43       13
      44        7
      45        5
      46        1
      48        1

===code to generate test file===
#!/usr/bin/perl

use strict;
use warnings;

my $fields    = 30;    # fields per record
my $fieldlen  = 30;    # characters per field
my @fieldtype = qw(normal quoted comma);

my $records = shift;   # number of records, from the command line

for my $rec (1 .. $records) {
    for my $field (1 .. $fields) {
        my $type = $fieldtype[int rand @fieldtype];
        # every field, including the last, is followed by a comma
        if ($type eq 'normal') {
            # plain field: no quotes, commas or newlines
            print 'n' x $fieldlen, ",";
        } elsif ($type eq 'quoted') {
            # quoted multi-line field: write lines until at least
            # $fieldlen characters have been written
            print '"';
            my $i = 0;
            until ($i >= $fieldlen) {
                my $len = int rand $fieldlen;
                print 'q' x $len, "\n";
                $i += $len;
            }
            print '",';
        } elsif ($type eq 'comma') {
            # quoted multi-line field with a comma on every line
            print '"';
            my $i = 0;
            until ($i == $fieldlen) {
                my $len = int rand $fieldlen;
                $len = $fieldlen - $i if $i + $len > $fieldlen;
                print 'c' x ($len/2), ',', 'c' x ($len/2), "\n";
                $i += $len;
            }
            print '",';
        }
    }
    print "\n";
}

===code to parse test file===
#!/usr/bin/perl

use strict;
use warnings;

my $record = "";
my $quotes = 0;
my @records;

while (defined (my $line = <>)) {
    # skip blank lines between records
    next if $record eq "" and $line =~ /^\s*$/;
    $record .= $line;

    # count the number of quotes
    $quotes += () = $line =~ /"/g;

    # if $quotes is even then we have a full record
    if ($quotes % 2 == 0) {
        $quotes = 0;
        chomp $record;
        my @fields;
        my $unbalanced = 0;    # true while inside a quoted field that held commas
        for my $field (split /,/, $record) {
            # strip the quotes, counting them as we go
            my $count = $field =~ s/"//g;
            if ($count % 2) {
                # an odd number of quotes opens or closes a quoted field
                if ($unbalanced) {
                    $unbalanced = 0;
                    $fields[-1] .= ",$field";
                    next;
                }
                $unbalanced = 1;
                push @fields, $field;
                next;
            }
            if ($unbalanced) {
                # still inside a quoted field: glue the piece back on
                $fields[-1] .= ",$field";
            } else {
                push @fields, $field;
            }
        }
        push @records, { whole => $record, fields => \@fields };
        $record = "";
    }
}

for my $rec (@records) {
    print join("|", @{ $rec->{fields} }), "\n===\n";
}
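The record-assembly step of the parser (keep appending lines until the running quote count is even) can also be packaged as a function that returns one logical record per call, which keeps the main loop short and makes the step easy to test on its own. A minimal sketch: next_record is a hypothetical name, it counts quotes with tr/// instead of the list-assignment idiom, and the in-memory filehandle is only there to make the example self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the next logical record from $fh, or undef at
# end of input. Physical lines are appended until the running count of double
# quotes is even, i.e. no quoted field is left open.
sub next_record {
    my ($fh) = @_;
    my $record = '';
    my $quotes = 0;
    while (defined(my $line = <$fh>)) {
        next if $record eq '' && $line =~ /^\s*$/;   # skip blank lines between records
        $record .= $line;
        $quotes += ($line =~ tr/"//);                # tr/// returns how many quotes it saw
        if ($quotes % 2 == 0) {
            chomp $record;
            return $record;
        }
    }
    # End of input: hand back an unterminated record instead of dropping it.
    return length($record) ? $record : undef;
}

# In-memory filehandle so the example runs without a data file.
my $data = qq{a,"b\nc",d\n\ne,f\n};
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (defined(my $rec = next_record($fh))) {
    print "[$rec]\n";
}
close $fh;
```

Each call returns a record with its internal newlines intact and only the final newline removed. Unlike the loop in the parsing code, which silently discards a record whose quote is never closed, this version returns whatever is left at end of file so the caller can report it.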