Re: Outlook CSV Parser
On Wed, 30 May 2007 12:14:52 -0400, Chas Owens [EMAIL PROTECTED] wrote:

> snip/
> ...
> http://www.perl.com/pub/a/2004/06/25/profiling.html
> http://search.cpan.org/~nwclark/perl-5.8.8/utils/dprofpp.PL
> snip/
> The following code seems to speed up the parsing by two orders of
> magnitude (2.214 seconds
> snip/
> found 38 fields in 10 records
> found 39 fields in 9 records
> found 40 fields in 12 records
> found
> snip/
> ===code to parse test file===
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $record = "";
> my $quotes = 0;
> my @records;
> while (defined (my $line = <>)) {
>     next if $record eq "" and $line =~ /^\s*$/;
>     $record .= $line;
>     #count the number of quotes
> snip/

Did I not thank you for this? Shame on me. Thanks all the same: your code actually helped me severalfold.

* It also taught me to think about optimisation more seriously.
* Your code did not time out my CGI :-P

Thanks a tonne!

--
Cheers,
Laxminarayan G Kamath A
e-mail: [EMAIL PROTECTED]
Work URL: http://deeproot.in

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/
Re: Outlook CSV Parser
On Wed, 30 May 2007 10:38:40 +0200, Dr.Ruud [EMAIL PROTECTED] wrote:

> You forgot to supply a link to such a file. Or show a __DATA__ section
> for testing.

http://download.deeproot.in/~kamathln/outlook-encrtypted-sample.csv

--
Cheers,
Laxminarayan G Kamath A
e-mail: [EMAIL PROTECTED]
Work URL: http://deeproot.in
Re: Outlook CSV Parser
On 05/31/2007 02:32 AM, Laxminarayan G Kamath A wrote:

> http://download.deeproot.in/~kamathln/outlook-encrtypted-sample.csv

Well, I asked for it. :-)

It's impossible to tell where one record ends and another record begins with that file.
Re: Outlook CSV Parser
Laxminarayan G Kamath A:
> Ruud:
>> You forgot to supply a link to such a file. Or show a __DATA__ section
>> for testing.
>
> http://download.deeproot.in/~kamathln/outlook-encrtypted-sample.csv

OK, let's check how well-formed it is:

perl -we'
  local $/;
  $_ = <>;
  s/"[^"]*"//g;
  s/(?<=,)[^,]+(?=,)//g;
  s/^[^,]*,+//;
  print
' outlook-encrtypted-sample.csv

That prints:

---
3snji [EMAIL PROTECTED]
---

so AFAICS there is a problem somewhere at the end. Maybe try a zipped version?

--
Affijn, Ruud

Gewoon is een tijger.
Re: Outlook CSV Parser
Mumia W. wrote:
> Laxminarayan G Kamath A:
>> http://download.deeproot.in/~kamathln/outlook-encrtypted-sample.csv
>
> Well, I asked for it. :-)
> It's impossible to tell where one record ends and another record begins
> with that file.

Maybe not, because the rule was that a record ends at a newline, unless that newline is inside quotes.

--
Affijn, Ruud

Gewoon is een tijger.
Re: Outlook CSV Parser
On 05/30/2007 12:40 AM, Laxminarayan G Kamath A wrote:

> Hi PERLers,
> We here at DeepRoot Linux were trying to parse Outlook's CSV so that I
> can add them to an LDAP address book. [...]

The Perl module Text::CSV_XS would make your work much simpler, and it might execute a little faster.
Re: Outlook CSV Parser
On Wed, 30 May 2007 01:26:30 -0500, Mumia W. mumia.w.18.spam [EMAIL PROTECTED] wrote:

> The Perl module Text::CSV_XS would make your work much simpler, and it
> might execute a little faster.

Thank you for pointing that out, but we have already tried it! Unfortunately, it failed to separate the records correctly. We have also tried several more modules from CPAN, and they were not able to parse Outlook's CSV either.

If you read my mail again, you will find that I already mentioned that we tried several modules before falling back to writing our own code.

What I am expecting is help with a variant of the regex I used as the condition for the while loop. I am sure that if we modify that regexp a little bit, we can just use it on the record like this:

$_ = $record;
@fields = /regexp/g;

I tried a lot of different ways but just could not get the right regexp :-(.

--
Cheers,
Laxminarayan G Kamath A
e-mail: [EMAIL PROTECTED]
Work URL: http://deeproot.in
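A minimal sketch of the `@fields = /regexp/g` idea described above, not taken from the thread. It assumes the record has already been assembled (any newlines inside quoted fields are part of the string), that the input is well-formed, and that quoted fields contain no escaped double quotes. The helper name `split_record` and the sample data are hypothetical, and the branch-reset group `(?|...)` needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one complete record
# into fields with a single global match in list context.
sub split_record {
    my ($record) = @_;

    # Every field is preceded by the start of the record or by a comma.
    # The branch-reset group (?|...) makes both alternatives capture
    # into $1, so the list-context match returns one value per field:
    #   "([^"]*)"  quoted field, may contain commas and newlines
    #   ([^,]*)    unquoted field, may be empty
    my @fields = $record =~ /(?:^|,)(?|"([^"]*)"|([^,]*))/g;
    return @fields;
}

# Made-up sample record with a multi-line quoted field and an empty field.
my $record = qq{Smith,"12 Main St,\nSpringfield",,555-0100};
my @fields = split_record($record);

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Because the quoted alternative is tried first, commas and newlines between a pair of double quotes stay inside the field. Malformed input, such as text directly after a closing quote, is silently skipped by this pattern, so it is only a starting point.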
Re: Outlook CSV Parser
Laxminarayan G Kamath A wrote:

> The stumbling blocks: there are several types of problems in Outlook's
> CSV.

You forgot to supply a link to such a file. Or show a __DATA__ section for testing.

--
Affijn, Ruud

Gewoon is een tijger.
Re: Outlook CSV Parser
On 05/30/2007 03:04 AM, Laxminarayan G Kamath A wrote:

> [...]
> I tried a lot of different ways but just could not get the right
> regexp :-(.

I reiterate what the eminent Dr. Ruud said. I need some data to play with before I play with the code you posted.
Re: Outlook CSV Parser
On Wed, 2007-05-30 at 13:34 +0530, Laxminarayan G Kamath A wrote:

> What I am expecting is help with a variant of the regex I used as the
> condition for the while loop. I am sure that if we modify that regexp a
> little bit, we can just use it on the record like this:
>
> $_ = $record;
> @fields = /regexp/g;
>
> I tried a lot of different ways but just could not get the right
> regexp :-(.

CSV is a horrible format. It is far too unreliable: we have exported CSV from Excel that imported differently into Excel. Is there another option, e.g. connecting to Outlook via a remote connection? Is there another format available?

I doubt a simple regex will do it if the CSV modules do not work. What data do you have problems with? Without samples there is nothing we can do.

--
Ken Foskey
FOSS developer
Re: Outlook CSV Parser
On 5/30/07, Laxminarayan G Kamath A [EMAIL PROTECTED] wrote:
snip
> Any ways of optimising it further?
snip

Premature optimization is the root of all evil. Have you profiled the code yet? If not, then here is some documentation that will point you in the right direction:

http://www.perl.com/pub/a/2004/06/25/profiling.html
http://search.cpan.org/~nwclark/perl-5.8.8/utils/dprofpp.PL

But while I am looking, let's see what is going on.

snip
> 1. One line need not be one record. Records may contain multiline
>    fields.
> 2. A sigh of relief, but: only multi-line fields are wrapped in double
>    quotes.
> 3. Commas appear both inside and outside the fields. The ones in the
>    fields must not be treated as separators. Again, fields with commas
>    are wrapped in double quotes.
snip

The following code seems to speed up the parsing by two orders of magnitude (2.214 seconds for the old code and 0.036 seconds for this code on 100 records). Also, there seems to be a bug in your original code. I set up a test file with 100 records of 30 fields each and it found

found 33 fields in 1 records
found 34 fields in 1 records
found 36 fields in 3 records
found 37 fields in 5 records
found 38 fields in 10 records
found 39 fields in 9 records
found 40 fields in 12 records
found 41 fields in 17 records
found 42 fields in 15 records
found 43 fields in 13 records
found 44 fields in 7 records
found 45 fields in 5 records
found 46 fields in 1 records
found 48 fields in 1 records

===code to generate test file===
#!/usr/bin/perl

use strict;
use warnings;

my $fields    = 30;
my $fieldlen  = 30;
my @fieldtype = qw(normal quoted comma);
my $records   = shift;    # number of records, from the command line

for my $rec (1 .. $records) {
    for my $field (1 .. $fields) {
        # pick one of the three field types at random
        my $type = $fieldtype[int rand @fieldtype];
        if ($type eq 'normal') {
            print 'n' x $fieldlen, ",";
        } elsif ($type eq 'quoted') {
            # quoted field containing embedded newlines
            print '"';
            my $i = 0;
            until ($i >= $fieldlen) {
                my $len = int rand $fieldlen;
                print 'q' x $len, "\n";
                $i += $len;
            }
            print '",';
        } elsif ($type eq 'comma') {
            # quoted field containing embedded commas and newlines
            print '"';
            my $i = 0;
            until ($i == $fieldlen) {
                my $len = int rand $fieldlen;
                $len = $fieldlen - $i if $i+$len > $fieldlen;
                print 'c' x ($len/2), ',', 'c' x ($len/2), "\n";
                $i += $len;
            }
            print '",';
        }
    }
    print "\n";
}

===code to parse test file===
#!/usr/bin/perl

use strict;
use warnings;

my $record = "";
my $quotes = 0;
my @records;
while (defined (my $line = <>)) {
    next if $record eq "" and $line =~ /^\s*$/;
    $record .= $line;
    #count the number of quotes
    $quotes += () = $line =~ /"/g;
    #if $quotes is even then we have a full record
    if ($quotes % 2 == 0) {
        $quotes = 0;
        chomp $record;
        my @fields;
        my $unbalanced = 0;
        for my $field (split /,/, $record) {
            #an odd number of quotes opens or closes a quoted field
            my $count = $field =~ s/"//g;
            if ($count % 2) {
                if ($unbalanced) {
                    $unbalanced = 0;
                    $fields[-1] .= ",$field";
                    next;
                }
                $unbalanced = 1;
                push @fields, $field;
                next;
            }
            if ($unbalanced) {
                #still inside a quoted field: rejoin on the comma
                $fields[-1] .= ",$field";
            } else {
                push @fields, $field;
            }
        }
        push @records, { whole => $record, fields => \@fields };
        $record = "";
    }
}

for my $rec (@records) {
    print join "|", @{$rec->{fields}}, "\n===\n";
}
Re: Outlook CSV Parser
On 5/30/07, Ken Foskey [EMAIL PROTECTED] wrote:
snip
> CSV is a horrible format. Far too unreliable, we have exported CSV from
> Excel that imported differently into Excel.
snip

Just a pedantic nitpick, but CSV is an incredibly reliable format; the problem is finding programs that actually use CSV rather than a CSV-like format. It works out to the same thing, but it isn't CSV's fault. For an example of a programmer using a CSV-like format where he/she should be using the real thing, look at my other post on this thread. My code fails to handle escaped double quotes correctly.
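To illustrate the escaped-double-quote case mentioned above, here is a hedged sketch, not code from the thread. It assumes the common CSV convention that a literal double quote inside a quoted field is written as two double quotes (`""`), that the record has already been assembled, and that the input is well-formed. The helper name `parse_record` and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one complete record
# into fields, treating "" inside a quoted field as a literal quote.
sub parse_record {
    my ($record) = @_;
    my @fields;

    # Each field follows the start of the record or a comma.
    #   $1: body of a quoted field; (?:[^"]|"")* lets it contain
    #       commas, newlines and doubled quotes
    #   $2: unquoted field, possibly empty
    while ($record =~ /(?:^|,)(?:"((?:[^"]|"")*)"|([^,"]*))/g) {
        my ($quoted, $plain) = ($1, $2);
        if (defined $quoted) {
            $quoted =~ s/""/"/g;    # unescape doubled quotes
            push @fields, $quoted;
        } else {
            push @fields, $plain;
        }
    }
    return @fields;
}

# Made-up sample record with an escaped quote inside a quoted field.
my @fields = parse_record(q{x,"say ""hi"", ok",y});
print "[$_]\n" for @fields;
```

Deleting every double quote from a field, by contrast, would also delete the escaped ones. Counting quotes to find record boundaries is unaffected by this convention, because each escaped quote adds two characters to the count and so leaves its parity unchanged.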
Outlook CSV Parser
Hi PERLers,

We here at DeepRoot Linux were trying to parse Outlook's CSV so that I can add them to an LDAP address book. After several futile tries with lots of built-in packages, we decided it was up to us to devise an algorithm. As time was of importance, we wrote a "well, it works!" character-by-character parsing Perl algorithm.

Wondering how I could make it run faster, I thought I would give regular expressions a try. The attached file is what I have come up with, but it still takes more than 10 seconds on my 1.6 GHz P4 with 1 GB of RAM to parse 6500+ lines of CSV and separate them out, let alone import them into LDAP. Any ways of optimising it further?

The stumbling blocks: there are several types of problems in Outlook's CSV.

1. One line need not be one record. Records may contain multiline fields.
2. A sigh of relief, but: only multi-line fields are wrapped in double quotes.
3. Commas appear both inside and outside the fields. The ones in the fields must not be treated as separators. Again, fields with commas are wrapped in double quotes.

I hope I am on the right mailing list. If not, please direct me to the proper one.

--
Cheers,
Laxminarayan G Kamath A
e-mail: [EMAIL PROTECTED]
Work URL: http://deeproot.in
Re: Outlook CSV Parser
On Wed, 30 May 2007 11:10:00 +0530, Laxminarayan G Kamath A [EMAIL PROTECTED] wrote:

> The attached file is what I have come up with, but it still takes more ...

Had forgotten to attach the file.

--
Cheers,
Laxminarayan G Kamath A
e-mail: [EMAIL PROTECTED]
Work URL: http://deeproot.in

#!/usr/bin/perl
my $readline, $line;
$readline = "";
$line = "";
@records = ();
@fields = ();
while (<STDIN>) {
    $readline = $_;
    if (($line eq "") && ($readline =~ /^\s*$/)) {
        next;
    }
    $line = $line . $readline;
    @fields=();
    if ( $line =~ /(^[^\"]*$)|(^([^\"]*(\"[^\"]*\")[^\"]*)+$)/s ) {
        chomp($line);
        print "$line\n";
        print "---\n";
        push (@records, $line);
        $last = "";
        $_ = $line;
        while (/\B\"([^\"]*)\"\B|([^\,\"]*)/g) {
            $res = "$1$2";
            push (@fields, $res);
            if ($res ne "") {
                /\B\"([^\"]*)\"\B|([^\,\"]*)/g;
            }
        }
        map({print "$_\n\n"} @fields);
        print "===\n";
        $line = "";
    }
}