Regex Help Please!
I am trying to come up with a script to convert this output from RRDTool DUMP to a format which lends itself to import into Excel 97. Unfortunately, I am just getting started with Perl and do not have a clear enough grasp of how to configure this so that it strips out the unwanted parts and formats it correctly.

I would like to be able to feed a file into this script, and then receive a comma delimited formatted file as output. Can anyone point me in the right direction? I have the O'Reilly camel book, but when I read the section on Regex, I feel like an idiot! :(

Input file:
|---
(misc header information I want to delete)

#This is how the data I want to pull out is formatted
<!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500 --> <row><v> NaN </v><v> NaN </v></row>
<!-- 2002-01-08 09:40:00 Eastern Standard Time / 1010500800 --> <row><v> 6.00e+001 </v><v> 6.90e+001 </v></row>
|---

Output wanted is:
2002-01-08 09:35:00 Eastern Standard Time, 1010500500, NaN, NaN
2002-01-08 09:40:00 Eastern Standard Time, 1010500800, 6.00e+001, 6.90e+001
|---

Thanks in advance.

Gordon

___
Perl-Win32-Users mailing list [EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/perl-win32-users
RE: Regex Help Please!
Here is a simplistic approach. It may want more edits, but it is a starting place. Placing the data for testing under __DATA__:

while ( <DATA> ) {
    chomp;
    next if ( /^\s*$/ );   # bypass blank lines
    # $1 = timestamp text, $2 = epoch seconds, $3 and $4 = the two values
    if ( /^<!--\s(\d+.+)\s\/\s(\d+)\s--> <row><v> (.+) <\/v><v> (.+) <\/v><\/row>/ ) {
        printf "%-s, %-s, %-s, %-s\n", $1, $2, $3, $4;
    }
    else {
        printf "No hit on data:\n%-s\n", $_;
    }
}
__DATA__
<!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500 --> <row><v> NaN </v><v> NaN </v></row>
<!-- 2002-01-08 09:40:00 Eastern Standard Time / 1010500800 --> <row><v> 6.00e+001 </v><v> 6.90e+001 </v></row>

^--- Script ends here

Output:
2002-01-08 09:35:00 Eastern Standard Time, 1010500500, NaN, NaN
2002-01-08 09:40:00 Eastern Standard Time, 1010500800, 6.00e+001, 6.90e+001

Wags ;)
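For anyone who finds a one-line pattern like the one above hard to read, the same idea can be spelled out with the /x modifier, which lets a regex carry whitespace and comments. This is only a sketch: it assumes every data line looks like the two samples (an XML comment followed by a row with exactly two values), and parse_line is a made-up helper name, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (timestamp, epoch, value1, value2) for one
# data line, or an empty list if the line does not fit the sample format.
sub parse_line {
    my ($line) = @_;
    return $line =~ m{
        ^<!--\s+             # opening of the XML comment
        (\d.*?)              # capture 1: timestamp text, starts with a digit
        \s+/\s+              # the slash between timestamp and epoch
        (\d+)                # capture 2: epoch seconds
        \s+-->\s*            # end of the comment
        <row>
        <v>\s*(\S+)\s*</v>   # capture 3: first value
        <v>\s*(\S+)\s*</v>   # capture 4: second value
        </row>
    }x;
}

my @fields = parse_line(
    '<!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500 --> <row><v> NaN </v><v> NaN </v></row>'
);
print join(', ', @fields), "\n" if @fields;
```

One thing to watch with /x: variables are still interpolated inside the pattern's comments, so it is safer to keep dollar signs and at signs out of them.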
RE: Regex Help Please!
Works, but not if you have more or fewer than 2 values in a row. Do you?
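If the number of values per row does vary, one possible way around a fixed two-value pattern is a global match in list context, which returns every captured value no matter how many there are. A minimal sketch, assuming each record still sits on a single line; line_to_csv is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turns one dump line into a comma-separated string,
# or returns undef when the line carries no timestamp comment.
sub line_to_csv {
    my ($line) = @_;
    my ($stamp, $epoch) = $line =~ m{<!--\s*(.*?)\s*/\s*(\d+)\s*-->}
        or return;
    # In list context, m//g returns every capture, however many <v> there are.
    my @values = $line =~ m{<v>\s*(.*?)\s*</v>}g;
    return join(', ', $stamp, $epoch, @values);
}

print line_to_csv(
    '<!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500 --> <row><v> 59 </v><v> 6000 </v><v> 700</v></row>'
), "\n";
```

The timestamp match and the value match are kept separate on purpose, so the value pattern does not need to know how many columns the dump has.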
RE: Regex Help Please!
I worked from the data you provided. What can the data really look like? Provide some other samples and I will modify the script to handle them (hopefully).

Wags ;)
RE: Regex Help Please!
:: -----Original Message-----
:: From: Gordon Brandt [mailto:[EMAIL PROTECTED]]
:: Sent: Thursday, January 10, 2002 12:17 PM
:: To: [EMAIL PROTECTED]
:: Subject: Regex Help Please!
::
:: [-snip-]
::
:: Input file:
:: |---
::
:: (misc header information I want to delete)
::
:: #This is how the data I want to pull out is formatted
:: <!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500
:: --> <row><v> NaN </v><v> NaN </v></row>
:: <!-- 2002-01-08 09:40:00 Eastern Standard Time / 1010500800
:: --> <row><v> 6.00e+001 </v><v> 6.90e+001 </v></row>
::
:: |---
::
:: Output wanted is:
:: 2002-01-08 09:35:00 Eastern Standard Time, 1010500500, NaN, NaN
:: 2002-01-08 09:40:00 Eastern Standard Time, 1010500800,
:: 6.00e+001, 6.90e+001
::
:: |---

The people on this list like nothing more, it seems, than chewing on a regular expression puzzle, so you've come to the right place. However, you'll get better results out of your request if you can fill in some more information about the input source.

Regular expressions get more complicated when they have to deal with more variable/generic data forms, but they're relatively easy if you build one for a single specific case. Programmers in general will usually go for the least amount of complexity in a solution, while at the same time being able to handle all possible scenarios. So to get good regex help, we all need to understand what variables there are in the scenarios.

Some issues that come to mind regarding the input data in your problem are:

1) Is the data guaranteed to contain one record per line? Can data ever spread to 2 or more lines?

2) Inside the <v>...</v> tags are values (or not). Are there ALWAYS EXACTLY TWO <v></v> groups?

3) Can the data ever contain quotation marks or commas? This is important to know when outputting to CSV.

4) Are there any other ways that the input data may vary from EXACTLY the format you've presented in your sample?

- Aaron

--
Aaron Brown - [EMAIL PROTECTED]
Middleware Programmer
University of Kansas
785-864-0423
http://www.ku.edu/~aaronb/
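On question 3, here is a rough sketch of why it matters. Under the usual CSV convention, a field that contains a comma, a quotation mark, or a line break is wrapped in double quotes, and any quotation marks inside it are doubled. The helper names csv_field and csv_line are invented for this example, and the last field in the sample call is made-up data to show the quoting. For anything beyond a quick script, the Text::CSV module from CPAN is probably the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field the way most CSV readers expect.
sub csv_field {
    my ($field) = @_;
    if ($field =~ /[",\r\n]/) {
        $field =~ s/"/""/g;        # double any embedded quotation marks
        $field = qq{"$field"};     # then wrap the whole field in quotes
    }
    return $field;
}

# Hypothetical helper: join already-extracted fields into one CSV line.
sub csv_line {
    return join(',', map { csv_field($_) } @_);
}

print csv_line('2002-01-08 09:35:00 Eastern Standard Time',
               '1010500500', 'NaN', 'a "quoted", value'), "\n";
```

The sketch joins fields with a bare comma rather than a comma and a space, on the assumption that some readers do not treat a quote that follows a space as the start of a quoted field.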
Re: Regex Help Please!
A less elegant (perhaps) solution, but effective, no matter how many rows / values:

while (<>) {
    s/\r//g;                 # I hate that carriage return
    chomp;
    next if (!/^.*<!--/);    # skip non-matching lines
    my @values;
    my $ts = '';
    $ts = $1 if (s/<!--\s*(.*?)\s*-->//);   # pull the timestamp comment off the line
    my ($ts1, $ts2) = split(/\s*\/\s*/, $ts);
    # peel off one <row>...</row> at a time, then one <v>...</v> at a time
    while (s/<row>(.*?)<\/row>(.*)/$2/) {
        my $row = $1;
        while ($row =~ s/<v>\s*(.*?)\s*<\/v>(.*)/$2/) {
            push(@values, $1);
        }
    }
    my $val_str = join(', ', @values);
    print("$ts1, $ts2, $val_str\n");
}

on input:
<!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500 --> <row><v> NaN </v><v> NaN </v></row>
<!-- 2002-01-08 09:35:00 Eastern Standard Time / 1010500500 --> <row><v> 59 </v><v> 6000 </v><v> 700</v></row>

returns:
2002-01-08 09:35:00 Eastern Standard Time, 1010500500, NaN, NaN
2002-01-08 09:35:00 Eastern Standard Time, 1010500500, 59, 6000, 700