Re: Problem with regex
Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik wrote:
> Below is some test code that will be used in a larger program.
>
> In the code below I have a regular expression whose intent is to look
> for " <1 or more characters> , <1 or more characters> " and replace the
> comma with |. (the white space is just for clarity).
>
> IAC, the regex works, that is, it matches, but it only replaces the
> final match. I have just re-read the camel book section on regexes and
> have tried many variations, but apparently I'm too close to it to see
> what must be a simple answer.
>
> BTW, if you guys think I'm posting too often, please say so.
>
> Barry Brevik
>
> use strict;
> use warnings;
>
> my $csvLine = qq| "col , 1" , col___'2' , col-3, "col,4"|;
>
> print "before comma substitution: $csvLine\n\n";
>
> $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
>
> print "after comma substitution.: $csvLine\n\n";

Tobias already gave you a solution, and I also think using Text::CSV or Text::CSV_XS is a much better fit for this task than plain regexes. For example, one day you might encounter a line that has an embedded " escaped using \. Then, even if your regex worked earlier, this can break it. And what if there was a | in the original string?

Nevertheless, let me also try to explain the issue you had with the regex, as this can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that will probably make it easier for the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes the behavior of . so that it also matches newlines. If you don't have newlines in your string (e.g. because you are processing a file line by line), then the /s has no effect.

That makes this expression:

    $csvLine =~ s/(".+),(.+")/$1|$2/;

Then, before going on, you need to check what this really matches, so I replaced the above with

    if ($csvLine =~ s/(".+),(.+")/$1|$2/) {
        print "match: <$1><$2>\n";
    }

and got

    match: <"col , 1" , col___'2' , col-3, "col><4">

You see, the .+ is greedy: starting from the first ", it matches as much as it can. You'd be better off telling it to match as little as possible by adding an extra ? after the quantifier.

    if ($csvLine =~ /(".+?),(.+?")/) {
        print "match: <$1><$2>\n";
    }

prints this:

    match: <"col >< 1">

Finally, you need to do the substitution globally, so not only once but as many times as possible:

    $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is

    after comma substitution.: "col | 1" , col___'2' , col-3, "col|4"

But again, for CSV files that can have embedded commas, it is better to use one of the real CSV parsers.

regards
Gabor

--
Gabor Szabo
http://szabgab.com/perl_tutorial.html
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
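To see the greedy and non-greedy behaviour side by side, the following is a minimal sketch that runs both forms of the substitution on the sample line from the original post. The sub names greedy_once and lazy_global are made up for this illustration; only the regexes come from the thread.

```perl
use strict;
use warnings;

# Sample line from the original post.
my $line = qq| "col , 1" , col___'2' , col-3, "col,4"|;

# Greedy .+ with no /g: the first group runs as far right as it can,
# so only the last comma in the line is replaced.
sub greedy_once {
    my ($s) = @_;
    $s =~ s/(".+),(.+")/$1|$2/;
    return $s;
}

# Non-greedy .+? with /g: each match is as short as possible,
# and the substitution is repeated along the whole line.
sub lazy_global {
    my ($s) = @_;
    $s =~ s/(".+?),(.+?")/$1|$2/g;
    return $s;
}

print "greedy : ", greedy_once($line), "\n";
print "lazy /g: ", lazy_global($line), "\n";
```

Note that the non-greedy pattern still pairs any quote with the next comma and the next quote after it, so a quoted cell without a comma (for example "a" , b, "c") would cause the comma between cells to be replaced. That is one more reason to prefer a real CSV parser.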
RE: Problem with regex
The whitespaces around the separator characters are not allowed in strict CSV. Try the code below.

Cheers
- Tobias

    use strict;
    use warnings;
    use Text::CSV;

    my $csv = Text::CSV->new({ allow_whitespace => 1 });
    open my $fh, "<&DATA" or die "Can't access DATA: $!\n";
    while (my $row = $csv->getline($fh)) {
        print join("\n", @$row), "\n";
    }
    $csv->eof or $csv->error_diag();
    __END__
    "col , 1" , col___'2' , col-3, "col,4"

-----Original Message-----
From: perl-win32-users-boun...@listserv.activestate.com [mailto:perl-win32-users-boun...@listserv.activestate.com] On Behalf Of Barry Brevik
Sent: Wednesday, November 09, 2011 5:35 PM
To: perl Win32-users
Subject: Problem with regex

> Below is some test code that will be used in a larger program.
> [...]
Problem with regex
Below is some test code that will be used in a larger program.

What I am trying to do is process lines from a CSV file where some of the 'cells' have commas embedded in them (see sample code below). I might have used Text::CSV, but as far as I can tell that module also cannot deal with embedded commas.

In the code below I have a regular expression whose intent is to look for " <1 or more characters> , <1 or more characters> " and replace the comma with |. (the white space is just for clarity).

IAC, the regex works, that is, it matches, but it only replaces the final match. I have just re-read the camel book section on regexes and have tried many variations, but apparently I'm too close to it to see what must be a simple answer.

BTW, if you guys think I'm posting too often, please say so.

Barry Brevik

    use strict;
    use warnings;

    my $csvLine = qq| "col , 1" , col___'2' , col-3, "col,4"|;

    print "before comma substitution: $csvLine\n\n";

    $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;

    print "after comma substitution.: $csvLine\n\n";
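One way to express the stated intent (replace commas only inside double-quoted cells) with core Perl is to match each quoted cell as a unit and rewrite the commas inside it. This is only a sketch, not what anyone in the thread posted: the sub name pipe_quoted_commas is hypothetical, and it assumes cells never contain escaped or doubled quote characters.

```perl
use strict;
use warnings;

# Replace commas with | only inside double-quoted cells.
# Assumption: a quoted cell never contains an escaped or doubled ".
sub pipe_quoted_commas {
    my ($line) = @_;
    # Match one whole quoted cell, copy it, translate its commas,
    # and use the modified copy as the replacement (/e evaluates code).
    $line =~ s{("[^"]*")}{ (my $cell = $1) =~ tr/,/|/; $cell }ge;
    return $line;
}

my $csvLine = qq| "col , 1" , col___'2' , col-3, "col,4"|;
print "before: $csvLine\n";
print "after : ", pipe_quoted_commas($csvLine), "\n";
```

Because [^"]* cannot cross a quote character, a match can never span two cells, and commas between cells are left alone.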
AW: Problem with regex
> use strict;
> use warnings;
>
> my $Data = 'Hello, i am a litte String.$ Please format me.$$$
> I am the end of the String.$$ And i am the last!';
>
> $Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\\$2/gm;
> $Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\$2/gm;
> $Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\$2/gm;
> print "Data: $Data \n";
>
> __END__
>
> Notice, I change the double quotes to single quotes for $Data.
> For me, the regex is clear. But if not for you, I can explain.
> There are maybe some "better" solution, this is just a quick one.
>

Hello,

First of all, many thanks for your quick and helpful replies. I tried Karl-Heinz's solution and it works very well.

Karl-Heinz: Yes, the regex is clear to me; the solution with $1 & $2 was a good idea.

regards
Holgi

p.s. next time I should first take the "Owls" with me in the bath tub ;-)
RE: Problem with regex
At 09:47 AM 5/12/2006, Yekhande, Seema (MLITS) wrote:
> Holger,
> Actually $ is a special character in string in perl. So, if the $ is there
> in the input, you will have to always write it with the leading escape
> character. So, make your input will be like this,
> my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";
> It will solve your problem.

$ is only special in strings with double quote marks ( " ) around them. I think you meant to say:

    my $data = "Hello, i am a little String.\$ Please format me.\$\$\$ I am the end of the String.\$\$ And i am the last!";

That works, but you can also use:

    my $data = 'Hello, i am a little String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';

(Note the type of quote mark used.)

If you were to print out the original string data like this:

    my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";
    print("$data\n");

you would get this:

    Hello, i am a litte String. format me. I am the end of the String.1896 And i am the last!

i.e., the original string did not have any '$' characters in it at all.
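The quoting difference can be checked directly. The short sketch below uses a shortened version of the string from the thread (the variable names are chosen just for this illustration) and shows that $$ inside double quotes is interpolated as the process ID, while single quotes or backslash escapes keep the characters literal.

```perl
use strict;
use warnings;

# Double quotes interpolate: $$ is Perl's process ID variable.
my $interpolated = "I am the end of the String.$$";

# Single quotes keep every $ literally.
my $single = 'I am the end of the String.$$';

# Escaping each $ inside double quotes gives the same literal text.
my $escaped = "I am the end of the String.\$\$";

print "double quotes : $interpolated\n";
print "single quotes : $single\n";
print "escaped       : $escaped\n";
```

The first line printed ends in a number that changes from run to run, which matches the "3398" and "1896" seen in the thread.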
Re: Problem with regex
Holger,

This worked for me. Note that you need to escape the $ characters in your string. The "3398" number is actually the PID of the perl process, returned from the special variable $$, since you didn't escape the $ characters.

    my $Data = "Hello, i am a litte String.\$ Please format me.\$\$\$ I am the end of the String.\$\$ And i am the last!";
    $Data =~ s/[\$]{3}//;
    $Data =~ s/[\$]{2}//;
    $Data =~ s/\$//;
    print $Data . "\n";

Hope that helps...

Andy Speagle

On 5/12/06, Holger Wöhle <[EMAIL PROTECTED]> wrote:
> Hello,
> under Windows with ActiveState Perl i have a strange problem with a regex:
> [...]
Re: Problem with regex
The code below worked for me, after using single quotes around the original string to prevent any interpolation (it's always a good practice to print out the original string to verify that it's what you thought it was), and, of course, $Data is not the same as $data.

At 08:38 AM 5/12/2006, Holger Wöhle wrote:
> Hello,
> under Windows with ActiveState Perl i have a strange problem with a regex:
> Assuming the following String:
> my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!"
> The regex should replace $ with the string , $$ with and $$$ with
> (please don't think about the why)
> If tried to use the following:
> $data =~ s/\$\$\$//gm; #should catch every occurrence of $$$
> $data =~ s/\$\$//gm; #should catch $$
> $data =~ s/\$//gm; #the rest
> So data should look after the first regex:
> Hello, i am a litte String.$Please format me.I am the end of the String.$$And i am the last!
> And after the second:
> Hello, i am a litte String.$Please format me.I am the end of the String.And i am the last!
> And the last:
> Hello, i am a litte String.Please format me.I am the end of the String.And i am the last!
> But all regexes i tried (the one above are only one try) failed! When i
> print out the string it looks like:
> Hello, i am a litte String. Please format me. I am the end of the String.3398 And i am the last!
> Where the number after String. differs between every run.
> Can someone help me ?
> With regars
> Holger
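Putting those points together, a minimal sketch of the three substitutions might look like this. The replacement strings did not survive in the archived message, so the tokens [ONE], [TWO] and [THREE] and the sub name replace_dollars are purely hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical replacement tokens; the real ones are not shown in the thread.
my %token = (1 => '[ONE]', 2 => '[TWO]', 3 => '[THREE]');

sub replace_dollars {
    my ($text) = @_;
    # Longest run first, so that $$$ is not consumed as $$ followed by $.
    $text =~ s/\$\$\$/$token{3}/g;
    $text =~ s/\$\$/$token{2}/g;
    $text =~ s/\$/$token{1}/g;
    return $text;
}

# Single quotes keep the $ characters literal, and the same variable
# name ($Data, not $data) is used throughout.
my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';

print "before: $Data\n";
print "after : ", replace_dollars($Data), "\n";
```

Running the substitutions in the opposite order would turn every $$$ into three single-$ replacements, which is why the order matters here.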
RE: Problem with regex
Holger,

Actually, $ is a special character in strings in Perl. So, if a $ is in the input, you always have to write it with a leading escape character. So, make your input look like this:

    my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!";

It will solve your problem.

Thanks,
Seema
GPCT|TDDS|AIS|SPCM3

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Holger Wöhle
Sent: Friday, May 12, 2006 6:09 PM
To: perl-win32-users@listserv.ActiveState.com
Subject: Problem with regex

> Hello,
> under Windows with ActiveState Perl i have a strange problem with a regex:
> [...]
Re: Problem with regex
> Hello,
> my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!"
> [...]
> Can someone help me ?

This works at least on my machine:

    use strict;
    use warnings;

    my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';

    $Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\\$2/gm;
    $Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\$2/gm;
    $Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\$2/gm;
    print "Data: $Data \n";

    __END__

Notice, I changed the double quotes to single quotes for $Data. For me, the regex is clear. But if it is not for you, I can explain. There may be some "better" solutions; this is just a quick one.

Regards
Karl-Heinz
Problem with regex
Hello,

under Windows with ActiveState Perl I have a strange problem with a regex. Assuming the following string:

    my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!"

The regex should replace $ with the string , $$ with and $$$ with (please don't think about the why).

I tried to use the following:

    $data =~ s/\$\$\$//gm; #should catch every occurrence of $$$
    $data =~ s/\$\$//gm; #should catch $$
    $data =~ s/\$//gm; #the rest

So data should look like this after the first regex:

    Hello, i am a litte String.$Please format me.I am the end of the String.$$And i am the last!

And after the second:

    Hello, i am a litte String.$Please format me.I am the end of the String.And i am the last!

And the last:

    Hello, i am a litte String.Please format me.I am the end of the String.And i am the last!

But all regexes I tried (the ones above are only one attempt) failed! When I print out the string it looks like:

    Hello, i am a litte String. Please format me. I am the end of the String.3398 And i am the last!

where the number after "String." differs between every run.

Can someone help me?

With regards
Holger