Re: Problem with regex

2011-11-10 Thread Gabor Szabo
Hi Barry,

On Thu, Nov 10, 2011 at 2:34 AM, Barry Brevik  wrote:
> Below is some test code that will be used in a larger program.
>
> In the code below I have a regular expression whose intent is to look
> for  " <1 or more characters> , <1 or more characters> " and replace the
> comma with |. (the white space is just for clarity).
>
> IAC, the regex works, that is, it matches, but it only replaces the
> final match. I have just re-read the camel book section on regexes and
> have tried many variations, but apparently I'm too close to it to see
> what must be a simple answer.
>
> BTW, if you guys think I'm posting too often, please say so.
>
> Barry Brevik
> 
> use strict;
> use warnings;
>
> my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
>
> print "before comma substitution: $csvLine\n\n";
>
> $csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
>
> print "after comma substitution.: $csvLine\n\n";
>

Tobias already gave you a solution, and
I also think using Text::CSV or Text::CSV_XS is way better for this task
than plain regexes. For example, one day you might encounter
a line that has an embedded " escaped using \.
Then, even if your regex worked earlier, this can break it.
And what if there was a | in the original string?


Nevertheless let me also try to explain the issue that you had
with the regex as this can come up in other situations.

First, I'd probably use a plain " instead of \x22, as that will
probably make it easier for the reader to see what you are looking for.

Second, the /s at the end probably has no value. It only changes
the behavior of . so that it also matches newlines. If you don't have
newlines in your string (e.g. because you are processing a file line by line)
then the /s has no effect. That leaves this expression:

$csvLine =~ s/(".+),(.+")/$1|$2/;
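To illustrate the point about /s, here is a small sketch (the two test strings are made up for this example and are not from the original post). It runs the same pattern with and without /s against a one-line string and a string with an embedded newline:

```perl
use strict;
use warnings;

# Illustrative inputs only: one field on a single line, one with a newline inside.
my $one_line  = qq|"col , 1"|;
my $two_lines = qq|"col ,\n 1"|;

my @results;
for my $text ($one_line, $two_lines) {
    # Without /s the dot refuses to match "\n"; with /s it matches any character.
    my $plain  = ($text =~ /(".+),(.+")/)  ? 'match' : 'no match';
    my $with_s = ($text =~ /(".+),(.+")/s) ? 'match' : 'no match';
    push @results, "$plain / $with_s";
}

print "without /s, with /s: $_\n" for @results;
```

Only the string containing a newline behaves differently, which is why /s makes no difference when lines are processed one at a time.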

Then, before going on, you need to check what this really matches, so
I replaced the above with

 if ($csvLine =~ s/(".+),(.+")/$1|$2/) {
   print "match: <$1><$2>\n";
 }

and got

match: <"col , 1"  ,  col___'2' ,  col-3, "col><4">

You see, the .+ is greedy: starting from the first " it matches as much as it can.
You'd be better off telling it to match as little as possible by adding
an extra ? after the quantifier.

 if ($csvLine =~ /(".+?),(.+?")/) {
   print "match: <$1><$2>\n";
 }

prints this:
match: <"col >< 1">

Finally, you need to do the substitution globally, so not only once but
as many times as possible:

 $csvLine =~ s/(".+?),(.+?")/$1|$2/g;

And the output is

after comma substitution.:   "col | 1"  ,  col___'2' ,  col-3, "col|4"


But again, for CSV files that can have embedded commas, it is better to use
one of the real CSV parsers.
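If a regex has to be used anyway, one possible variation is to match each complete quoted field and change the commas inside it only. This is just a sketch: the helper name pipe_quoted_commas and the second test line are made up for illustration, and it assumes that quoted fields never contain escaped or doubled " characters.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original thread.
sub pipe_quoted_commas {
    my ($line) = @_;
    # Match one complete double-quoted field at a time, then turn
    # commas into | inside that field only.
    $line =~ s{("[^"]*")}{ (my $field = $1) =~ tr/,/|/; $field }ge;
    return $line;
}

my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
my $tricky  = qq|"a", b, "c,d"|;   # illustrative line: first quoted field has no comma

my $fixed_sample = pipe_quoted_commas($csvLine);
my $fixed_tricky = pipe_quoted_commas($tricky);

# For comparison: the non-greedy substitution applied to the tricky line.
(my $lazy = $tricky) =~ s/(".+?),(.+?")/$1|$2/g;

print "per-field on sample: $fixed_sample\n";
print "per-field on tricky: $fixed_tricky\n";
print "non-greedy on tricky: $lazy\n";
```

On the tricky line the non-greedy version replaces the separator after "a" instead of the comma inside "c,d", which is one more reason to prefer a real CSV parser.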

regards
  Gabor

-- 
Gabor Szabo
http://szabgab.com/perl_tutorial.html
___
Perl-Win32-Users mailing list
Perl-Win32-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs


RE: Problem with regex

2011-11-09 Thread Tobias Hoellrich
Whitespace around the separator characters is not allowed in strict CSV. 
Try this below.

Cheers - Tobias

use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ allow_whitespace => 1 });
open my $fh, "<&DATA" or die "Can't access DATA: $!\n";
while (my $row = $csv->getline($fh)) {
    print join("\n",@$row),"\n";
}
$csv->eof or $csv->error_diag();

__END__
"col , 1"  ,  col___'2' ,  col-3, "col,4"



Problem with regex

2011-11-09 Thread Barry Brevik
Below is some test code that will be used in a larger program.

What I am trying to do is process lines from a CSV file where some of
the 'cells' have commas embedded in them (see sample code below). I might
have used Text::CSV, but as far as I can tell that module also cannot
deal with embedded commas.

In the code below I have a regular expression whose intent is to look
for  " <1 or more characters> , <1 or more characters> " and replace the
comma with |. (the white space is just for clarity).

IAC, the regex works, that is, it matches, but it only replaces the
final match. I have just re-read the camel book section on regexes and
have tried many variations, but apparently I'm too close to it to see
what must be a simple answer.

BTW, if you guys think I'm posting too often, please say so.

Barry Brevik

use strict;
use warnings;
 
my $csvLine = qq|  "col , 1"  ,  col___'2' ,  col-3, "col,4"|;
 
print "before comma substitution: $csvLine\n\n";
 
$csvLine =~ s/(\x22.+),(.+\x22)/$1|$2/s;
 
print "after comma substitution.: $csvLine\n\n";



AW: Problem with regex

2006-05-14 Thread Holger Wöhle
> use strict;
> use warnings;
> 
> my $Data = 'Hello, i am a litte String.$ Please format me.$$$ 
> I am the end of the String.$$ And i am the last!';
> 
> $Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\\$2/gm;
> $Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\$2/gm;
> $Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\$2/gm;
> print "Data: $Data \n";
> 
> ___END___
> 
> Notice, I change the double quotes to single quotes for $Data.
> For me, the regex is clear. But if not for you, I can explain.
> There are maybe some "better" solution, this is just a quick one.
> 

Hello,
First of all, many thanks for your quick and helpful replies.
I tried Karl-Heinz's solution and it works very well. 
Karl-Heinz: Yes, the regex is clear to me; the solution with $1 & $2 was a 
good idea 

regards
Holgi

p.s. next time i should first take the "Owls" with me in the bath tub ;-) 






RE: Problem with regex

2006-05-12 Thread John Deighan

At 09:47 AM 5/12/2006, Yekhande, Seema (MLITS) wrote:

Holger,

Actually $ is a special character in string in perl. So, if the $ is 
there in the input,

you will have to always write it with the leading escape character.

So, make your input will be like this,
my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!";

It will solve your problem.


$ is only special in strings with double quote marks ( " ) around 
them. I think you meant to say:


my $data = "Hello, i am a little String.\$ Please format me.\$\$\$ I 
am the end of the String.\$\$ And i am the last!";


That works, but, you can also use:

my $data = 'Hello, i am a little String.$ Please format me.$$$ I am 
the end of the String.$$ And i am the last!';


(Note the type of quote mark used)

If you were to print out the original string data like this:

my $data = "Hello, i am a litte String.$ Please format me.$$$ I am 
the end of the String.$$ And i am the last!";

print("$data\n");

you would get this:

Hello, i am a litte String. format me. I am the end of the 
String.1896 And i am the last!


i.e., the original string did not have any '$' characters in it at all.
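A minimal sketch of the difference, using a short made-up string instead of the full sentence from the thread. Inside double quotes $$ is interpolated as the process ID of the running perl; single quotes or backslash escapes keep the literal characters:

```perl
use strict;
use warnings;

# Single quotes: no interpolation, both dollar signs stay in the string.
my $single  = 'pid=$$';

# Double quotes: $$ is replaced by the current process ID.
my $double  = "pid=$$";

# Double quotes with escapes: same text as the single-quoted version.
my $escaped = "pid=\$\$";

print "single:  $single\n";
print "double:  $double\n";
print "escaped: $escaped\n";
```

The number printed on the "double" line changes from run to run, which matches the varying number seen in the original output.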



Re: Problem with regex

2006-05-12 Thread Andy Speagle
Holger,
 
This worked for me. Note that you need to escape the $ characters in your string. The "3398" number is actually the PID of the perl process, returned from the special variable $$, since you didn't escape the $ characters.

my $Data = "Hello, i am a litte String.\$ Please format me.\$\$\$ I am the end of the String.\$\$ And i am the last!";
$Data =~ s/[\$]{3}//;
$Data =~ s/[\$]{2}//;
$Data =~ s/\$//;
print $Data ."\n";

Hope that helps...
 
Andy Speagle
 


Re: Problem with regex

2006-05-12 Thread John Deighan
The code below worked for me, after using single 
quotes around the original string to prevent any 
interpolation (it's always a good practice to 
print out the original string to verify that it's 
what you thought it was), and, of course, $Data is not the same as $data.


At 08:38 AM 5/12/2006, Holger Wöhle wrote:

Hello,
under Windows with ActiveState Perl i have a strange problem with a regex:
Assuming the following String:

my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!"

The regex should replace $ with the string , $$ with  and $$$ with
 (please don't think about the why)

If tried to use the following:
$data =~ s/\$\$\$//gm; #should catch every occurrence of $$$
$data =~ s/\$\$//gm; #should catch $$
$data =~ s/\$//gm; #the rest

So data should look after the first regex:
Hello, i am a litte String.$Please format me.I am the end of the
String.$$And i am the last!
And after the second:
Hello, i am a litte String.$Please format me.I am the end of the
String.And i am the last!
And the last:
Hello, i am a litte String.Please format me.I am the end of the
String.And i am the last!

But all regexes i tried (the one above are only one try) failed! When i
print out the string it looks like:

Hello, i am a litte String. Please format me. I am the end of the
String.3398 And i am the last!

Where the number after String. differs between every run.

Can someone help me ?

With regars
Holger



RE: Problem with regex

2006-05-12 Thread Yekhande, Seema (MLITS)
Holger,

Actually, $ is a special character in strings in Perl. So, if a $ appears in 
the input, you will always have to write it with a leading escape character. 

So, make your input like this:
my $data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!";

It will solve your problem.

Thanks,
Seema
GPCT|TDDS|AIS|SPCM3




Re: Problem with regex

2006-05-12 Thread Karl-Heinz Kuth

Hello,



my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!"

The regex should replace $ with the string , $$ with  and $$$ with
 (please don't think about the why)

If tried to use the following:
$data =~ s/\$\$\$//gm; #should catch every occurrence of $$$
$data =~ s/\$\$//gm; #should catch $$
$data =~ s/\$//gm; #the rest

So data should look after the first regex:
Hello, i am a litte String.$Please format me.I am the end of the
String.$$And i am the last!
And after the second:
Hello, i am a litte String.$Please format me.I am the end of the
String.And i am the last!
And the last:
Hello, i am a litte String.Please format me.I am the end of the
String.And i am the last!

But all regexes i tried (the one above are only one try) failed! When i
print out the string it looks like:

Hello, i am a litte String. Please format me. I am the end of the
String.3398 And i am the last!

Where the number after String. differs between every run.

Can someone help me ?


This works at least on my machine:

use strict;
use warnings;

my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the 
end of the String.$$ And i am the last!';


$Data =~ s/([^\$]*)\${3,3}([^\$]+)/$1\\$2/gm;
$Data =~ s/([^\$]*)\${2,2}([^\$]+)/$1\$2/gm;
$Data =~ s/([^\$]*)\${1,1}([^\$]+)/$1\$2/gm;
print "Data: $Data \n";

__END__

Notice, I change the double quotes to single quotes for $Data.
For me, the regex is clear. But if not for you, I can explain.
There are maybe some "better" solution, this is just a quick one.
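One of those other possible solutions, sketched here under assumptions: match each run of one to three $ characters in a single pass and pick the replacement by the length of the run. The replacement strings [ONE], [TWO] and [THREE] are placeholders invented for this example, because the real replacement texts are not visible in the thread.

```perl
use strict;
use warnings;

my $Data = 'Hello, i am a litte String.$ Please format me.$$$ I am the end of the String.$$ And i am the last!';

# Placeholder replacements (assumptions), keyed by the number of $ characters.
my %replacement = (
    1 => '[ONE]',
    2 => '[TWO]',
    3 => '[THREE]',
);

# \${1,3} is greedy, so the longest run is consumed first and the
# order of the substitutions no longer matters.
(my $formatted = $Data) =~ s/(\${1,3})/$replacement{ length($1) }/ge;

print "Data: $formatted\n";
```

A single pass also avoids the risk that a later substitution picks up $ characters that an earlier replacement text might contain.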

Regards
Karl-Heinz




Problem with regex

2006-05-12 Thread Holger Wöhle
 
Hello,
Under Windows with ActiveState Perl I have a strange problem with a regex.
Assume the following string:

my $Data = "Hello, i am a litte String.$ Please format me.$$$ I am the end
of the String.$$ And i am the last!"

The regex should replace $ with the string , $$ with  and $$$ with
 (please don't think about the why)

I tried to use the following:
$data =~ s/\$\$\$//gm; #should catch every occurrence of $$$
$data =~ s/\$\$//gm; #should catch $$
$data =~ s/\$//gm; #the rest

So data should look after the first regex:
Hello, i am a litte String.$Please format me.I am the end of the
String.$$And i am the last!
And after the second:
Hello, i am a litte String.$Please format me.I am the end of the
String.And i am the last!
And the last:
Hello, i am a litte String.Please format me.I am the end of the
String.And i am the last!

But all the regexes I tried (the ones above are only one attempt) failed! When I
print out the string it looks like:

Hello, i am a litte String. Please format me. I am the end of the
String.3398 And i am the last!

Where the number after String. differs between every run.

Can someone help me ?

With regards
Holger
