Re: Using regular expressions to populate a variable?

2015-01-18 Thread Shawn H Corey
On Sun, 18 Jan 2015 11:49:11 -0500
Mike ekimduna...@gmail.com wrote:

 Hey everyone, I'm trying to find information on how I can use regular 
 expressions to populate a variable.
 
 I want to pull text between one set of characters and another set of 
 characters and use that to populate my variable. Can anyone point me
 in the right direction?
 
 Thanks.
 

Use parentheses to select the part of the match you want in your
variables: 

my ( $var1, $var2, @rest ) = $string =~ /some characters(populates $var1)more
characters(populates $var2) more (populates @rest) more (populates
@rest) /;


See `perldoc perlre` and search for /Capture groups/
http://perldoc.perl.org/perlre.html#Capture-groups


For more info:
perldoc perlretut    http://perldoc.perl.org/perlretut.html
perldoc perlrequick  http://perldoc.perl.org/perlrequick.html
perldoc perlre       http://perldoc.perl.org/perlre.html


-- 
Don't stop where the ink does.
Shawn

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Using regular expressions to populate a variable?

2015-01-18 Thread Jim Gibson

 On Jan 18, 2015, at 9:03 AM, Mike ekimduna...@gmail.com wrote:
 
 I was able to find match extraction in the perldoc.
 
 Here is a snippet of what I have.
 
 my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
 print "$insult\n";
 
 But $insult is being populated with: 1
 
 It should be populated with text. Can anyone tell me what I'm doing wrong 
 here?

Your error is assigning the return value of the regular expression in a scalar 
context. In scalar context, a regular expression returns true or false 
indicating a match (or not). In array context, however, it returns the captured 
subexpressions as a list.

Try forcing the assignment into array context:

 my( $insult ) = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );

You can also use the capture variables $1, $2, $3, etc., which will contain the 
captured subexpressions:

 my $insult;
 if( $mech->text =~ m/Insulter\ (.*)\ Taken/ ) {
   $insult = $1;
 }
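
To see the difference between the two contexts without needing WWW::Mechanize, here is a minimal sketch that runs the same pattern against a plain string. The sample text in $text is invented for illustration (the real page content is not shown in this thread); only the pattern comes from the posts above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $mech->text.
my $text = 'Insulter Thou art a boil Taken from somewhere';

# Scalar context: the match returns true (1) or false, not the captured text.
my $count = ( $text =~ m/Insulter\ (.*)\ Taken/ );

# List context: the parentheses around $insult make the match return its captures.
my ($insult) = ( $text =~ m/Insulter\ (.*)\ Taken/ );

print "scalar context: $count\n";
print "list context:   $insult\n";
```

The only difference between the two assignments is the pair of parentheses on the left-hand side, which is what switches the match into list context.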






Re: Using regular expressions to populate a variable?

2015-01-18 Thread Charles DeRykus
On Sun, Jan 18, 2015 at 9:28 AM, Jim Gibson jimsgib...@gmail.com wrote:

 On Jan 18, 2015, at 9:03 AM, Mike ekimduna...@gmail.com wrote:

 I was able to find match extraction in the perldoc.

 Here is a snippet of what I have.

 my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
 print "$insult\n";

 But $insult is being populated with: 1

 It should be populated with text. Can anyone tell me what I'm doing wrong 
 here?

 Your error is assigning the return value of the regular expression in a 
 scalar context. In scalar context, a regular expression returns true or false 
 indicating a match (or not). In array context, however, it returns the 
 captured subexpressions as a list.

 Try forcing the assignment into array context:

  my( $insult ) = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
 ...

For more info, see perldoc perldata. There is a full discussion of list
vs. scalar context there.

-- 
Charles DeRykus





Re: Using regular expressions to populate a variable?

2015-01-18 Thread Mike

I was able to find match extraction in the perldoc.

Here is a snippet of what I have.

my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
print "$insult\n";

But $insult is being populated with: 1

It should be populated with text. Can anyone tell me what I'm doing 
wrong here?


Thanks.

On 1/18/15 11:49 AM, Mike wrote:
Hey everyone, I'm trying to find information on how I can use regular 
expressions to populate a variable.


I want to pull text between one set of characters and another set of 
characters and use that to populate my variable. Can anyone point me 
in the right direction?


Thanks.







Using regular expressions to populate a variable?

2015-01-18 Thread Mike
Hey everyone, I'm trying to find information on how I can use regular 
expressions to populate a variable.


I want to pull text between one set of characters and another set of 
characters and use that to populate my variable. Can anyone point me in 
the right direction?


Thanks.





Re: Using regular expressions to populate a variable?

2015-01-18 Thread Mike

Thanks. This worked.

On 1/18/15 12:28 PM, Jim Gibson wrote:

On Jan 18, 2015, at 9:03 AM, Mike ekimduna...@gmail.com wrote:

I was able to find match extraction in the perldoc.

Here is a snippet of what I have.

my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
print "$insult\n";

But $insult is being populated with: 1

It should be populated with text. Can anyone tell me what I'm doing wrong here?

Your error is assigning the return value of the regular expression in a scalar 
context. In scalar context, a regular expression returns true or false 
indicating a match (or not). In array context, however, it returns the captured 
subexpressions as a list.

Try forcing the assignment into array context:

  my( $insult ) = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );

You can also use the capture variables $1, $2, $3, etc., which will contain the 
captured subexpressions:

  my $insult;
  if( $mech->text =~ m/Insulter\ (.*)\ Taken/ ) {
    $insult = $1;
  }









Re: Help with regular expressions

2011-05-10 Thread Tiago Hori
  Hasn't someone already fixed this problem?  If there isn't a CPAN module
  to perform standardized bibliographic reference formatting/parsing.  I
  haven't looked at CPAN; did either of you?  If a CPAN module doesn't
  exist, one should!
 

 What standard?

 Kalthoff K (2001) Analysis of biological development. McGraw-Hill, NY.


 Or


  Manning JT, Barley L, Walton J, Lewis-Jones DI, Trivers RL, Singh D,
  Thornhill R, Rohde P, Bereczkei T, Henzi P, Soler M, Szwed A. (2000) The
  2nd:4th digit ratio, sexual dimorphism, population differences, and
  reproductive success. evidence for sexually antagonistic genes? Evol Hum
  Behav. 21(3):163-183.


 Or


  Berger, M., Lawrence, M., Demichelis, F., Drier, Y., Cibulskis, K.,
  Sivachenko, A., Sboner, A., Esgueva, R., Pflueger, D., Sougnez, C.,
  Onofrio, R., Carter, S., Park, K., Habegger, L., Ambrogio, L., Fennell, T.,
  Parkin, M., Saksena, G., Voet, D., Ramos, A., Pugh, T., Wilkinson, J.,
  Fisher, S., Winckler, W., Mahan, S., Ardlie, K., Baldwin, J., Simons, J.,
  Kitabayashi, N., MacDonald, T., Kantoff, P., Chin, L., Gabriel, S.,
  Gerstein, M., Golub, T., Meyerson, M., Tewari, A., Lander, E., Getz, G.,
  Rubin, M., Garraway, L. (2011). The genomic complexity of primary human
  prostate cancer Nature, 470 (7333), 214-220 DOI: 10.1038/nature09744


 ?

 If there's a standard, then sure, someone has probably put that into CPAN.
 The problem is that I don't think that there is, though I'd be glad to be
 proven wrong.



  What I want to be able to do eventually is parse each name separately and
  associate that with the title. I am not sure how yet, but I haven't even
  got there.
 
 
 That can range from pretty simple to fairly complex, depending on how much
 you want to squeeze out of that relationship. If you just want to be able
 to say "Morgan, M.J wrote an article for X journal, titled Y", then that's
 just a hash (of hashes), and you need to look no further than this mail. But
 if you also want to say, "Journal X has these authors. One of them is Wilson,
 C.E, who co-wrote article Y, where Crim, L.W. was also a collaborator, and
 whose primary author is Morgan, M.J.", then hashes will probably not cut it
 anymore (a cyclical hash of hashes might do, but that's pretty tough to
 handle, and _very_ rough on the eyes). You'll probably want an object model
 there, or some database interaction.

 But we are getting ahead of ourselves for now :)


I figured that eventually it would be easier to somehow pass the results
into mySQL tables, but I left that bridge to be crossed once I get there.




  It works fine for the first name, but as expected if @entries contain
  several strings with authors names (I did that by matching the year and
  storing $` in the @entries) it will match the first author and it will go
  to the next $entries. Is there a way to match the pattern more than once,
  but to store each match separately?
 

 You are looking for the /g switch. You can look it up in perlretut[0].


I actually remember reading in the Llama book that the /g modifier could be
used with m// too, not only with s///, and thinking, "But when would you
need it with m//?" :)


 For example, would I be able to store
  Morgan, M.J. as one item in an array and Wilson, C.E. as another one?
 
 
 
 Sure. the my @names = ... from above will suffice for that. But chances are
 you want more than that - In general, you have two options. Either you make
 several small regexes to extract the data piece by piece, or you create a
 grammar to do the job for you. For the latter, there's two main options: a
 (?(DEFINE)) pattern, which is Pure Perl and in the language since 5.010, or
 you pull out Regexp::Grammars from CPAN. They are pretty similar, but
 Regexp::Grammars is much more powerful, letting you access the full parse
 tree - so what I'll have to do in two steps in the next snippet, R::G would
 do in one.

 Here's my stab at it, using (?(DEFINE))[1], named captures[2], Unicode
 character properties[3], and a probably unnecessary lookbehind[1] in the
 split by the end. I made some arbitrary assumptions on the data, like
 saying that a title can't be longer than 52 characters, or can't have a
 period in it, or that the journal's name can't have digits in it, which I
 suppose is a tad disingenuous, but take it as an example, not a solution : P


Thanks! This gives me a lot to read on.

Cheers,

T.



-- 
Education is not to be used to promote obscurantism. - Theodosius
Dobzhansky.

Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando

Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio

Violeta Parra - Gracias a la Vida

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial 

Help with regular expressions

2011-05-09 Thread Tiago Hori
Hi List,

I am trying to write a small script to parse bibliographic references like
this:

Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.

What I want to be able to do eventually is parse each name separately and
associate that with the title. I am not sure how yet, but I haven't even got
there.

Right now I am just trying to see if I can parse the names, so I came up
with this:

foreach (@entries){
    if (/((\w)*, (([A-Z].)*),){1,}/){
        my $name = $&;
        $name =~ s/\.,/\. /g;
        push @names, $name;
    }
}

It works fine for the first name, but as expected if @entries contain
several strings with authors names (I did that by matching the year and
storing $` in the @entries) it will match the first author and it will go to
the next $entries. Is there a way to match the pattern more than once, but
to store each match separately? For example, would I be able to store
Morgan, M.J. as one item in an array and Wilson, C.E. as another one?

As always, any help is much appreciated.

Cheers,

Tiago
-- 
Education is not to be used to promote obscurantism. - Theodosius
Dobzhansky.

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial University of Newfoundland


Re: Help with regular expressions

2011-05-09 Thread Sandip Bhattacharya
On Mon, May 9, 2011 at 11:44 PM, Tiago Hori tiago.h...@gmail.com wrote:
 I am trying to write a small script to parse bibliographic references like
 this:

 Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
 reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.

 What I want to be able to do eventually is parse each name separately and
 associate that with the title. I am not sure how yet, but I haven't even got
 there.

I took a stab at this. It might not be perfect and catch all possible
variations. But in any case, unless you have rules for the text in
these entries, it is very difficult to catch them all.

=
#!/usr/bin/perl
#

use strict;
use warnings;

my $text = <<END;
Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
END

my @authors=();

# Extract authors
# Assuming each author is composed of one or more matches of:
#   SPACE* WORD, SPACE* (ALPHABET PERIOD)+
# The pattern has two capture groups, so /g returns two items per author;
# the second shift below discards the inner (last-initial) capture.
if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
    while(@matches) {
        my $match = shift @matches;
        my @comps = map {s/^ +//;s/ +$//;$_} (split ",", $match);
        push @authors, join " ",@comps[1,0];
        shift @matches;
    }
}

# Extract title
# Everything from the first period followed by a space to the next period.
# Authors should have periods followed by either a letter or a comma
# for this to work
if ($text =~m/\. (.*?)\./s) {
    my $title =  $1;
    $title =~ s/\n/ /g;
    foreach(@authors) {
        print "$title: $_\n";
    }
}
=

$ ./match_2.pl
The effect of stress on reproduction in Atlantic cod: M.J. Morgan
The effect of stress on reproduction in Atlantic cod: C.E. Wilson
The effect of stress on reproduction in Atlantic cod: L.W. Crim

All, please let me know if there is a way to combine both the regexes.
I had a brain coredump before I gave up.

Thanks,
  Sandip





Re: Help with regular expressions

2011-05-09 Thread Kenneth Wolcott
On Mon, May 9, 2011 at 12:04, Sandip Bhattacharya 
sand...@foss-community.com wrote:

 On Mon, May 9, 2011 at 11:44 PM, Tiago Hori tiago.h...@gmail.com wrote:
  I am trying to write a small script to parse bibliographic references
  like this:
  ...

 I took a stab at this. It might not be perfect and catch all possible
 variations. But in any case, unless you have rules for the text in
 these entries, it is very difficult to catch them all.
 ...


Hasn't someone already fixed this problem?  If there isn't a CPAN module to
perform standardized bibliographic reference formatting/parsing.  I haven't
looked at CPAN; did either of you?  If a CPAN module doesn't exist, one
should!

Ken Wolcott


Re: Help with regular expressions

2011-05-09 Thread Brian Fraser
On Mon, May 9, 2011 at 6:35 PM, Kenneth Wolcott kennethwolc...@gmail.com wrote:

 Hasn't someone already fixed this problem?  If there isn't a CPAN module to
 perform standardized bibliographic reference formatting/parsing.  I haven't
 looked at CPAN; did either of you?  If a CPAN module doesn't exist, one
 should!


What standard?

Kalthoff K (2001) Analysis of biological development. McGraw-Hill, NY.


Or


 Manning JT, Barley L, Walton J, Lewis-Jones DI, Trivers RL, Singh D,
 Thornhill R, Rohde P, Bereczkei T, Henzi P, Soler M, Szwed A. (2000) The
 2nd:4th digit ratio, sexual dimorphism, population differences, and
 reproductive success. evidence for sexually antagonistic genes? Evol Hum
 Behav. 21(3):163-183.


Or


 Berger, M., Lawrence, M., Demichelis, F., Drier, Y., Cibulskis, K.,
 Sivachenko, A., Sboner, A., Esgueva, R., Pflueger, D., Sougnez, C., Onofrio,
 R., Carter, S., Park, K., Habegger, L., Ambrogio, L., Fennell, T., Parkin,
 M., Saksena, G., Voet, D., Ramos, A., Pugh, T., Wilkinson, J., Fisher, S.,
 Winckler, W., Mahan, S., Ardlie, K., Baldwin, J., Simons, J., Kitabayashi,
 N., MacDonald, T., Kantoff, P., Chin, L., Gabriel, S., Gerstein, M., Golub,
 T., Meyerson, M., Tewari, A., Lander, E., Getz, G., Rubin, M.,  Garraway,
 L. (2011). The genomic complexity of primary human prostate cancer Nature,
 470 (7333), 214-220 DOI: 10.1038/nature09744


?

If there's a standard, then sure, someone has probably put that into CPAN.
The problem is that I don't think that there is, though I'd be glad to be
proven wrong.

On Mon, May 9, 2011 at 3:14 PM, Tiago Hori tiago.h...@gmail.com wrote:

 Hi List,


Howdy.



 What I want to be able to do eventually is parse each name separately and
 associate that with the title. I am not sure how yet, but I haven't even
 got
 there.


That can range from pretty simple to fairly complex, depending on how much
you want to squeeze out of that relationship. If you just want to be able to
say "Morgan, M.J wrote an article for X journal, titled Y", then that's just
a hash (of hashes), and you need to look no further than this mail. But if
you also want to say, "Journal X has these authors. One of them is Wilson,
C.E, who co-wrote article Y, where Crim, L.W. was also a collaborator, and
whose primary author is Morgan, M.J.", then hashes will probably not cut it
anymore (a cyclical hash of hashes might do, but that's pretty tough to
handle, and _very_ rough on the eyes). You'll probably want an object model
there, or some database interaction.

But we are getting ahead of ourselves for now :)


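 foreach (@entries){
     if (/((\w)*, (([A-Z].)*),){1,}/){


You probably want something like
my @names = /( \w+ , \s* (?: [A-Z] \. )+ ) , \s*/xg
instead. (Under /x, whitespace in the pattern is ignored, so the space after
the surname's comma has to be matched explicitly with \s*.)


  my $name = $&;


Try not to use $& and $` - there's a program-wide speed penalty if you do.
Just using capturing groups should do.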


 It works fine for the first name, but as expected if @entries contain
 several strings with authors names (I did that by matching the year and
 storing $` in the @entries) it will match the first author and it will go
 to
 the next $entries. Is there a way to match the pattern more than once, but
 to store each match separately?


You are looking for the /g switch. You can look it up in perlretut[0].
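
As a minimal sketch of what /g does in list context, the snippet below collects every author from one entry into an array. The variable names and the exact pattern are illustrative assumptions, not code from the thread; under /x the whitespace in the pattern is ignored, so the space after the surname's comma is matched explicitly with \s*.

```perl
use strict;
use warnings;

# Author part of the sample reference from the original post.
my $authors = 'Morgan, M.J., Wilson, C.E., Crim, L.W., 1999.';

# With /g in list context, the match returns the capture from every match,
# one array element per author.
my @names = $authors =~ /( \p{Lu}\w* , \s* (?: [A-Z] \. )+ )/xg;

print "$_\n" for @names;
```

Each element of @names holds one "Surname, Initials" pair, so "Morgan, M.J." and "Wilson, C.E." end up as separate items.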


 For example, would I be able to store
 Morgan, M.J. as one item in an array and Wilson, C.E. as another one?



Sure. the my @names = ... from above will suffice for that. But chances are
you want more than that - In general, you have two options. Either you make
several small regexes to extract the data piece by piece, or you create a
grammar to do the job for you. For the latter, there's two main options: a
(?(DEFINE)) pattern, which is Pure Perl and in the language since 5.010, or
you pull out Regexp::Grammars from CPAN. They are pretty similar, but
Regexp::Grammars is much more powerful, letting you access the full parse
tree - so what I'll have to do in two steps in the next snippet, R::G would
do in one.

Here's my stab at it, using (?(DEFINE))[1], named captures[2], Unicode
character properties[3], and a probably unnecessary lookbehind[1] in the
split by the end. I made some arbitrary assumptions on the data, like saying
that a title can't be longer than 52 characters, or can't have a period in
it, or that the journal's name can't have digits in it, which I suppose is a
tad disingenuous, but take it as an example, not a solution : P

use 5.010;

$_ = 'Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on
reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.';

/
(?<all_names> (?&ALL_NAMES) )
(?<year> (?&YEAR) )\. \s+
(?<title> (?&TITLE) )\. \s+
(?<journal> (?&JOURNAL) )\. \s*
(?<edition> (?&NUM)+ ), \s*
(?<pages> (?&NUM)+-(?&NUM)+ )\.


(?(DEFINE)
(?<ALL_NAMES> ( (?&FULL_NAME), \s+)+ )
(?<FULL_NAME> (?&SURNAME), \s* (?&INITIALS) )
(?<SURNAME> \p{Lu}\p{L}* )
(?<INITIALS> (?:\p{Lu}\.)+ )
(?<YEAR> \p{PosixDigit}{4} )
(?<TITLE> [^.]{1,52} ) #Article title
(?<JOURNAL> \P{PosixDigit}+ ) #Journal name
(?<NUM> \p{PosixDigit} ) 
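
The snippet above breaks off before the end of the (?(DEFINE)) block. A minimal complete sketch along the same lines might look like the following. It assumes that the missing part only closes the block, applies /x, and then reads the named captures from %+; the variable names $entry, %ref and @names, and the final split, are illustrative guesses rather than the original code.

```perl
use strict;
use warnings;
use 5.010;

my $entry = 'Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on '
          . 'reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.';

my %ref;
if ( $entry =~ /
        (?<all_names> (?&ALL_NAMES) )
        (?<year>      (?&YEAR) )    \. \s+
        (?<title>     (?&TITLE) )   \. \s+
        (?<journal>   (?&JOURNAL) ) \. \s*
        (?<edition>   (?&NUM)+ )    , \s*
        (?<pages>     (?&NUM)+-(?&NUM)+ ) \.

        (?(DEFINE)
            (?<ALL_NAMES> (?: (?&FULL_NAME), \s+ )+ )
            (?<FULL_NAME> (?&SURNAME), \s* (?&INITIALS) )
            (?<SURNAME>   \p{Lu}\p{L}* )
            (?<INITIALS>  (?:\p{Lu}\.)+ )
            (?<YEAR>      \p{PosixDigit}{4} )
            (?<TITLE>     [^.]{1,52} )      # article title
            (?<JOURNAL>   \P{PosixDigit}+ ) # journal name
            (?<NUM>       \p{PosixDigit} )
        )
    /x )
{
    %ref = %+;    # copy the named captures before they go out of scope
}

# Second step: split the author list on the comma that follows an initial.
my @names = split /(?<=\.),\s*/, $ref{all_names} // '';

say "$_: $ref{title}" for @names;
```

The same arbitrary limits apply here as in the post: the title may not contain a period or exceed 52 characters, and the journal name may not contain digits.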

Re: Regular Expressions Question

2011-04-12 Thread gkl
On Apr 10, 11:03 pm, jwkr...@shaw.ca (John W. Krahn) wrote:
 cityuk wrote:
  Dear All,

 Hello,



  This is more of a generic question on regular expressions as my
  program is working fine but I was just curious.

  Say you have the following URLs:

 http://www.test.com/image.gif
 http://www.test.com/?src=image.gif?width=12

  I want to get the type of the image, i.e. the string gif.

  For the first URL the regular expression .*\.([a-z]{3}) will do the
  trick while for the second one I am using .*=\([a-z]{3})?.*.

  Ignoring the fact that the REs can be written better my question is:

  If I put them together, that is write them as

  .*\.([a-z]{3})|.*=\([a-z]{3})?.*

  perl thinks that the or only applies to the characters immediately
  surrounding it (in this case ) and .).

 No.  The alternation applies to the complete pattern '.*\.([a-z]{3})' OR

OK. So if I understood you correctly, given the following (actual)
URLs

http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg
http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186

the following pattern

^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$

should match them both. Am I correct?

Regards,
George


 '.*=\([a-z]{3})?.*'.

 John
 --
 Any intelligent fool can make things bigger and
 more complex... It takes a touch of genius -
 and a lot of courage to move in the opposite
 direction.                   -- Albert Einstein






Re: Regular Expressions Question

2011-04-12 Thread C.DeRykus
On Apr 11, 7:21 am, gklc...@googlemail.com (gkl) wrote:
 On Apr 10, 11:03 pm, jwkr...@shaw.ca (John W. Krahn) wrote:
   This is more of a generic question on regular expressions as my
   program is working fine but I was just curious.

   Say you have the following URLs:

  http://www.test.com/image.gif
  http://www.test.com/?src=image.gif?width=12
 

 OK. So if I understood you correctly, given the following (actual)
 URLs

 http://beta.images.theglobeandmail.com/archive/01258/election_heads__...
 http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun

 the following pattern

 ^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$

 should match them both. Am I correct?


No, there is at least one problem. In your first
alternative, the '.*'  will  also match the literal '?'
which the second alternative is matching.

 See: 'perldoc perlretut' for a review.

[ The URI module which was mentioned will
 be a quicker solution and will work in all
 cases. ]

--
Charles DeRykus






Re: Regular Expressions Question

2011-04-12 Thread Rob Dixon

On 11/04/2011 15:21, gkl wrote:


OK. So if I understood you correctly, given the following (actual)
URLs

http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg
http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186

the following pattern

^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$

should match them both. Am I correct?


First of all I notice that the src parameter in your second URL's query
is now an absolute URL, whereas your first post had just a file name.
Since we cannot anticipate how far and in which direction your problem
may grow, it is your responsibility to present the entirety of the
possibilities as you know them. Otherwise you will be engaging the world
in a goose chase of the wildest sort.

If you mean

  /^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$/

then you must apply the /x modifier, otherwise the spaces at the end of
the first option and at the beginning of the second form part of the
expressions.
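
A small sketch of the effect of /x, using a simplified version of the pattern with \z and \A as anchors (a substitution made here for clarity, not the pattern from the post). Without /x the spaces around the | are literal characters that must appear in the string, so neither alternative can match.

```perl
use strict;
use warnings;

my $url = 'http://www.test.com/image.gif';

# Without /x: the first alternative needs a space after the end of the string,
# and the second needs a space before the start, so nothing can match.
my $plain = $url =~ /\.([a-z]{3})\z | \A\S*\?\S*\.([a-z]{3})/ ? 'match' : 'no match';

# With /x: the spaces are ignored and the first alternative matches ".gif".
my $extended = $url =~ /\.([a-z]{3})\z | \A\S*\?\S*\.([a-z]{3})/x ? 'match' : 'no match';

print "without /x: $plain\n";
print "with /x:    $extended\n";
```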

As far as I can think,

/^\s*.*\.([a-zA-z]{3})$/

is exactly equivalent to

/\.([a-zA-z]{3})$/

which, presumably as you intend, will match the first URL and capture
'jpg'. It will fail to match the second URL.


While the first option seemed to be considering the possibility of
irrelevant leading spaces, the second

/^\S*\?\S*\.([a-zA-z]{3}).*$/

is insisting on a sequence of non-spaces from the beginning of the
string up to the last possible question mark. Then another sequence of
non-spaces up to the last possible dot, followed by three alphas and an
ampersand. The subsequent /.*$/ does nothing.

I suggest to you that simply

/.*\.([a-z]+)/i

will match all of the four URLs you have posted so far, and capture from
them exactly what you expect. Only you can know the full extent of your
problem, and why you refuse the advice you have been offered.
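
As a quick check of that suggestion, the sketch below runs the pattern over the four URLs from this thread. The array names are illustrative, and the fourth URL is written with the ampersand before "size".

```perl
use strict;
use warnings;

my @urls = (
    'http://www.test.com/image.gif',
    'http://www.test.com/?src=image.gif?width=12',
    'http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg',
    'http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186',
);

# The greedy .* runs to the last dot that is followed by letters,
# and the capture group picks up those letters.
my @types = map { /.*\.([a-z]+)/i ? $1 : 'unknown' } @urls;

print "$_\n" for @types;
```

The pattern depends on the file extension following the last dot in the string, so a query parameter containing a later dot would change the result.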

I will continue to try to help you.

Rob





Re: Regular Expressions Question

2011-04-11 Thread Rob Dixon
On 11/04/2011 06:43, Shlomi Fish wrote:
 On Sunday 10 Apr 2011 14:05:49 cityuk wrote:

 This is more of a generic question on regular expressions as my
 program is working fine but I was just curious.

 Say you have the following URLs:

 http://www.test.com/image.gif
 http://www.test.com/?src=image.gif?width=12

 
 Don't use regular expressions to parse URLs - instead use URI.pm:
 
 http://cpan.uwinnipeg.ca/dist/URI

I agree. The program below shows a subroutine which will extract the
file type from either form of URL. It first checks to see if there is a
'src' option in the query, using this for the file name if so; otherwise
it uses the last segment of the URL path. The file type is
extracted by capturing all trailing non-dot characters from the file
name.

(I assume your second address should read
http://www.test.com/?src=image.gif&width=12 with an ampersand instead
of a second question mark?)

HTH,

Rob


use strict;
use warnings;

use URI;

sub filetype_from_url {
  my $url = URI->new($_[0]);
  my %form = $url->query_form;
  my $file = $form{src} || ($url->path_segments)[-1];
  return $file =~ /([^.]+)\z/;
}

print filetype_from_url('http://www.test.com/image.gif'), "\n";
print filetype_from_url('http://www.test.com/?src=image.gif&width=12'), "\n";









Regular Expressions Question

2011-04-10 Thread cityuk
Dear All,

This is more of a generic question on regular expressions as my
program is working fine but I was just curious.

Say you have the following URLs:

http://www.test.com/image.gif
http://www.test.com/?src=image.gif?width=12

I want to get the type of the image, i.e. the string gif.

For the first URL the regular expression .*\.([a-z]{3}) will do the
trick while for the second one I am using .*=\([a-z]{3})?.*.

Ignoring the fact that the REs can be written better my question is:

If I put them together, that is write them as

.*\.([a-z]{3})|.*=\([a-z]{3})?.*

perl thinks that the or only applies to the characters immediately
surrounding it (in this case ) and .).

Is there a way to say here is a whole RE, here is another and match
the first or the second?

Regards,
George






Re: Regular Expressions Question

2011-04-10 Thread John W. Krahn

cityuk wrote:

Dear All,


Hello,


This is more of a generic question on regular expressions as my
program is working fine but I was just curious.

Say you have the following URLs:

http://www.test.com/image.gif
http://www.test.com/?src=image.gif?width=12

I want to get the type of the image, i.e. the string gif.

For the first URL the regular expression .*\.([a-z]{3}) will do the
trick while for the second one I am using .*=\([a-z]{3})?.*.

Ignoring the fact that the REs can be written better my question is:

If I put them together, that is write them as

.*\.([a-z]{3})|.*=\([a-z]{3})?.*

perl thinks that the or only applies to the characters immediately
surrounding it (in this case ) and .).


No.  The alternation applies to the complete pattern '.*\.([a-z]{3})' OR 
'.*=\([a-z]{3})?.*'.
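
A short sketch of how | behaves. The test strings and patterns below are made up for illustration, and the second URL is written with an ampersand. The branch-reset group (?|...) used at the end needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# "|" splits the whole pattern: this means (cat) OR (dog),
# not "ca" followed by "t" or "d" followed by "og".
my @hits = grep { /cat|dog/ } qw(cat dog caog);

# To limit the alternation to part of a pattern, group it with (?: ... ).
my @grouped = grep { /\Aimage\.(?:gif|jpg)\z/ } qw(image.gif image.jpg image.png);

my @urls = (
    'http://www.test.com/image.gif',
    'http://www.test.com/?src=image.gif&width=12',
);

# Each alternative has its own capture group, so the type lands in $1 or $2.
my @types;
for my $url (@urls) {
    if ( $url =~ /\.([a-z]{3})\z|=\w+\.([a-z]{3})&/ ) {
        push @types, defined $1 ? $1 : $2;
    }
}

# With a branch-reset group, both alternatives capture into $1.
my @reset_types = map { /(?|\.([a-z]{3})\z|=\w+\.([a-z]{3})&)/ ? $1 : () } @urls;

print "@hits | @grouped | @types | @reset_types\n";
```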




John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction.   -- Albert Einstein





Re: Regular Expressions Question

2011-04-10 Thread David Christensen

On 04/10/2011 04:05 AM, cityuk wrote:

Is there a way to say here is a whole RE, here is another and match
the first or the second?


Jeffrey E.F. Friedl, 2006, Mastering Regular Expressions, 3 e., 
O'Reilly Media, ISBN 978-0-596-52812-6.


http://oreilly.com/catalog/9780596528126/


HTH,

David





Re: Regular Expressions Question

2011-04-10 Thread Shlomi Fish
On Sunday 10 Apr 2011 14:05:49 cityuk wrote:
 Dear All,
 
 This is more of a generic question on regular expressions as my
 program is working fine but I was just curious.
 
 Say you have the following URLs:
 
 http://www.test.com/image.gif
 http://www.test.com/?src=image.gif?width=12
 

Don't use regular expressions to parse URLs - instead use URI.pm:

http://cpan.uwinnipeg.ca/dist/URI

Regards,

Shlomi Fish

-- 
-
Shlomi Fish   http://www.shlomifish.org/
http://www.shlomifish.org/humour/ways_to_do_it.html

Electrical Engineering studies. In the Technion. Been there. Done that. Forgot
a lot. Remember too much.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regular expressions question

2009-11-19 Thread mangled...@yahoo.com
   Can anyone tell me how to write a regular expression which matches
   anything _except_ a litteral string ?

 One could also use a zero-width negative look-ahead assertion:

 #!/usr/bin/perl -w

 use strict;

 while( my $line = <DATA> ){
   if( $line =~ m/^(?!Nomatch)/ ){
     print "match: $line";
   }

 }

Thanks a lot for the reply,  that worked perfectly in my application.

David


--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Regular expressions question

2009-11-18 Thread mangled...@yahoo.com
Hi,

Can anyone tell me how to write a regular expression which matches
anything _except_ a literal string?

For instance, I want to match any line which does not begin with
Nomatch.  So in the following :

Line1 
Line2 
Nomatch 
Line3 
Line 4 

I would match every line except the one containing Nomatch 

Many thanks,

David


-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regular expressions question

2009-11-18 Thread Dermot
2009/11/17 mangled...@yahoo.com mangled...@yahoo.com:
 Hi,

Hello,


 Can anyone tell me hoq to write a regular expression which matches
 anything _except_ a litteral string ?

 For instance, I want to match any line which does not begin with
 Nomatch.  So in the following :

 Line1 
 Line2 
 Nomatch 
 Line3 
 Line 4 

 I would match every line except the one containing Nomatch 


You would negate the pattern. Something like this:

#!/usr/bin/perl


use strict;
use warnings;

while (<DATA>) {
    print if ! /^Nomatch/;
}

__DATA__
Line1 
Line2 
Nomatch 
Line3 
Line 4 

Output:
Line1 
Line2 
Line3 
Line 4 

see
perldoc perlop  #Logical-Not
and
perldoc perlsyn
and of course
perldoc perlrequick


HTH,
Dp.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




AW: Regular expressions question

2009-11-18 Thread Thomas Bätzler
Hi,

Dermot paik...@googlemail.com suggested:
 2009/11/17 mangled...@yahoo.com mangled...@yahoo.com:

  Can anyone tell me hoq to write a regular expression which matches
  anything _except_ a litteral string ?
 
  For instance, I want to match any line which does not begin with
  Nomatch.  So in the following :


 You would negate the pattern. Something like this:
 
 #!/usr/bin/perl
 
 
 use strict;
 use warnings;
 
 while (<DATA>) {
 print if ! /^Nomatch/;
 }
 
 __DATA__
 Line1 
 Line2 
 Nomatch 
 Line3 
 Line 4 

One could also use a zero-width negative look-ahead assertion:

#!/usr/bin/perl -w

use strict;

while( my $line = <DATA> ){
  if( $line =~ m/^(?!Nomatch)/ ){
    print "match: $line";
  }
}

__DATA__
Line1 
Line2 
Nomatch 
Line3 
Line 4 
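
For comparison, a minimal sketch that runs both approaches from this thread on the same sample lines. It uses an in-memory list instead of the DATA handle, and the variable names are only illustrative; the point is that the negated match and the negative look-ahead select the same lines here.

```perl
use strict;
use warnings;

my @lines = ( 'Line1', 'Line2', 'Nomatch', 'Line3', 'Line 4' );

# Approach 1: negate an ordinary match.
my @negated = grep { !/^Nomatch/ } @lines;

# Approach 2: zero-width negative look-ahead inside the pattern.
# ^(?!Nomatch) succeeds at the start of the string only when the
# text that follows is not "Nomatch".
my @lookahead = grep { /^(?!Nomatch)/ } @lines;

print "negated:   ", join( ', ', @negated ),   "\n";
print "lookahead: ", join( ', ', @lookahead ), "\n";
```

The look-ahead form earns its keep when the exclusion has to live inside a larger pattern; for a simple line filter the negated match is arguably easier to read.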

Cheers,
Thomas

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regular expressions question

2009-11-18 Thread Rob Coops
On Wed, Nov 18, 2009 at 5:05 PM, Thomas Bätzler t.baetz...@bringe.comwrote:

 Hi,

 Dermot paik...@googlemail.com suggested:
  2009/11/17 mangled...@yahoo.com mangled...@yahoo.com:

   Can anyone tell me hoq to write a regular expression which matches
   anything _except_ a litteral string ?
  
   For instance, I want to match any line which does not begin with
   Nomatch.  So in the following :


  You would negate the pattern. Something like this:
 
  #!/usr/bin/perl
 
 
  use strict;
  use warnings;
 
  while (<DATA>) {
  print if ! /^Nomatch/;
  }
 
  __DATA__
  Line1 
  Line2 
  Nomatch 
  Line3 
  Line 4 

 One could also use a zero-width negative look-ahead assertion:

 #!/usr/bin/perl -w

 use strict;

 while( my $line = <DATA> ){
   if( $line =~ m/^(?!Nomatch)/ ){
     print "match: $line";
   }
 }

 __DATA__
 Line1 
 Line2 
 Nomatch 
 Line3 
 Line 4 

 Cheers,
 Thomas

 --
 To unsubscribe, e-mail: beginners-unsubscr...@perl.org
 For additional commands, e-mail: beginners-h...@perl.org
 http://learn.perl.org/



Look-ahead notation works only on relatively recent versions of Perl. If
your environment includes something like HP-UX, which ships with a
decades-old version of Perl (5.005, I believe, depending on the version of
HP-UX of course), you might get into trouble.

I would therefore either not use it or make the script explicitly require
5.6 or higher, just in case.

Regards,

Rob


Re: Regular Expressions with Incremented Variable Embedded

2009-06-01 Thread Dr.Ruud

Raabe, Wesley wrote:
I am using regular expressions to alter a text file. Where my original file has three spaces to start a paragraph, I want to replace each instance of three spaces with a bracketed paragraph number, with a counter for paragraph numbers,  pgf 1, pgf 2, pgf 3 etc. [...] 

The WHILE loop that I've crafted is like this: 


while (IN) {
 chomp;
  s/\ \ \ /\pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\/gi;  
# Replace three spaces with pgf XX
   print OUT $_\n;
}

I'm trying to embed the PERL code  based on the PERL tutorial 
(http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression,
 which is noted as an experimental feature. But it doesn't work (using MAC OSX). The output in my text 
file is pgf (?{my  = 1; ++;){print ;}}) at start of each paragraph.

Is there a way to do this with AUTO-INCREMENT variable and a FOR loop outside the regular expression in which the value is inserted inside the regular expression? My earlier attempts to do it that way always resulted in no change in the value, just pgf 1 on every paragraph time. 


I don't understand your g-modifier. Why is it there?
I assume that you only want to make the substitution at the start of a line.


#!/usr/bin/perl -w
  use strict;

  my $fname_inp = "test.inp";
  my $fname_oup = "test.oup";
  {
      open my $fh_inp, "<", $fname_inp or die "'$fname_inp': ", $!;
      open my $fh_oup, ">", $fname_oup or die "'$fname_oup': ", $!;

      my $pgf = 1;
      while ( <$fh_inp> ) {
          s/^[ ]{3}/<pgf $pgf>/ and $pgf++;
          print $fh_oup $_;
      }
      close $fh_oup or die "'$fname_oup': ", $!;
  }
__END__

--
Ruud

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regular Expressions with Incremented Variable Embedded

2009-05-31 Thread John W. Krahn

Raabe, Wesley wrote:


I am using regular expressions to alter a text file. Where my original
file has three spaces to start a paragraph, I want to replace each
instance of three spaces with a bracketed paragraph number, with a
counter for paragraph numbers,  pgf 1, pgf 2, pgf 3 etc.  The
PERL program that I'm using is modeled on the answer to chapter 9,
question 3 in the Learning Perl book (4th ed.). 

The WHILE loop that I've crafted is like this: 


while (IN) {
 chomp;
  s/\ \ \ /\pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\/gi;  
# Replace three spaces with pgf XX
   print OUT $_\n;
}

I'm trying to embed the PERL code  based on the PERL tutorial
(http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-
Perl-code-in-a-regular-expression, which is noted as an experimental
feature. But it doesn't work (using MAC OSX). The output in my text
file is pgf (?{my  = 1; ++;){print ;}}) at start of each
paragraph.

Is there a way to do this with AUTO-INCREMENT variable and a FOR loop
outside the regular expression in which the value is inserted inside
the regular expression? My earlier attempts to do it that way always
resulted in no change in the value, just pgf 1 on every paragraph
time. 



my $para_num;
while ( <IN> ) {
    s/   /<pgf @{[++$para_num]}>/g;
    print OUT;
}



John
--
Those people who think they know everything are a great
annoyance to those of us who do.-- Isaac Asimov

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Regular Expressions with Incremented Variable Embedded

2009-05-30 Thread Raabe, Wesley

I am using regular expressions to alter a text file. Where my original file has 
three spaces to start a paragraph, I want to replace each instance of three 
spaces with a bracketed paragraph number, with a counter for paragraph numbers, 
<pgf 1>, <pgf 2>, <pgf 3> etc.  The PERL program that I'm using is modeled on 
the answer to chapter 9, question 3 in the Learning Perl book (4th ed.). 

The WHILE loop that I've crafted is like this: 

while (<IN>) {
 chomp;
  # Replace three spaces with <pgf XX>
  s/\ \ \ /\<pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\>/gi;
   print OUT "$_\n";
}

I'm trying to embed the PERL code based on the PERL tutorial 
(http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression),
which is noted as an experimental feature. But it doesn't work (using Mac 
OS X). The output in my text file is <pgf (?{my  = 1; ++;){print ;}})> at the 
start of each paragraph.

Is there a way to do this with an AUTO-INCREMENT variable and a FOR loop outside 
the regular expression, with the value inserted into the regular 
expression? My earlier attempts to do it that way always resulted in no change 
in the value, just <pgf 1> on every paragraph. 

Thanks,

Wesley Raabe
wra...@kent.edu
Assistant Professor
Textual Editing and American Literature
Kent State University
--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regular Expressions with Incremented Variable Embedded

2009-05-30 Thread Chas. Owens
On Sat, May 30, 2009 at 23:32, Raabe, Wesley wra...@kent.edu wrote:

 I am using regular expressions to alter a text file. Where my original file 
 has three spaces to start a paragraph, I want to replace each instance of 
 three spaces with a bracketed paragraph number, with a counter for paragraph 
 numbers,  pgf 1, pgf 2, pgf 3 etc.  The PERL program that I'm using is 
 modeled on the answer to chapter 9, question 3 in the Learning Perl book (4th 
 ed.).

 The WHILE loop that I've crafted is like this:

    while (IN) {
     chomp;
      s/\ \ \ /\pgf\ (?{my $para_num = 1; $para_num++;){print 
 $para_num;}})\/gi;  # Replace three spaces with pgf XX
       print OUT $_\n;
 }

 I'm trying to embed the PERL code  based on the PERL tutorial 
 (http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression,
  which is noted as an experimental feature. But it doesn't work (using MAC 
 OSX). The output in my text file is pgf (?{my  = 1; ++;){print ;}}) at 
 start of each paragraph.

 Is there a way to do this with AUTO-INCREMENT variable and a FOR loop outside 
 the regular expression in which the value is inserted inside the regular 
 expression? My earlier attempts to do it that way always resulted in no 
 change in the value, just pgf 1 on every paragraph time.
snip

That would be because the second part of a s/// is not a regex; it is
a double-quoted string.  What you want is the /e option, which
interprets the second part as Perl code instead:

my $i = 0;
while (<IN>) {
    # pre-increment so that numbering starts at 1
    s/[ ]{3}/"<pgf " . ++$i . ">"/ge;
    print;
}
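
A self-contained sketch of the same /e idea, for anyone who wants to try it without input files. The sample paragraphs, the variable names, and the assumption that the marker looks like <pgf N> are mine; the pattern is anchored with ^ so only leading indentation is replaced.

```perl
use strict;
use warnings;

# Made-up sample data: two indented paragraphs and one without indent.
my @paragraphs = (
    "   First paragraph.",
    "   Second paragraph.",
    "No indent here.",
);

my $count = 0;
for my $line (@paragraphs) {
    # With /e the replacement is evaluated as Perl code on each match,
    # so ++$count runs once per substitution actually made.
    $line =~ s/^[ ]{3}/'<pgf ' . ++$count . '>'/e;
}

print "$_\n" for @paragraphs;
```

Because the counter only advances when the substitution succeeds, lines without the three leading spaces do not consume a number.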


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-24 Thread Jay Savage
On Wed, Apr 22, 2009 at 6:12 PM, Chas. Owens chas.ow...@gmail.com wrote:
 On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
 Chas. Owens wrote:

 On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc
 wrote:
 snip

 The utf8 pragma affects the whole file,

 Well, only the part of the file that is parsed after the

    use utf8;

 statement, right?
 snip

 Hmm, I don't think it would reparse the whole file, but
 it does run in a BEGIN block...hmm, I must test it.


It runs in a begin block, but it is still lexically scoped. Pragmata
are very special cases of modules that provide modifications of
compile-time behavior, and many of them perform sleight of hand behind
the scenes. Here, the sleight of hand is using utf8 to simply add a
bit mask to $^H and relying on the behavior of the compiler hints.

The important thing to remember about a BEGIN block is that it is run as
soon as it is defined, where it is defined. Just because it is
executed early in the compile-optimize-run cycle does not mean that
it is magically transported to an earlier position in the file.
Generally, you want to apply the behavior introduced by a module to
have file scope, which is why use statements normally appear early in
the file.

See perlpragma and the description of $^H in perlvar for details.
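
A quick way to see lexical scoping of a pragma in action is sketched below. It uses the integer pragma rather than utf8, purely because its effect is easy to observe; the variable names are illustrative.

```perl
use strict;
use warnings;

my $before = 7 / 2;    # ordinary floating-point division

my $inside;
{
    use integer;       # in effect only until the closing brace
    $inside = 7 / 2;   # integer division while the pragma is active
}

my $after = 7 / 2;     # the pragma is no longer in effect here

print "before=$before inside=$inside after=$after\n";
```

The use statement is processed at compile time, yet it changes behavior only for the code that follows it inside the enclosing block.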

HTH

-- j
--
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.downloadsquad.com  http://www.engatiki.org

values of β will give rise to dom!

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-24 Thread Chas. Owens
2009/4/24 Jay Savage daggerqu...@gmail.com:
snip
 Hmm, I don't think it would reparse the whole file, but
 it does run in a BEGIN block...hmm, I must test it.


 It runs in a begin block, but it is still lexically scoped. Pragmata
 are very special cases of modules that provide modifications of
 compile-time behavior, and many of them perform sleight of hand behind
 the scenes. Here, the sleight of hand is using utf8 to simply add a
 bit mask to $^H and relying on the the behavior of the compiler hints.

 The important thing to remember about a BEGIN block that it is run as
 soon as it is defined, where it is defined. Just because it is
 executed early in the compile-optimize-run cycle does not mean that
 it is magically transported to an earlier position in the file.
 Generally, you want to apply the behavior introduced by a module to
 have file scope, which is why use statements normally appear early in
 the file.

 See perlpragma and the description of $^H in perlrun for details.
snip

All of this is good information, but for one thing: not all pragmas
are lexically scoped.  Hence the need to test and/or read the docs.
For instance, the re pragma[1] is only partially lexical:

#!/usr/bin/perl

use strict;
use warnings;

"foo" =~ /(o+)/; #re 'debug' still affects this line

use re 'debug';

1. http://perldoc.perl.org/re.html

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-24 Thread Chas. Owens
On Fri, Apr 24, 2009 at 15:53, Chas. Owens chas.ow...@gmail.com wrote:
snip
 All of this is good information, but for one thing: not all pragmas
 are lexically scoped.  Hence the need to test and/or read the docs.
 For instance, the re pragma[1] is only partially lexical:

 #!/usr/bin/perl

 use strict;
 use warnings;

 "foo" =~ /(o+)/; #re 'debug' still affects this line

 use re 'debug';

 1. http://perldoc.perl.org/re.html
snip

The sigtrap pragma is another example of a pragma that is not
lexically scoped.  The docs don't say one way or the other, but a
quick test proves that it isn't:

#!/usr/bin/perl

use strict;
use warnings;

kill 2, $$;

sub not_even_called {
    use sigtrap die => "INT";
}

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-24 Thread Jay Savage
On Fri, Apr 24, 2009 at 3:53 PM, Chas. Owens chas.ow...@gmail.com wrote:
 2009/4/24 Jay Savage daggerqu...@gmail.com:
 snip
 Hmm, I don't think it would reparse the whole file, but
 it does run in a BEGIN block...hmm, I must test it.


 It runs in a begin block, but it is still lexically scoped. Pragmata
 are very special cases of modules that provide modifications of
 compile-time behavior, and many of them perform sleight of hand behind
 the scenes. Here, the sleight of hand is using utf8 to simply add a
 bit mask to $^H and relying on the the behavior of the compiler hints.

 The important thing to remember about a BEGIN block that it is run as
 soon as it is defined, where it is defined. Just because it is
 executed early in the compile-optimize-run cycle does not mean that
 it is magically transported to an earlier position in the file.
 Generally, you want to apply the behavior introduced by a module to
 have file scope, which is why use statements normally appear early in
 the file.

 See perlpragma and the description of $^H in perlrun for details.
 snip

 All of this is good information, but for one thing: not all pragmas
 are lexically scoped.  Hence the need to test and/or read the docs.
 For instance, the re pragma[1] is only partially lexical:

 #!/usr/bin/perl

 use strict;
 use warnings;

 "foo" =~ /(o+)/; #re 'debug' still affects this line

 use re 'debug';

 1. http://perldoc.perl.org/re.html


Agreed, absolutely. My point was that just because something's wrapped
in a BEGIN block doesn't mean one should assume it affects the
entire program, or be surprised when it doesn't.

--j
--
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.downloadsquad.com  http://www.engatiki.org

values of β will give rise to dom!

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Stanisław T. Findeisen

Gunnar Hjalmarsson wrote:

Stanisław T. Findeisen wrote:
Hi how to write regular expressions matching against Unicode (eg., 
UTF-8) strings?


For instance, in my regexp:

qr/^([.@ \w])*$/


Decode the UTF-8 encoded strings before applying the regex on them.

$ perl -MEncode -le '
$utf8_encoded = "smörgåsbord";
$s = decode "UTF-8", $utf8_encoded;
print "Match" if $s =~ /^\w+$/;
'
Match
$


Thanks, decode helped with this. But can I ask you one more question? 
What assumptions does Perl make regarding input file (i.e., the 
program/script file) encoding?


Is it so that string literals in Perl are byte arrays in fact? What you 
type is what you get?


STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA  D63F DBF5 8AA8 3B31 FE8A
===

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Gunnar Hjalmarsson

Stanisław T. Findeisen wrote:

Gunnar Hjalmarsson wrote:

Stanisław T. Findeisen wrote:
Hi how to write regular expressions matching against Unicode (eg., 
UTF-8) strings?


For instance, in my regexp:

qr/^([.@ \w])*$/


Decode the UTF-8 encoded strings before applying the regex on them.

$ perl -MEncode -le '
$utf8_encoded = "smörgåsbord";
$s = decode "UTF-8", $utf8_encoded;
print "Match" if $s =~ /^\w+$/;
'
Match
$


Thanks, decode helped with this. But can I ask you one more question? 
What assumptions does Perl make regarding input file (i.e., the 
program/script file) encoding?


AFAIK, it just converts the bytes into Perl's internal format, but it 
does not assume anything (at least not by default) with respect to the 
character encoding.



Is it so that string literals in Perl are byte arrays in fact?


String literals in a Perl script are byte *strings* until decoded.


What you type is what you get?


Not sure what you mean by that.

You may find http://perldoc.perl.org/perlunitut.html helpful.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Stanisław T. Findeisen

Gunnar Hjalmarsson wrote:
What assumptions does Perl make regarding input file (i.e., the 
program/script file) encoding?


AFAIK, it just converts the bytes into Perl's internal format, but it 
does not assume anything (at least not by default) with respect to the 
character encoding.



Is it so that string literals in Perl are byte arrays in fact?


String literals in a Perl script are byte *strings* until decoded.


Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) 
one can however make them parsed (decoded) (provided they are valid UTF-8).


It's all about UTF8 flag: 
http://perldoc.perl.org/Encode.html#The-UTF8-flag .


STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA  D63F DBF5 8AA8 3B31 FE8A
===

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Gunnar Hjalmarsson

Stanisław T. Findeisen wrote:

Gunnar Hjalmarsson wrote:
What assumptions does Perl make regarding input file (i.e., the 
program/script file) encoding?


AFAIK, it just converts the bytes into Perl's internal format, but it 
does not assume anything (at least not by default) with respect to the 
character encoding.



Is it so that string literals in Perl are byte arrays in fact?


String literals in a Perl script are byte *strings* until decoded.


Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) 
one can however make them parsed (decoded) (provided they are valid UTF-8).


No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. 
variable names or subroutine names.


$ perl -MEncode -le '
$s = "smörgåsbord";
print length $s;
use utf8;
print length $s;
$s = decode "UTF-8", $s;
print length $s;
'
13
13
11
$



It's all about UTF8 flag: 
http://perldoc.perl.org/Encode.html#The-UTF8-flag .


Maybe... That's above my head right now, I'm afraid.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Gunnar Hjalmarsson

Gunnar Hjalmarsson wrote:

Stanisław T. Findeisen wrote:
With use utf8 (http://perldoc.perl.org/utf8.html) one can however 
make them parsed (decoded) (provided they are valid UTF-8).


No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. 
variable names or subroutine names.


Or did you possibly mean the utf8::decode() function?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Stanisław T. Findeisen

Gunnar Hjalmarsson wrote:

Or did you possibly mean the utf8::decode() function?


I mean this:

#!/usr/bin/perl

use warnings;
use strict;
# use utf8;
use Encode;

my $utf8_encoded = "smörgåsbord";
print('is_utf8: ' . (Encode::is_utf8($utf8_encoded) ? 'TRUE' : 'FALSE') 
. "\n");


This outputs FALSE here, but uncomment use utf8 and it gets TRUE. 
Looks like with use utf8 those string literals aren't ordinary byte 
strings anymore. Perhaps they are as if Encode::decode had been applied 
to them?


STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA  D63F DBF5 8AA8 3B31 FE8A
===

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Chas. Owens
On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
snip
 Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one
 can however make them parsed (decoded) (provided they are valid UTF-8).

 No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable
 names or subroutine names.
snip

From perldoc utf8[1]:
Bytes in the source text that have their high-bit set will be
treated as being part of a literal UTF-X sequence. This includes
most literals such as identifier names, string constants, and
constant regular expression patterns.

The utf8 pragma affects the whole file, not just variable and subroutine names.

1. http://perldoc.perl.org/utf8.html

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Gunnar Hjalmarsson

Stanisław T. Findeisen wrote:

I mean this:

#!/usr/bin/perl

use warnings;
use strict;
# use utf8;
use Encode;

my $utf8_encoded = smörgåsbord;
print('is_utf8: ' . (Encode::is_utf8($utf8_encoded) ? 'TRUE' : 'FALSE') 
. \n);


This outputs FALSE here, but uncomment use utf8 and it gets TRUE. 
Looks like with use utf8 those string literals aren't ordinary byte 
strings anymore. Perhaps they are as if Encode::decode had been applied 
to them?


Yes, it seems to be so. Please also see my reply to Chas.'s post.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Gunnar Hjalmarsson

Chas. Owens wrote:

On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
snip

Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one
can however make them parsed (decoded) (provided they are valid UTF-8).

No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable
names or subroutine names.

snip

From perldoc utf8[1]:
Bytes in the source text that have their high-bit set will be
treated as being part of a literal UTF-X sequence. This includes
most literals such as identifier names, string constants, and
constant regular expression patterns.

The utf8 pragma affects the whole file,


Well, only the part of the file that is parsed after the

use utf8;

statement, right?


not just variable and subroutine names.


Yes, I agree on that now. Thanks for the correction.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Chas. Owens
On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
 Chas. Owens wrote:

 On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc
 wrote:
 snip

 Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html)
 one
 can however make them parsed (decoded) (provided they are valid UTF-8).

 No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g.
 variable
 names or subroutine names.

 snip

 From perldoc utf8[1]:
    Bytes in the source text that have their high-bit set will be
    treated as being part of a literal UTF-X sequence. This includes
    most literals such as identifier names, string constants, and
    constant regular expression patterns.

 The utf8 pragma affects the whole file,

 Well, only the part of the file that is parsed after the

    use utf8;

 statement, right?
snip

Hmm, I don't think it would reparse the whole file, but
it does run in a BEGIN block...hmm, I must test it.


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-22 Thread Chas. Owens
On Wed, Apr 22, 2009 at 18:12, Chas. Owens chas.ow...@gmail.com wrote:
 On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
 Chas. Owens wrote:

 On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc
 wrote:
 snip

 Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html)
 one
 can however make them parsed (decoded) (provided they are valid UTF-8).

 No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g.
 variable
 names or subroutine names.

 snip

 From perldoc utf8[1]:
    Bytes in the source text that have their high-bit set will be
    treated as being part of a literal UTF-X sequence. This includes
    most literals such as identifier names, string constants, and
    constant regular expression patterns.

 The utf8 pragma affects the whole file,

 Well, only the part of the file that is parsed after the

    use utf8;

 statement, right?
 snip

 Hmm, I don't think it would reparse the whole file, but
 it does run in a BEGIN block...hmm, I must test it.
snip

#!/usr/bin/perl

use strict;

my $first;

BEGIN { $first = "é" };

my $next = "é";

use utf8;

my $last = "é";

print "first is ", utf8::is_utf8($first) ? "" : "not ", "UTF-8\n";
print "next is ", utf8::is_utf8($next) ? "" : "not ", "UTF-8\n";
print "last is ", utf8::is_utf8($last) ? "" : "not ", "UTF-8\n";

gives me

first is not UTF-8
next is not UTF-8
last is UTF-8

So I would say that it only takes effect for lines after it is used.

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




\w regular expressions unicode

2009-04-18 Thread Stanisław T. Findeisen
Hi, how do I write regular expressions that match against Unicode (e.g., 
UTF-8) strings?


For instance, in my regexp:

qr/^([.@ \w])*$/

I am using \w because here: http://perldoc.perl.org/perlretut.html it says:

===
\w matches a word character (alphanumeric or _), not just [0-9a-zA-Z_] 
but also digits and characters from non-roman scripts

===

Unfortunately, this doesn't seem to work with non-ASCII. :-/

Is this a configuration issue?

Thanks!
STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA  D63F DBF5 8AA8 3B31 FE8A
===

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: \w regular expressions unicode

2009-04-18 Thread Gunnar Hjalmarsson

Stanisław T. Findeisen wrote:
Hi how to write regular expressions matching against Unicode (eg., 
UTF-8) strings?


For instance, in my regexp:

qr/^([.@ \w])*$/


Decode the UTF-8 encoded strings before applying the regex on them.

$ perl -MEncode -le '
$utf8_encoded = "smörgåsbord";
$s = decode "UTF-8", $utf8_encoded;
print "Match" if $s =~ /^\w+$/;
'
Match
$
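
The same idea as a script, in case the one-liner is hard to adapt. This is only a sketch: the sample word is written with \x escapes (my choice) so the result does not depend on how the script file itself is encoded, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "smörgåsbord" as raw UTF-8 bytes: ö is C3 B6 and å is C3 A5.
my $bytes = "sm\xC3\xB6rg\xC3\xA5sbord";

# Turn the byte string into a string of characters.
my $chars = decode( 'UTF-8', $bytes );

for my $pair ( [ 'bytes', $bytes ], [ 'chars', $chars ] ) {
    my ( $label, $string ) = @$pair;
    printf "%s: length %d, %s\n",
        $label,
        length($string),
        ( $string =~ /^\w+$/ ? 'matches \w+' : 'does not match \w+' );
}
```

The undecoded string is two units longer because each accented letter takes two bytes, and those bytes are not word characters to \w under the default rules; after decoding, each accented letter is a single character that \w accepts.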

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Regular Expressions

2009-02-08 Thread Gunnar Hjalmarsson

Chas. Owens wrote:

On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson nore...@gunnar.cc wrote:

TMTOWTDI

   use Time::Local;
   while (<DATA>) {
   s{,(.+?),}{
   my ($d, $m, $y) = split /\//, $1;
   my $t = timelocal 0, 0, 0, $d, $m-1, $y;
   ($d, $m, $y) = (localtime $t)[3..5];
   sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
   }e;
   }

snip

And this would be the confusing, fragile mess I spoke of.


Sorry, but I fail to see how using the s/// operator to extract the 
date field would be so much more confusing and fragile compared to 
split() + join().


--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl





Re: Regular Expressions

2009-02-08 Thread Chas. Owens
On Sun, Feb 8, 2009 at 03:49, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
 Chas. Owens wrote:

 On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson nore...@gunnar.cc
 wrote:

 TMTOWTDI

   use Time::Local;
   while (<DATA>) {
   s{,(.+?),}{
   my ($d, $m, $y) = split /\//, $1;
   my $t = timelocal 0, 0, 0, $d, $m-1, $y;
   ($d, $m, $y) = (localtime $t)[3..5];
   sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
   }e;
   }

 snip

 And this would be the confusing, fragile mess I spoke of.

 Sorry, but I fail to see how using the s/// operator to extract the date
 field would be so much more confusing and fragile compared to split() +
 join().
snip

You are calling three functions (one of which is split) and assigning
returns three times inside the replacement.  Add on top of that the
fact that the regex only works for the second field.  Compare all of
that to calling two much simpler functions, a simple substitution, and
one assignment.  Try to imagine what happens six months from now when
you need to go back and perform a transformation on the fifth field.
Are you going to extend the regex to try to capture that value?  Or
are you just going to rewrite the code to use a split like you should
have in the first place?  Also, there may be a need to handle commas
in the fields at some point in the future.  This will entail using a
module like Text::CSV.  With the split code you can just replace the
split with the proper parsing function from the module.  With the
giant substitution code you pretty much have to rewrite the whole
thing.

I am all for using advanced features of Perl when it makes the code
clearer or more concise, but this code is longer than the split
version, involves more functions (including the confusing* localtime
and timelocal functions), and doesn't even do error checking on the
data.

On an unrelated topic, why are you using timelocal?  A much better
solution is to use the strftime function from the POSIX module:

#!/usr/bin/perl

use strict;
use warnings;

use POSIX;

while (<DATA>) {
    s{,([^,]+),}{
        my ($m, $d, $y) = $1 =~ m<^([0-9]+)/([0-9]+)/([0-9]+)$>
            or die "$. has an invalid date format";
        strftime ",%Y%m%d,", 0, 0, 0, $d, $m - 1, $y - 1900;
    }e;
    print;
}

__DATA__
1,1/1/2009,optional,foo
2,1/2/2009,,bar
3,1/3/2009,,baz


Note how the split from your code has been changed to a regex.  This
is because split is indiscriminate.  This was good in my code because
it acted as future proofing against more fields being added to the
record** (which is unlikely to affect the meaning of earlier fields),
but bad here because we know the expected format of the date and the
chances of it not being that format and the code still being correct
at some point in the future is small.

* localtime pretty much only makes sense when you know the C based tm
structure it came from and timelocal, besides being a word play that
is too clever by half, is worse because it violates that structure***.
** also, if we wanted to throw an error because there were too few or
too many fields it would be easily achieved by asking the array how
many elements it held.
*** http://perldoc.perl.org/Time/Local.html#Year-Value-Interpretation
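
Footnote ** could look something like the following. This is only a
sketch: parse_record is a hypothetical helper name, and the expected
count of 4 simply matches the sample data above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a record and insist on an exact field count.
sub parse_record {
    my ($line, $expected) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields, so the count stays honest.
    my @record = split /,/, $line, -1;
    die "line has " . @record . " fields, expected $expected\n"
        unless @record == $expected;
    return @record;
}

my @record = parse_record("2,1/2/2009,,bar\n", 4);
print scalar(@record), " fields, date is $record[1]\n";

# A short record is rejected rather than silently processed.
my $ok = eval { parse_record("3,1/3/2009\n", 4); 1 };
print "short record rejected\n" unless $ok;
```

The check costs one comparison because split has already produced the
whole list of fields.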

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Regular Expressions

2009-02-08 Thread Gunnar Hjalmarsson

Chas. Owens wrote:

On Sun, Feb 8, 2009 at 03:49, Gunnar Hjalmarsson nore...@gunnar.cc wrote:

Sorry, but I fail to see how using the s/// operator to extract the date
field would be so much more confusing and fragile compared to split() +
join().


You are calling three functions (one of which is split) and assigning
returns three times inside the replacement.  Add on top of that the
fact that the regex only works for the second field.  Compare all of
that to calling two much simpler functions, a simple substitution, and
one assignment.


Think you are comparing apples and oranges now. Since we don't know what 
kind of conversion the OP wants to do, I thought we were only discussing 
the date extracting part of the problem. To clarify, I rewrote my code:


use Time::Local;

while (<DATA>) {
s{(?<=,)(.+?)(?=,)}{ dateconvert($1) }e;
print;
}

sub dateconvert {
my ($d, $m, $y) = split /\//, shift;
my $t = timelocal 0, 0, 0, $d, $m-1, $y;
($d, $m, $y) = (localtime $t)[3..5];
sprintf '%d-%02d-%02d', $y+1900, $m+1, $d;
}

__DATA__
TICKER,06/02/09,OPEN,HIGH,LOW,CLOSE,VOLUME,OI
TICKER,07/02/09,OPEN,HIGH,LOW,CLOSE,VOLUME,OI
TICKER,08/02/97,OPEN,HIGH,LOW,CLOSE,VOLUME,OI


In other words, if we are to compare each others code, I believe that

s{(?<=,)(.+?)(?=,)}{ dateconvert($1) }e;
print;

ought to be compared with

my @record = split /,/, $_;
$record[1] = dateconvert( $record[1] );
print join ",", @record;


Try to imagine what happens six months from now when
you need to go back and perform a transformation on the fifth field.
Are you going to extend the regex to try to capture that value?  Or
are you just going to rewrite the code to use a split like you should
have in the first place?


Didn't think about that. Maybe I will use split + join. Not a big deal, IMO.


I am all for using advanced features of Perl when it makes the code
clearer or more concise, but this code is longer than the split
version, involves more functions (including the confusing* localtime
and timelocal functions),


My use of localtime and timelocal is totally unrelated to whether I use 
the split version or not.



and doesn't even do error checking on the data.


Not true. timelocal() does error checking.


On an unrelated topic, why are you using timelocal?


Because of its built-in error checking? ;-)  Or maybe because I wanted 
to use its Year Value Interpretation feature. (Note that I assumed 
conversion from dd/mm/yy to yyyy-mm-dd, and that a date from the 90's is 
included in my sample data.)



A much better
solution is to use the strftime function from the POSIX module:


Maybe.

Somehow I tend to believe that date conversion code becomes more robust 
if you go to epoch seconds and back. Isn't that what most date and time 
related modules do behind the scenes, btw?
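
To illustrate the error checking point, here is a rough sketch. The name
dateconvert_checked is hypothetical, and I use four-digit years in the
test dates (an assumption) so that the result does not depend on how
two-digit years are interpreted:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

# Hypothetical helper: convert dd/mm/yyyy to yyyy-mm-dd by way of epoch
# seconds; returns nothing when timelocal() rejects the date.
sub dateconvert_checked {
    my ($d, $m, $y) = split m{/}, shift;
    # timelocal() croaks on out-of-range values, so trap that with eval.
    my $t = eval { timelocal(0, 0, 0, $d, $m - 1, $y) };
    return unless defined $t;
    ($d, $m, $y) = (localtime $t)[3 .. 5];
    return sprintf '%d-%02d-%02d', $y + 1900, $m + 1, $d;
}

for my $date ('06/02/2009', '31/02/2009') {
    my $iso = dateconvert_checked($date);
    print "$date => ", defined $iso ? $iso : 'rejected', "\n";
}
```

The second date (31 February) never reaches sprintf, because timelocal()
refuses it.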


--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl





Regular Expressions

2009-02-07 Thread Soham Das
Hi All,

I am a noob in Perl and hence would like some help with what I am sure is a very 
easy problem.

I have got a text  file in csv format
The format is:
TICKER,DATE,OPEN,HIGH,LOW,CLOSE,VOLUME,OI

Now my objective is to change the format of the date, and rename the whole file 
as a .csv

So, my strategy is:
I want to read the content between the first and second comma, take it in a 
variable and do the slicing and dicing and write it back.

Because I need some real life practice in REGEX, how do you suggest I read the 
contents between the first and the second comma?

Soham






Re: Regular Expressions

2009-02-07 Thread Chas. Owens
On Sat, Feb 7, 2009 at 08:45, Soham Das soham...@yahoo.co.in wrote:
 Hi All,

 I am a noob in Perl and hence would like some help to what I am sure is a 
 very easy problem.

 I have got a text  file in csv format
 The format is:
 TICKER,DATE,OPEN,HIGH,LOW,CLOSE,VOLUME,OI

 Now my objective is to change the format of the date, and rename the whole 
 file as a .csv

 So, my strategy is:
 I want to read the content between the first and second comma, take it in a 
 variable and do the slicing and dicing and write it back.

 Because I need some real life practice in REGEX, how do you suggest I read 
 the contents between the first and the second comma?
snip

This isn't a job for a regex; it is a job for split:

my @record = split ",", $record;
$record[1] =~ s{(..)/(..)/(....)}{$3$1$2}
    or die "line $. has an invalid date format";
print join ",", @record;

You could say

$record =~ s{(.*?),(..)/(..)/(....),}{$1,$4$2$3,}
    or die "line $. has an invalid date format";
print $record;

but the next person to maintain your code may be a little upset at
you, especially in the more complicated versions of this type of
substitution.
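
For just reading the content between the first and second comma, the two
approaches can be put side by side. This is a sketch with a made-up
sample record (only the column layout comes from the original post):

```perl
use strict;
use warnings;

# Sample record; the values are invented for illustration.
my $record = "TICKER,06/02/09,OPEN,HIGH,LOW,CLOSE,VOLUME,OI";

# Regex: skip the first field, capture everything up to the next comma.
my ($by_regex) = $record =~ /^[^,]*,([^,]*),/;

# split: the second field is simply element 1 of the list.
my $by_split = (split /,/, $record)[1];

print "regex: $by_regex\n";
print "split: $by_split\n";
```

Both print the same date field. The split version keeps working
unchanged if you later want element 4 instead of element 1.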

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Regular Expressions

2009-02-07 Thread Gunnar Hjalmarsson

Chas. Owens wrote:

This isn't a job for a regex; it is a job for split:


whose first argument is a regex pattern... ;-)

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl





Re: Regular Expressions

2009-02-07 Thread Chas. Owens
On Sat, Feb 7, 2009 at 16:09, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
 Chas. Owens wrote:

 This isn't a job for a regex; it is a job for split:

 whose first argument is a regex pattern... ;-)
snip

Yes and a regex follows in the substitute, but the whole thing isn't
being done with a regex.  Trying to do it with one regex can lead to a
confusing and fragile mess.


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Regular Expressions

2009-02-07 Thread Gunnar Hjalmarsson

Chas. Owens wrote:

On Sat, Feb 7, 2009 at 16:09, Gunnar Hjalmarsson nore...@gunnar.cc wrote:

Chas. Owens wrote:

This isn't a job for a regex; it is a job for split:

whose first argument is a regex pattern... ;-)

snip

Yes and a regex follows in the substitute, but the whole thing isn't
being done with a regex.  Trying to do it with one regex can lead to a
confusing and fragile mess.


TMTOWTDI

use Time::Local;
while (<DATA>) {
s{,(.+?),}{
my ($d, $m, $y) = split /\//, $1;
my $t = timelocal 0, 0, 0, $d, $m-1, $y;
($d, $m, $y) = (localtime $t)[3..5];
sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
}e;
}

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl





Re: Regular Expressions

2009-02-07 Thread Rob Dixon
Chas. Owens wrote:
 On Sat, Feb 7, 2009 at 08:45, Soham Das soham...@yahoo.co.in wrote:
 Hi All,

 I am a noob in Perl and hence would like some help to what I am sure is a 
 very easy problem.

 I have got a text  file in csv format
 The format is:
 TICKER,DATE,OPEN,HIGH,LOW,CLOSE,VOLUME,OI

 Now my objective is to change the format of the date, and rename the whole 
 file as a .csv

 So, my strategy is:
 I want to read the content between the first and second comma, take it in a 
 variable and do the slicing and dicing and write it back.

 Because I need some real life practice in REGEX, how do you suggest I read 
 the contents between the first and the second comma?
 snip
 
 This isn't a job for a regex; it is a job for split:
 
 my @record = split ",", $record;
 $record[1] =~ s{(..)/(..)/(....)}{$3$1$2}
     or die "line $. has an invalid date format";
 print join ",", @record;
 
 You could say
 
 $record =~ s{(.*?),(..)/(..)/(....),}{$1,$4$2$3,}
     or die "line $. has an invalid date format";
 print $record;
 
 but the next person to maintain your code may be a little upset at
 you, especially in the more complicated versions of this type of
 substitution.

$record =~ s|,(..)/(..)/(....),|,$3$1$2,| or die "Data problem";

Rob






Re: Regular Expressions

2009-02-07 Thread Chas. Owens
On Sat, Feb 7, 2009 at 19:21, Rob Dixon rob.di...@gmx.com wrote:
snip
 $record =~ s|,(..)/(..)/(....),|,$3$1$2,| or die "Data problem";
snip

Yes, but how would you handle it if this weren't the second field?  It
is better to have a general solution.
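
By a general solution I mean something along these lines. It is only a
sketch: transform_field and the sample record are hypothetical, and the
field to change is passed in as an index instead of being baked into a
pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $transform to field number $index of a
# comma-separated line and return the rebuilt line.
sub transform_field {
    my ($line, $index, $transform) = @_;
    my @fields = split /,/, $line, -1;
    die "no field $index in: $line\n" if $index > $#fields;
    $fields[$index] = $transform->($fields[$index]);
    return join ',', @fields;
}

# dd/mm/yyyy -> yyyymmdd by reversing the three parts.
my $swap = sub { join '', reverse split m{/}, shift };

my $out = transform_field('a,07/02/2009,b,c', 1, $swap);
print "$out\n";
```

Moving the date to another column then means changing the 1, not
rewriting a regex.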


-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Regular Expressions

2009-02-07 Thread Chas. Owens
On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
 Chas. Owens wrote:

 On Sat, Feb 7, 2009 at 16:09, Gunnar Hjalmarsson nore...@gunnar.cc
 wrote:

 Chas. Owens wrote:

 This isn't a job for a regex; it is a job for split:

 whose first argument is a regex pattern... ;-)

 snip

 Yes and a regex follows in the substitute, but the whole thing isn't
 being done with a regex.  Trying to do it with one regex can lead to a
 confusing and fragile mess.

 TMTOWTDI

use Time::Local;
while (<DATA>) {
s{,(.+?),}{
my ($d, $m, $y) = split /\//, $1;
my $t = timelocal 0, 0, 0, $d, $m-1, $y;
($d, $m, $y) = (localtime $t)[3..5];
sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
}e;
}
snip

And this would be the confusing, fragile mess I spoke of.

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Comparing files with regular expressions

2008-05-05 Thread Aaron Rubinstein
 Given just the idea of the data, can you improve on that?

I bet I could!  It's interesting how my instinct, when trying to develop a
programming solution, is to wrestle with the problem inside the context of
the language.  As a result, the solutions I come up with tend to be shaped
by my limited understanding of that language.  I think you're right that
this is a case of fluency, that I am fluent in English and my best problem
solving skills are most likely in that context.  Trying to solve the problem
in Perl, I'm likely not using my best skills and thus come up with a poor
solution.

I also take from your advice, whether you meant it or not, that I should
approach my code as if it would be scalable.  My solution is probably
adequate for a small scale problem but its silliness would quickly be
exposed as soon as the data scaled up.

Thanks for the advice and inspiration.


Re: Comparing files with regular expressions

2008-05-05 Thread Rob Dixon
Aaron Rubinstein wrote:

 Given just the idea of the data, can you improve on that?
 
 I bet I could!

I bet you could too :)

 It's interesting how my instinct, when trying to develop a programming 
 solution, is to wrestle with the problem inside the context of the language. 
 As a result, the solutions I come up with tend to be shaped by my limited 
 understanding of that language. I think you're right that this is a case of
 fluency, that I am fluent in English and my best problem solving skills are
 most likely in that context. Trying to solve the problem in Perl, I'm likely
 not using my best skills and thus come up with a poor solution.

It's a frequent assumption that when you are working with a tool of any sort,
whether it's a knife and fork or a golf club, you should work with that
tool until you are proficient. But unless those tools are prescribed by the
rules of the game in play then you should consider alternatives. I often eat
from a ladle or wooden spoon when I am cooking, but etiquette says that I may
not do the same at table; and getting a ball into a hole half a mile away by
hitting it with a stick is not a good solution by any standards.

More often than not, a programming language restricts what you can do over what
you can describe using English, and while you can always get more out of any
language by becoming familiar with it, you are usually becoming familiar with
what is impossible or difficult rather than getting used to new exciting
possibilities.

 I also take from your advice, whether you meant it or not, that I should
 approach my code as if it would be scalable.  My solution is probably
 adequate for a small scale problem but its silliness would quickly be
 exposed as soon as the data scaled up.

Never write off your solution as silly. If it works then it is a solution, and
final solutions are almost never the best ones possible.

I meant quite the opposite about scalability. My intention was to emphasize that
the amount of data changes what is a good solution. It is a useful exercise to
imagine that the data is printed on sheets of paper and that you have to solve
the problem manually given just an aircraft hangar full of filing cabinets. If
you have only a couple of sheets of paper with a single line printed on each,
then you can just sit at your desk and write the output. But if you have several
stacks of paper then you might want to start using the filing system.

 Thanks for the advice and inspiration.

You're more than welcome. Remember that the best way to solve a problem, whether
it's a programming problem or any other sort, is to think about whether it's
comparable to any situation you have already come across. It's called
abstraction and it's your friend :)

Rob





Re: Comparing files with regular expressions

2008-05-03 Thread Jenda Krynicky
From:   Chas. Owens [EMAIL PROTECTED]
 On Fri, May 2, 2008 at 10:44 AM, rubinsta [EMAIL PROTECTED] wrote:
 snip
  Any thoughts as to why
   some of the matches are getting missed?
 snip
 
 Not off hand.  I will extract your code and do some tests.  Can you
 send me your data or is it sensitive?
 
 snip
   Just out of beginner curiosity, why did you suggest I use the 3
   argument filehandle instead of:
   open(EX, "exclude1.txt") or die $!
 snip
 
 Because the three argument version of open is safer.  It doesn't
 matter in the code you wrote because you used a literal string, but if
 you say
 
 open FH, $file or die "could not open $file: $!";
 
 expecting FH to be a read filehandle and $file contains the filename
 ">important", you will wind up with a write filehandle.

And that means you were lucky. If the $file contained something like 
"|rm -rf /" or "rm -rf / |" ...

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery






Re: Comparing files with regular expressions

2008-05-03 Thread Chas. Owens
On Sat, May 3, 2008 at 4:42 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
snip
  [stuff about how two arg open is more dangerous than three arg open]
  And that means you were lucky. If the $file contained something like
  "|rm -rf /" or "rm -rf / |" ...
snip

Nah, you would be lucky if that were the case: "/" isn't a valid POSIX
filename character.  "| rm -rf ." or "rm -rf . |" on the other hand is
much more dangerous.  Not only is "." a valid filename character, you
also tend to actually have permission to modify the current working
directory.
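
For anyone following along, here is a harmless sketch of what the three
argument form buys you. The filename ">important" is the one from the
earlier example, and the script only ever tries to read, so nothing is
created or removed:

```perl
use strict;
use warnings;

# A filename that two-argument open would treat as "open for writing".
my $file = '>important';

# Three-argument open takes the mode separately, so $file is only ever
# used as a name, whatever characters it contains.
if (open my $fh, '<', $file) {
    print "opened a file literally named '$file'\n";
    close $fh;
}
else {
    print "could not open '$file' for reading: $!\n";
}
```

Unless you really have a file with that odd name, the second branch
runs, and no file called "important" appears in the current directory.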

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Comparing files with regular expressions

2008-05-03 Thread Jenda Krynicky
From: Chas. Owens [EMAIL PROTECTED]
 On Sat, May 3, 2008 at 4:42 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
 snip
   [stuff about how two arg open is more dangerous than three arg open]
   And that means you were lucky. If the $file contained something like
   "|rm -rf /" or "rm -rf / |" ...
 snip
 
 Nah, you would be lucky if that were the case: "/" isn't a valid POSIX
 filename character.

Why do you think it matters? And | is a valid POSIX filename 
character?

  "| rm -rf ." or "rm -rf . |" on the other hand is
 much more dangerous.  Not only is "." a valid filename character, you
 also tend to actually have permission to modify the current working
 directory.

You are right about the last issue though.

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery






Re: Comparing files with regular expressions

2008-05-03 Thread Rob Dixon
rubinsta wrote:
 Hello,
 
 I'm a Perl uber-novice and I'm trying to compare two files in order to
 exclude items listed on one file from the complete list on the other
 file.  What I have so far prints out a third file listing everything
 that matches the exclude file from the complete file (which I'm hoping
 will be a duplicate of the exclude file) just so I can make sure that
 the comparison script is working.  The files are lists of numbers
 separated by newlines.  The exclude file has 333 numbers and the
 complete file has 9000 numbers.
 
 Here's what I have so far:
 
 #!/usr/bin/perl
 use strict;
 use warnings;
 
 open(ALL, "all.txt") or die $!;
 open(EX, "exclude.txt") or die $!;
 open(OUT,'>exTest.txt') or die $!;
 
 my @ex_lines = <EX>;
 my @all_lines = <ALL>;
 
 foreach $all (@all_lines){
foreach $ex (@ex_lines){
if ($ex =~ /(^$all)/){

The lines you have read from the object files are unchomped (include the
trailing newline character) and there is no allowance for leading or trailing
whitespace. Are you sure of your input data?

The regex has an unnecessary capture (parentheses) and isn't tied at the end of
the string, although leaving the record separator at the end of $ex and $all has
a similar effect.

It should really be simply

  if ($ex eq $all)

   print OUT $1;

The two strings are equal, so

  print OUT $all;

}
}
 }
 close(ALL);
 close(EX);

Explicit closures are pointless unless the status is verified. All open
filehandles will be closed by Perl when it finishes processing the script.

(Even if an input file doesn't close cleanly, the damage has already been done
when an earlier read failed. If a volume is dismounted while the program is
running, for example, without explicit handling of read errors the file will
simply appear to be shorter than its true length.)

 close(OUT);

There's no need to close output files unless you're in a fragile environment, or
 if it is vital that the output information is complete. For instance it may be
useful to write

  close $output or die $!;
  unlink 'input.txt';

so that the object data was discarded only if the target data was safely written
and secured.

 I realize the nested foreach loops are ugly but I don't know enough to
 navigate the filehandles, which as I understand, can only be assigned
 to variables in their entirety as an array.  Any thoughts on how this
 might be done?

You should try to solve the problem instead of solving the data. Nearly all of
your code is about opening, reading, and closing files. Your solution amounts 
to:

  if any of the lines in ALL match any of the lines in EX then print (it)

Given just the idea of the data, can you improve on that? For instance, if one
or both of the object files are sorted then you may not need to reassess all of
the lines for each comparison. Or if the lines could occur more than once in
either or both files, then it may be an idea to maintain a record of what
comparisons had already been made. Those ideas are independent of Perl, or
indeed of any programming language.
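
As one possible reading of "maintain a record", and only as a sketch
with made-up numbers standing in for the two files, the record could be
a hash that is built once and then consulted for each line:

```perl
use strict;
use warnings;

# Stand-ins for the lines of all.txt and exclude.txt (assumed data).
my @all_lines = ("101\n", "102\n", "103\n", "104\n");
my @ex_lines  = ("102\n", "104\n");

# Record every excluded value once; chomp so newlines cannot affect
# the comparison.
chomp(my @ex = @ex_lines);
my %excluded = map { $_ => 1 } @ex;

# One pass over the complete list, one hash lookup per line.
my @kept;
for my $line (@all_lines) {
    chomp(my $value = $line);
    push @kept, $value unless $excluded{$value};
}

print "$_\n" for @kept;
```

Each line is then examined once, instead of once for every line of the
other file.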

After that, the line blurs. Programming languages are useful thinking tools for
imagining programming solutions, just as natural languages are useful for life's
challenges. An idea expressed in Latin can be impossible to recreate intact in
French, just as a solution in Forth can be inexpressible in C++.

But despite its blurriness the line is narrow, so have courage and dash across it
into the implementation, where all languages have ways to open, close, read and
write files; ways to handle numbers and strings; conveniences for arrays and
constants and, God forbid, error handling.

But I encourage you to start at the beginning, and if common sense is more
familiar to you than Perl or any other programming language then use that. Your
imagination is your best tool.

If you were given two piles of line printer paper and were told to find the
differences:

- what questions would you ask about the problem?
- how would you go about it?
- what would you want to know about the contents?

Once you know the answers, you have a solution. Then you can code it, given
knowledge of the language at hand.

Many things will change the solution, just as you would do things differently if
you had only two sheets of paper to compare, or a two-inch-thick stack. Whether
you had to do it every day or it was somebody else's turn in ten years' time.
Whether it was obvious that all of the lines on one stack of paper were the same
except for a few changes. You get the idea?

But unless it is easier for you to formulate solutions in Perl or any other
language, then imagine a real-world equivalent and use common sense.

Then just code it, and we will help.

HTH,

Rob





Re: Comparing files with regular expressions

2008-05-03 Thread Chas. Owens
On Sat, May 3, 2008 at 5:57 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
 From: Chas. Owens [EMAIL PROTECTED]

  On Sat, May 3, 2008 at 4:42 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
   snip
 [stuff about how two arg open is more dangerous than three arg open]
 And that means you were lucky. If the $file contained something like
 "|rm -rf /" or "rm -rf / |" ...
   snip
  
   Nah, you would be lucky if that were the case: "/" isn't a valid POSIX
   filename character.

  Why do you think it matters? And | is a valid POSIX filename
snip

Hmm, you are right.  I was an idiot there.  I was assuming the file
was coming off disk (but named in a way to cause problems for the
processing program), but $file could come from anywhere.

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Comparing files with regular expressions

2008-05-02 Thread rubinsta
Hello,

I'm a Perl uber-novice and I'm trying to compare two files in order to
exclude items listed on one file from the complete list on the other
file.  What I have so far prints out a third file listing everything
that matches the exclude file from the complete file (which I'm hoping
will be a duplicate of the exclude file) just so I can make sure that
the comparison script is working.  The files are lists of numbers
separated by newlines.  The exclude file has 333 numbers and the
complete file has 9000 numbers.

Here's what I have so far:

#!/usr/bin/perl
use strict;
use warnings;

open(ALL, "all.txt") or die $!;
open(EX, "exclude.txt") or die $!;
open(OUT,'>exTest.txt') or die $!;

my @ex_lines = <EX>;
my @all_lines = <ALL>;

foreach $all (@all_lines){
   foreach $ex (@ex_lines){
      if ($ex =~ /(^$all)/){
         print OUT $1;
      }
   }
}
close(ALL);
close(EX);
close(OUT);

I realize the nested foreach loops are ugly but I don't know enough to
navigate the filehandles, which as I understand, can only be assigned
to variables in their entirety as an array.  Any thoughts on how this
might be done?

Thanks!






Re: Comparing files with regular expressions

2008-05-02 Thread Chas. Owens
On Thu, May 1, 2008 at 4:09 PM, rubinsta [EMAIL PROTECTED] wrote:
 Hello,

  I'm a Perl uber-novice and I'm trying to compare two files in order to
  exclude items listed on one file from the complete list on the other
  file.  What I have so far prints out a third file listing everything
  that matches the exclude file from the complete file (which I'm hoping
  will be a duplicate of the exclude file) just so I can make sure that
  the comparison script is working.  The files are lists of numbers
  separated by newlines.  The exclude file has 333 numbers and the
  complete file has 9000 numbers.

  Here's what I have so far:

  #!/usr/bin/perl
  use strict;
  use warnings;

  open(ALL, "all.txt") or die $!;
  open(EX, "exclude.txt") or die $!;
  open(OUT,'>exTest.txt') or die $!;
snip

Use the three argument version of open and lexical filehandles:

open my $ex, "<", "exclude.txt"
    or die "could not open exclude.txt: $!";

snip

  my @ex_lines = <EX>;
  my @all_lines = <ALL>;
snip

Using filehandles in list context is a bad idea.  It may work now when
the files are small, but data almost always grows.  Unless you are
certain that the file will remain small you should not do this.  Use a
while loop instead.

snip

  foreach $all (@all_lines){
foreach $ex (@ex_lines){
if ($ex =~ /(^$all)/){

This is testing to see if there are any lines in the exclude file that
start with what was in the complete file.  That is if the complete
file was

1
2

and the exclude file was

10
20

then all lines would be excluded.  Is this really what you want?
Also, given that you have not surrounded $all with \Q and \E (like
/^\Q$all\E/), any metacharacters in $all (like *, ., ?, etc.) will be
treated as metacharacters instead of normal characters.  Unless the
lines in complete are known to be regexes this could be bad.  And by
bad I mean everything from mismatches to the dreaded (?{system qq(rm
-rf $ENV{HOME})}).
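
A minimal sketch of the two problems just described, using made-up
sample values (the variable names are for illustration only): a pattern
anchored only at the start lets 1 match 10, and an unquoted
metacharacter matches more than the literal text.

```perl
use strict;
use warnings;

# Sample data only; the real values come from the two files.
my $all = "1";
my $ex  = "10";

# /^$all/ anchors only the start, so "1" also matches "10".
my $prefix_hit = $ex =~ /^$all/      ? 1 : 0;
# Anchoring both ends and quoting with \Q...\E compares the whole value literally.
my $exact_hit  = $ex =~ /^\Q$all\E$/ ? 1 : 0;

# An unquoted "." is a wildcard; \Q...\E turns it back into a literal dot.
my $pattern     = "1.5";
my $wild_hit    = "125" =~ /^$pattern$/      ? 1 : 0;
my $literal_hit = "125" =~ /^\Q$pattern\E$/ ? 1 : 0;

# prints: prefix=1 exact=0 wild=1 literal=0
print "prefix=$prefix_hit exact=$exact_hit wild=$wild_hit literal=$literal_hit\n";
```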

If you don't have regexes in the complete file but do want to check
for its entries as prefixes in the exclude file, you are better off
using a prefix tree (aka a trie*).  It is an O(m log n)** algorithm,
as opposed to the O(n*m) algorithm you are using now.  There is at
least one Perl implementation: Tree::Trie***.

If you don't have regexes in the complete file and do not want to
check for entries as prefixes in the exclude file you are better off
using a hash set* to test for existence (roughly an O(m+n)
solution).  Luckily in Perl a hash set is easy to build, you just use
a hash variable with the keys being your data and the values all being
either undef or 1 depending on your style (I tend to use 1 for
simplicity's sake, but I think undef might be smaller).  Using a hash
also gives you the freedom to use something like DB_File** if the
files get very large (thus saving memory without having to add much
code).

snip
 print OUT $1;
}
}
  }
  close(ALL);
  close(EX);
  close(OUT);
snip

These calls to close at the end of the script are unnecessary.  Only
call close explicitly if you need to close a file before the
filehandle goes out of scope.

Another simple tip is to treat STDIN/files on the command line as your
complete file and STDOUT as your output file.  This form of Perl
script is called a filter and is very easy to write and use.  What
follows is my implementation of the hash set version:

#!/usr/bin/perl

use strict;
use warnings;

#this is a hack to make the script runnable
#without external data files, in a normal
#script you would open a real exclude file
#here
my $exclude = "1\n2\n3\n";
open my $ex, "<", \$exclude
    or die "could not open the scalar \$exclude as a file: $!";

my %exists;
$exists{$_} = 1 while <$ex>;

#this is also a hack, in a normal script
#you would say
#while (my $line = <>) {
#to get a loop over STDIN or files specified
#on the commandline
while (my $line = <DATA>) {
    print $line unless $exists{$line};
}

__DATA__
1
2
10
20


* http://en.wikipedia.org/wiki/Trie
** This is big O notation; basically it measures the order of
magnitude of the number of steps needed to complete the algorithm.  So, if
you had 1,000 lines in exclude and 10,000 lines in complete it would
take roughly 10,000,000 steps to complete the algorithm you are using
now and only 13,287 with the trie.
   See http://en.wikipedia.org/wiki/Big_O_notation
*** http://search.cpan.org/~avif/Tree-Trie-1.5/Trie.pm
* basically a hash with no values, used for testing the existence of values
** http://perldoc.perl.org/DB_File.html

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Comparing files with regular expressions

2008-05-02 Thread rubinsta
Many thanks, Chas.  These are all very helpful (and educational!)
suggestions.  I adapted your example like so (specifying the all.txt
on the command-line):

#!/usr/bin/perl
use strict;
use warnings;

open my $ex, "<", "exclude.txt" or die $!;
open my $out, ">", "exTest.txt" or die $!;

my %exists;
$exists{$_} = 1 while <$ex>;

## I changed the unless to if so I could easily
## compare the output of the script to the
## original exclude.txt file

while (my $line = <>){
    print $out $line if $exists{$line};
}

The problem is the exclude.txt and exTest.txt do not match.  Everything
in the exTest.txt file is also in the exclude.txt file but there are a
number of lines that appear in the all.txt and the exclude.txt that do
not end up in exTest.txt.  The numbers are EANs and are thus all
exactly the same format, e.g. 9780657007423.  Any thoughts as to why
some of the matches are getting missed?

Just out of beginner curiosity, why did you suggest I use the 3
argument filehandle instead of:
open(EX, "exclude1.txt") or die $!

Thanks again for all your help!



Re: Comparing files with regular expressions

2008-05-02 Thread Chas. Owens
On Fri, May 2, 2008 at 10:44 AM, rubinsta [EMAIL PROTECTED] wrote:
snip
 Any thoughts as to why
  some of the matches are getting missed?
snip

Not off hand.  I will extract your code and do some tests.  Can you
send me your data or is it sensitive?

snip
  Just out of beginner curiosity, why did you suggest I use the 3
  argument filehandle instead of:
  open(EX, "exclude1.txt") or die $!
snip

Because the three argument version of open is safer.  It doesn't
matter in the code you wrote because you used a literal string, but if
you say

open FH, $file or die "could not open $file: $!";

expecting FH to be a read filehandle and $file contains the filename
">important", you will wind up with a write filehandle.  Specifying
the type of filehandle you want separately from the file is an
important safety feature.  Using the old version of open is a bad
habit you should not develop.  You should know it exists (like many of
the other bad habits left over from earlier versions of the language)
in case you run into code that uses it, but you shouldn't use it
yourself.  I would also strongly recommend using lexical filehandles
instead of the old bareword style for similar reasons.
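
The difference can be seen in a small sketch. Everything here runs in
a throwaway temporary directory, and the file name "important" and the
variable names are assumptions made for the demonstration. With the
three argument form the leading ">" is just part of a (nonexistent)
file name; with the two argument form it is parsed as a mode and the
file is truncated.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory so nothing real is touched.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/important";

# Create a file with some content we care about.
open my $setup, ">", $path or die "could not create $path: $!";
print {$setup} "precious data\n";
close $setup or die "could not close $path: $!";

# A file name that arrives from outside with a leading ">".
my $file = ">$path";

# Three-argument open treats $file purely as a name; no such file exists,
# so the open fails and the real file is left alone.
my $three_arg_ok     = open( my $in, "<", $file ) ? 1 : 0;
my $size_after_three = ( stat $path )[7];

# Two-argument open parses the ">" as a mode and truncates the file.
open( FH, $file ) or die "could not open $file: $!";
close FH;
my $size_after_two = ( stat $path )[7];

print "three-arg open succeeded: $three_arg_ok, size now: $size_after_three\n";
print "after two-arg open, size now: $size_after_two\n";
```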

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





problem using backslash on brackets in regular expressions

2008-04-22 Thread Daniel McClory

Hi,

I have files which contain sentences, where some lines have extra  
information inside brackets and parentheses.  I would like to delete  
everything contained within brackets or parentheses, including the  
brackets.  I know that I am supposed to use the backslash to turn off  
the metacharacter properties of brackets and parentheses in a regular  
expression.


I am trying to use the s/// operator to remove it, by doing this:

while(<INPUT>)
  {
$_ =~ s/\[*\]//;
$_ =~ s/\(*\)//;
print $_;
  }

so if the input is:
*MOT:   I'm gonna first [//] first I wanna use em all up .

then the output I'd like to get is:
*MOT:   I'm gonna first first I wanna use em all up .

but instead what I get is:
*MOT:   I'm gonna first [// first I wanna use em all up .

It only deletes the last piece, the ] bracket.  How can I erase the  
whole thing?


Thanks.





RE: problem using backslash on brackets in regular expressions

2008-04-22 Thread Wagner, David --- Senior Programmer Analyst --- WGO
 -Original Message-
 From: Daniel McClory [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, April 22, 2008 16:06
 To: beginners@perl.org
 Subject: problem using backslash on brackets in regular expressions
 
 Hi,
 
 I have files which contain sentences, where some lines have extra  
 information inside brackets and parentheses.  I would like to delete  
 everything contained within brackets or parentheses, including the  
 brackets.  I know that I am supposed to use the backslash to 
 turn off  
 the metacharacter properties of brackets and parentheses in a 
 regular  
 expression.
 
 I am trying to use the s/// operator to remove it, by doing this:
 
 while(<INPUT>)
   {
     $_ =~ s/\[*\]//;
What you are saying here is: zero or more occurrences of [ followed by
a ]. That is what you are seeing in your output: the / before the ] is
not a [, which is fine (zero occurrences), so only the ] matches and is
replaced with nothing.

 s/\[[^\]]+]//;
     $_ =~ s/\(*\)//;
 s/\([^\)]+\)//;

No reason to do the $_ =~ as by default that is what is going to
be done anyway.

Wags ;)

  print $_;
}
 
 so if the input is:
 *MOT:   I'm gonna first [//] first I wanna use em all up .
 
 then the output I'd like to get is:
 *MOT:   I'm gonna first first I wanna use em all up .
 
 but instead what I get is:
 *MOT:   I'm gonna first [// first I wanna use em all up .
 
 It only deletes the last piece, the ] bracket.  How can I erase the  
 whole thing?
 
 Thanks.
 




Re: problem using backslash on brackets in regular expressions

2008-04-22 Thread Dr.Ruud
Daniel McClory wrote:

 while(<INPUT>)
{
  $_ =~ s/\[*\]//;
  $_ =~ s/\(*\)//;
  print $_;
}

 while ( <INPUT> ) {
 s/\[.*?\]//;
 s/\(.*?\)//;
 print;
 }

-- 
Affijn, Ruud

Gewoon is een tijger.





RE: problem using backslash on brackets in regular expressions

2008-04-22 Thread adarsh.s85
Hello,
(snip)
I am trying to use the s/// operator to remove it, by doing this:
 
while(<INPUT>)
   {
 $_ =~ s/\[*\]//;
 $_ =~ s/\(*\)//;
 print $_;
   }
(snip)
 
The method used is incorrect. 
$_ =~ s/\[*\]//; --- This says that the search is for an opening
bracket (zero or more occurrences of it, since a '*' follows '[')
followed by a closing bracket.
 
(snip)
so if the input is:
*MOT:   I'm gonna first [//] first I wanna use em all up .
(snip)
 
In this input, considering your search pattern, the closing bracket ']' is
found. Before that, the character is '/', which is not '[' (zero occurrences
of it). Thus, it's a valid match. So only ']' is removed.
 
The correct search pattern ought to be: 
$_ =~ s/\[.*\]//;
--- This shall search for an opening bracket followed by zero or
more characters (.*) followed by a closing bracket. So if the input
is
 
*MOT:   I'm gonna first [//] first I wanna use em all up .
then output will be:
*MOT:   I'm gonna first first I wanna use em all up .
 
[!]HOWEVER if there are more than one pair of '[]' then another problem
occurs.
Eg:
Input: I'm gonna first [//] second [//] third I wanna use em all up.
Output: I'm gonna first third I wanna use em all up.
 
As you can see, 'second' is missing. This is because of
the greediness of Perl, which tries to match as much text as
possible.
 
To solve this, we use the '?' operator. Thus, the correct search pattern
is
$_ =~ s/\[.*?\]//;
This will give the output: I'm gonna first second [//] third I wanna use
em all up.
 
To remove all such occurrences, use the global search:
$_ =~ s/\[.*?\]//g;
Use a similar approach for '()'.
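
The three variants above can be compared side by side. This is a small
sketch using the sample sentence from this message; the trailing \s* in
the last two substitutions is an addition not in the thread, used so
that no doubled space is left behind.

```perl
use strict;
use warnings;

my $input = "I'm gonna first [//] second [//] third I wanna use em all up.";

# Greedy: .* runs to the LAST "]", so "second" is swallowed too.
( my $greedy = $input ) =~ s/\[.*\]//;

# Non-greedy: .*? stops at the first "]", but only one pair is removed.
( my $lazy = $input ) =~ s/\[.*?\]//;

# Non-greedy with /g removes every bracketed group; \s* also removes
# the space that followed each group.
( my $all = $input ) =~ s/\[.*?\]\s*//g;

# The same idea for parentheses.
( my $parens = "keep (drop) this (and this) too" ) =~ s/\(.*?\)\s*//g;

print "$_\n" for $greedy, $lazy, $all, $parens;
```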
 
Regards,
Adarsh
 
 
 


Can regular expressions be used as subroutine arguments?

2008-03-08 Thread R (Chandra) Chandrasekhar

Hello Folks,

I need to make a substitution in place for each element of an array, and I need 
to do this to two arrays. Currently the relevant code fragment (without pragmas) is:



foreach my $element (@cddb_artist)
{
$element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
}
foreach my $element (@cddb_track)
{
$element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
}


The above fragment seems to be a good candidate for generalizing into a 
subroutine. I have two questions regarding this:


1. Can this particular regular expression, involving as it does, matched 
sub-pattern variables like $1, be used as a subroutine argument, and if so, how?


2. Can arbitrary regular expressions, including /PATTERN/REPLACEMENT/ versions 
for substitutions, be used as subroutine arguments, and if so, how?


TIA.

Chandra





Re: Can regular expressions be used as subroutine arguments?

2008-03-08 Thread Dr.Ruud
R (Chandra) Chandrasekhar wrote:

 
 foreach my $element (@cddb_artist)
  {
  $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
  }
 foreach my $element (@cddb_track)
  {
  $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
  }
 

You can write all that as this single line:

s/^.*?([0-9,a-f]{8}):.*$/$1/ for @cddb_artist, @cddb_track;

Do you really want the comma inside the character class?


 1. Can this particular regular expression, involving as it does,
 matched sub-pattern variables like $1, be used as a subroutine
 argument, and if so, how? 

Only the first part of the substitution is a regular expression. 

my $re_hex8 = qr/[[:xdigit:]]{8}/;
s/^.*?($re_hex8):.*$/$1/ for @cddb_artist, @cddb_track;

Alternative:

perl -wle'
  my @cddb_artist = ("xyz 12345678: abc");
  my @cddb_track  = ("abc fedcba09: xyz");
  my $re_hex8 = qr/[[:xdigit:]]{8}/;
  ($_) = m/($re_hex8)(?=:)/ for @cddb_artist, @cddb_track;
  print for @cddb_artist, @cddb_track;
'
12345678
fedcba09


 2. Can arbitrary regular expressions, including /PATTERN/REPLACEMENT/
 versions 
 for substitutions, be used as subroutine arguments, and if so, how?

Store the parts in variables. 

-- 
Affijn, Ruud

Gewoon is een tijger.





Re: Can regular expressions be used as subroutine arguments?

2008-03-08 Thread Chas. Owens
On Sat, Mar 8, 2008 at 9:59 AM, Dr.Ruud [EMAIL PROTECTED] wrote:
snip
   2. Can arbitrary regular expressions, including /PATTERN/REPLACEMENT/
   versions
   for substitutions, be used as subroutine arguments, and if so, how?

  Store the parts in variables.
snip

Specifically, use the qr// operator to create precompiled regexes that
can be stored in a scalar:

my $regex = qr/^.*?([0-9,a-f]{8}):.*$/;

$string =~ s/$regex/$1/;

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.





Re: Can regular expressions be used as subroutine arguments?

2008-03-08 Thread John W. Krahn

R (Chandra) Chandrasekhar wrote:

Hello Folks,

I need to make a substitution in place for each element of an array, and 
I need to do this to two arrays. Currently the relevant code fragment 
(without pragmas) is:



foreach my $element (@cddb_artist)
{
$element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
}
foreach my $element (@cddb_track)
{
$element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
}



As Dr.Ruud said that could be written as:

s/^.*?([0-9,a-f]{8}):.*$/$1/ for @cddb_artist, @cddb_track;

But you don't really need the anchors so:

s/.*?([0-9,a-f]{8}):.*/$1/ for @cddb_artist, @cddb_track;

And if you are not worried about preserving the newline at the end you 
could do it like this:


($_) = /([0-9,a-f]{8}):/ for @cddb_artist, @cddb_track;


The above fragment seems to be a good candidate for generalizing into a 
subroutine.


sub my_sub_something {
my $regex = shift;
( $_ ) = /$regex/ for @_;
}

And call it like this:

my_sub_something( qr/([0-9,a-f]{8}):/, @cddb_artist, @cddb_track );


But it would probably be simpler just to use the for statement above.
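
For the second question (passing the replacement as well as the
pattern), one possible approach is to pass the pattern as a qr//
object and the replacement as a code reference. This is only a sketch:
substitute_all is a hypothetical name, the sample data is borrowed from
Dr.Ruud's one-liner, and the comma has been left out of the character
class on the assumption that it was not intended.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a substitution in place to every remaining
# argument.  $regex is a qr// object; $replace is a code ref that is
# handed the first capture group and returns the replacement text.
sub substitute_all {
    my $regex   = shift;
    my $replace = shift;
    # The elements of @_ are aliases to the caller's array elements,
    # so modifying $_ here changes the original arrays.
    s/$regex/$replace->($1)/e for @_;
    return;
}

my @cddb_artist = ("xyz 12345678: abc");
my @cddb_track  = ("abc fedcba09: xyz");

substitute_all(
    qr/^.*?([0-9a-f]{8}):.*$/,
    sub { my ($hex) = @_; return $hex },
    @cddb_artist, @cddb_track,
);

print "$_\n" for @cddb_artist, @cddb_track;
```

Passing a code reference rather than a replacement string avoids
having to eval a string such as '$1' inside the subroutine.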




John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order.-- Larry Wall





Re: entering regular expressions from the keyboard

2007-08-24 Thread Dr.Ruud
Jay Savage wrote:
 Dr.Ruud:
 Christopher Spears:

 #print $regexp;
 
 Make that
   print qr/$regexp/;
 
 Not sure where your headed with this.

My headed? :)

It was an alternative for the commented debug line. 


 First, OP wants to print the input back to the user.

And I presume that it is more a developer directed print statement.

-- 
Affijn, Ruud

Gewoon is een tijger.





Re: entering regular expressions from the keyboard

2007-08-23 Thread Jay Savage
On 8/21/07, Dr.Ruud [EMAIL PROTECTED] wrote:
 Jeff Pang wrote:
  Christopher Spears:

  print "Enter regular expression: ";
  chomp(my $regexp = <STDIN>);
 
  $regexp = quotemeta($regexp);

 Since it specifically asks for a regular expression, I would definitely
 not do quotemeta().



Exactly. quotemeta() defeats the whole purpose here. We *want* the
user to be able to input metacharacters for the match.

  #print $regexp;

 Make that

   print qr/$regexp/;


Not sure where your headed with this. First, OP wants to print the
input back to the user. it makes sense to do this unmodified, for the
most part. Also, qr// doesn't modify the variable, it returns the
compiled expression, which is just being thrown away after the print.
That means the regex is actually being compiled twice. It probably
doesn't, though, make sense to compile the regex before entering the
loop, so perhaps something like:

 chomp(my $regexp = <STDIN>);
 print $regexp, "\n";
 $regexp = qr/$regexp/;
 ...

One additional note to Chris:

In any case, '$_ =~ \$regexp' is almost certainly not what you're
looking for. Since $regexp is a simple scalar and not a reference,
your current code is trying to match against something like
/SCALAR(0x18231cc)/.
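
A sketch of the compile-once approach, with two assumptions: the
pattern and the file list are hard-coded stand-ins for the keyboard
input and the directory listing, and the eval around qr// is an
addition (not from the thread) to catch a malformed pattern such as a
lone "(". The last lines show what a reference looks like when it is
used as a string.

```perl
use strict;
use warnings;

# Stand-ins for the keyboard input and the directory listing.
my $input    = '\.cpp$';
my @allfiles = ( ".", "..", "main.cpp", "util.h", "notes.cpp.bak" );

# Compile once, before the loop; eval traps an invalid pattern.
my $regexp = eval { qr/$input/ };
die "invalid regular expression '$input': $@" unless defined $regexp;

my @matches = grep { $_ =~ $regexp } @allfiles;
print "$_\n" for @matches;

# What the original test did: a reference stringifies to SCALAR(0x...),
# and that string is what ends up being used as the pattern.
my $as_string = "" . \$input;
print "reference as string: $as_string\n";
```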


HTH,

--jay
--
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.downloadsquad.com  http://www.engatiki.org

values of β will give rise to dom!


Re: entering regular expressions from the keyboard

2007-08-23 Thread Jay Savage
On 8/23/07, Jay Savage [EMAIL PROTECTED] wrote:
 That means the regex is actually being compiled twice. It probably
 doesn't, though, make sense to compile the regex before entering the
 loop, so perhaps something like:

Make that *does* make sense.

-- j
--
This email and attachment(s): [  ] blogable; [ x ] ask first; [  ]
private and confidential

daggerquill [at] gmail [dot] com
http://www.tuaw.com  http://www.downloadsquad.com  http://www.engatiki.org

values of β will give rise to dom!


Re: entering regular expressions from the keyboard

2007-08-21 Thread Paul Lalli
On Aug 20, 11:28 pm, [EMAIL PROTECTED] (Christopher Spears) wrote:
 I'm working on the second exercise of the second
 chapter.  I'm supposed to write a program that asks
 the user to type a regular expression.  The program
 then uses the regular expression to try to find a
 match in the directory that I hard coded into the
 program.  Here is what I have so far:

 #!/usr/bin/perl -w
 use strict;

 print "Enter regular expression: ";

 chomp(my $regexp = <STDIN>);
 #print $regexp;

 opendir(CPPDIR,"/home/io/chris_cpp/") or die "Could not open directory: $!";
 my @allfiles = readdir CPPDIR;
 closedir CPPDIR;

 foreach $_(@allfiles){
     if ($_ =~ \$regexp){
         print $_."\n";
     }
 }

 My problem lies with the matching part.  I'm not sure
 how to use the string that I stored in the $regexp
 variable as a regular expression.  Any hints?

Shawn and Jeff each gave you half of the answer.

Jeff pointed out that when your pattern is contained in a variable,
you should use quotemeta().  This will backslash any metacharacters
the variable might contain, so that they match themselves rather than
being special in the pattern match (so any periods match periods,
rather than any character, plus signs match plus signs, rather than
meaning one or more of the previous, etc):
$regexp = quotemeta($regexp)

And Shawn pointed out that the proper syntax for a pattern match is:
$_ =~ /$regexp/

Those two lines should be combined:
$regexp = quotemeta($regexp);
foreach $_(@allfiles){
    if ($_ =~ /$regexp/){
        print $_."\n";
    }
}


Or, instead of calling quotemeta() explicitly, you can use the \Q and
\E escape sequences to do the backquoting within the pattern match
itself:

foreach $_ (@allfiles) {
    if ($_ =~ /\Q$regexp\E/) {
        print $_ . "\n";
    }
}


Also note that an experienced Perl programmer would either eliminate
the $_ whenever it's not needed:
foreach (@allfiles) {
    if (/\Q$regexp\E/) {
        print "$_\n";
    }
}

Or would use a better variable name as the loop iterator:
foreach my $file (@allfiles) {
    if ($file =~ /\Q$regexp\E/) {
        print "$file\n";
    }
}


Hope that helps,
Paul Lalli






Re: entering regular expressions from the keyboard

2007-08-21 Thread Dr.Ruud
Jeff Pang wrote:
 Christopher Spears:

 print "Enter regular expression: ";
 chomp(my $regexp = <STDIN>);

 $regexp = quotemeta($regexp);

Since it specifically asks for a regular expression, I would definitely
not do quotemeta().


 #print $regexp;

Make that

  print qr/$regexp/;

-- 
Affijn, Ruud

Gewoon is een tijger.






entering regular expressions from the keyboard

2007-08-20 Thread Christopher Spears
Hi!

I'm trying to get back into Perl again by working
through Intermediate Perl.  Unfortunately, the Perl
part of my brain has atrophied!  

I'm working on the second exercise of the second
chapter.  I'm supposed to write a program that asks
the user to type a regular expression.  The program
then uses the regular expression to try to find a
match in the directory that I hard coded into the
program.  Here is what I have so far:

#!/usr/bin/perl -w
use strict;

print "Enter regular expression: ";

chomp(my $regexp = <STDIN>);
#print $regexp;

opendir(CPPDIR,"/home/io/chris_cpp/") or die "Could not open directory: $!";
my @allfiles = readdir CPPDIR;
closedir CPPDIR;

foreach $_(@allfiles){
    if ($_ =~ \$regexp){
        print $_."\n";
    }
}

My problem lies with the matching part.  I'm not sure
how to use the string that I stored in the $regexp
variable as a regular expression.  Any hints?



I'm the last person to pretend that I'm a radio.  I'd rather go out and be a 
color television set.
-David Bowie

Who dares wins
-British military motto

I generally know what I'm doing.
-Buster Keaton





Re: entering regular expressions from the keyboard

2007-08-20 Thread Jeff Pang


-Original Message-
From: Christopher Spears [EMAIL PROTECTED]
Sent: Aug 21, 2007 11:28 AM
To: beginners@perl.org
Subject: entering regular expressions from the keyboard

Hi!

I'm trying to get back into Perl again by working
through Intermediate Perl.  Unfortunately, the Perl
part of my brain has atrophied!  

I'm working on the second exercise of the second
chapter.  I'm supposed to write a program that asks
the user to type a regular expression.  The program
then uses the regular expression to try to find a
match in the directory that I hard coded into the
program.  Here is what I have so far:

#!/usr/bin/perl -w
use strict;

print "Enter regular expression: ";

chomp(my $regexp = <STDIN>);
#print $regexp;

$regexp = quotemeta($regexp);

See also perldoc -f quotemeta.


--
Jeff Pang - [EMAIL PROTECTED]
http://home.arcor.de/jeffpang/





Re: entering regular expressions from the keyboard

2007-08-20 Thread Mr. Shawn H. Corey

Christopher Spears wrote:

Hi!

I'm trying to get back into Perl again by working
through Intermediate Perl.  Unfortunately, the Perl
part of my brain has atrophied!  


I'm working on the second exercise of the second
chapter.  I'm supposed to write a program that asks
the user to type a regular expression.  The program
then uses the regular expression to try to find a
match in the directory that I hard coded into the
program.  Here is what I have so far:

#!/usr/bin/perl -w
use strict;

print "Enter regular expression: ";

chomp(my $regexp = <STDIN>);
#print $regexp;



# from here


opendir(CPPDIR,"/home/io/chris_cpp/") or die "Could not open directory: $!";
my @allfiles = readdir CPPDIR;
closedir CPPDIR;


# try:
my @allfiles = glob( '*' );



foreach $_(@allfiles){
if ($_ =~ \$regexp){


# bad regular expression. try:
 if( /$regexp/ ){


print $_."\n";
}
}

My problem lies with the matching part.  I'm not sure
how to use the string that I stored in the $regexp
variable as a regular expression.  Any hints?



I'm the last person to pretend that I'm a radio.  I'd rather go out and be a color 
television set.
-David Bowie

Who dares wins
-British military motto


Of course we'll win; we're British
- another British military motto



I generally know what I'm doing.
-Buster Keaton



--
Just my 0.0002 million dollars worth,
 Shawn

For the things we have to learn before we can do them, we learn by doing them.
 Aristotle





regular expressions issue

2007-06-27 Thread Amichai Teumim

I created a file called data.txt which contains a bunch of junk, including
some IPs. I want $line to be stored in $ip.

It works, except for the regular expressions which should find only IPs. If
I use the regular expression with the grep command in terminal I get only
the IPs. Here in Perl I don't get any output.

#!/usr/bin/perl

@input = `cat ~/ip.txt`;

foreach $line (@input){
    if($line =~ /[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}/){
        $ip = $line;
        print $ip;
    }
}

Any ideas? It's breaking my head.

Amichai


Re: regular expressions issue

2007-06-27 Thread Tom Phoenix

On 6/27/07, Amichai Teumim [EMAIL PROTECTED] wrote:


If I use the regular expression with the grep command in
terminal I get only the IPs. Here in Perl I don't get any output.


The grep command uses grep's regular expressions, but Perl uses Perl's
regular expressions. Alas, everybody's regular expressions are
different. Perl's are usually better, of course. But the syntax is
always different.


@input = `cat ~/ip.txt`;


I hope that this is _supposed_ to be a quick-and-dirty program. This
works, although it's slower than using a filehandle would be, and it
probably uses more memory. Although if you're using the tilde to open
a file in the user's home directory, well, that's maybe the best way
to do it.


/[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}/){


I think in Perl that pattern might be this:

 /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/

But do you really want to match 999.999.999.999? You don't have to.
Have you heard of Regexp::Common? Regexp::Common::net seems to have
what you want.

   /^$RE{net}{IPv4}$/

   http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common.pm
   http://search.cpan.org/dist/Regexp-Common/lib/Regexp/Common/net.pm

Even if you don't want to install the module to get just one pattern,
you could use the pattern that it supplies, which is sure to be at
least as good as anything you would write on your own.
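
For readers who want to see the range check spelled out without installing a module, here is a rough sketch. It is not the pattern from Regexp::Common; the function name find_ipv4 is invented for this example, and it simply captures four groups of digits and rejects any group above 255.

```perl
use strict;
use warnings;

# Return the first dotted quad in $line whose four parts are all 0..255,
# or undef if there is none.
sub find_ipv4 {
    my ($line) = @_;
    while ($line =~ /\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/g) {
        my @octets = ($1, $2, $3, $4);
        return join('.', @octets) unless grep { $_ > 255 } @octets;
    }
    return;
}

# Sample lines made up for the demonstration.
for my $line ("junk 192.168.0.1 more junk", "bogus 999.999.999.999 here") {
    my $ip = find_ipv4($line);
    print defined $ip ? "found $ip\n" : "no valid address\n";
}
```

The loop keeps searching after a rejected candidate, so a line with a bogus quad followed by a real address still returns the real one.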

Good luck with it!

--Tom Phoenix
Stonehenge Perl Training





Re: regular expressions issue

2007-06-27 Thread Rob Dixon

Amichai Teumim wrote:


I created a file called data.txt which contains a bunch of junk, including
some IPs. I want $line to be stored in $ip.

It works, except for the regular expressions which should find only IPs. If
I use the regular expression with the grep command in terminal I get only
the IPs. Here in Perl I don't get any output.

#!/usr/bin/perl

@input = `cat ~/ip.txt`;

foreach $line (@input){
 if($line =~ /[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}/){ 


 $ip = $line;
 print $ip;
 }
}

Any ideas? It's breaking my head.


Perl doesn't require the braces to be escaped. As written, the regex matches
literal braces, which don't exist in the string. Try this:

 if ($line =~ 
/[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/) {
   :
 }

and, by the way, [0-9] is more concise than [[:digit:]].
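
The difference can be seen in a small test. This is just a sketch with a made-up sample line; the variable names are not from the original program.

```perl
use strict;
use warnings;

my $line = "host 10.0.0.1 up";

# Escaped braces are literal characters, so this looks for text like "1{1,3}.".
my $escaped   = qr/[0-9]\{1,3\}\./;
# Unescaped braces form a quantifier: one to three digits, then a dot.
my $unescaped = qr/[0-9]{1,3}\./;

print "escaped:   ", ($line =~ $escaped   ? "match" : "no match"), "\n";
print "unescaped: ", ($line =~ $unescaped ? "match" : "no match"), "\n";
```

Only the unescaped form matches the sample line, which is why the original program printed nothing.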

HTH,

Rob





Re: Using regular expressions with delimitaters

2007-04-12 Thread yaron
Hi

The test "8.1.8" =~ /[\d $versao \s]/ will always return true because square
brackets ([]) define a character class, which matches any single one of the
characters inside. In this case the \d (digit) matches because the string
contains a digit.

In your code you wrote "8.1.8" =~ /$version/. This takes $version and treats
it as a regular expression.
I don't think that this is what you want. You actually want something like
$version =~ /8\.1\.8/.

Yaron Kahanovitch
- Original Message -
From: Rodrigo Tavares [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, April 11, 2007 4:30:58 PM (GMT+0200) Auto-Detected
Subject: Using regular expressions with delimitaters

Hello,

I need to use the delimiter " " (one blank space).
I read perldoc, and I tried to use this:

if ( "8.1.8" =~ /[\d $versao \s]/)

But the expression is always true.
Where is the error?

my code :

#!/usr/bin/perl
$version=`/usr/local/pgsql/bin/pg_ctl --version`;
print $version;

if ( "8.1.8" =~ /$version/)
 {
  print "$version\n";
 }
else
 {
  print "Wrong version !\n";
 }

Output, about program:

pg_ctl (PostgreSQL) 8.1.8
Wrong version

Best regards,

Rodrigo Faria





Using regular expressions with delimitaters

2007-04-11 Thread Rodrigo Tavares
Hello,

I need to use the delimiter " " (one blank space).
I read perldoc, and I tried to use this:

if ( "8.1.8" =~ /[\d $versao \s]/)

But the expression is always true.
Where is the error?

my code :

#!/usr/bin/perl
$version=`/usr/local/pgsql/bin/pg_ctl --version`;
print $version;

if ( "8.1.8" =~ /$version/)
 {
  print "$version\n";
 }
else
 {
  print "Wrong version !\n";
 }

Output, about program:

pg_ctl (PostgreSQL) 8.1.8
Wrong version

Best regards,

Rodrigo Faria





RE: Using regular expressions with delimitaters

2007-04-11 Thread Moon, John
From: Rodrigo Tavares [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, April 11, 2007 9:31 AM
To: beginners@perl.org
Subject: Using regular expressions with delimitaters

Hello,

I need to use the delimiter " " (one blank space).
I read perldoc, and I tried to use this:

if ( "8.1.8" =~ /[\d $versao \s]/)

But the expression is always true.
Where is the error?

my code :

#!/usr/bin/perl
$version=`/usr/local/pgsql/bin/pg_ctl --version`;
print $version;

if ( "8.1.8" =~ /$version/)
 {
  print "$version\n";
 }
else
 {
  print "Wrong version !\n";
 }

Output, about program:

pg_ctl (PostgreSQL) 8.1.8
Wrong version

Best regards,

Rodrigo Faria

Maybe you are making this too hard. As an example:

perl -e '$date=`date`; print "Is Apr\n" if $date =~ /Apr/;'

Hope this helps...
jwm





Re: Using regular expressions with delimitaters

2007-04-11 Thread Chas Owens

On 4/11/07, Rodrigo Tavares [EMAIL PROTECTED] wrote:
snip

if ( "8.1.8" =~ /$version/)

snip

You are using the operators incorrectly.  It should look like this:

if ($version =~ /8\.1\.8/)

The form is variable binding_operator regex.  Note that the periods
need to be escaped otherwise they will be interpreted as any-character
by the regex.
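
A small sketch of the corrected test follows. To keep it runnable anywhere, it uses the output line shown in the thread as a fixed sample string instead of calling pg_ctl; the helper name is_818 and the \b word boundaries are additions for this example.

```perl
use strict;
use warnings;

# Sample text copied from the output shown in the thread; a real script
# would read it from the pg_ctl command instead.
my $version = "pg_ctl (PostgreSQL) 8.1.8\n";

# The string to search goes on the left, the pattern on the right.
sub is_818 {
    my ($text) = @_;
    return $text =~ /\b8\.1\.8\b/ ? 1 : 0;
}

print is_818($version) ? "Version is 8.1.8\n" : "Wrong version !\n";

# Unescaped dots match any character, so this also accepts "8x1y8".
print "loose pattern matched\n" if "8x1y8" =~ /8.1.8/;
```

The word boundaries keep the pattern from accepting something like 8.1.80, which the bare /8\.1\.8/ would also match.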





grouppin in the regular expressions

2006-10-13 Thread I . B .

Hi nice people,


How do I specify, using regular expressions, "match everything but the
string (xxx)"?

I would do this:

$line =~ /[^(xxx)]+/;

but, as was mentioned before, () inside a character class does not work.
What is the solution here?

thank you!

~i


RE: grouppin in the regular expressions

2006-10-13 Thread Wagner, David --- Senior Programmer Analyst --- WGO
Use !~ instead of =~; it is true when the pattern does not match:

if ( $line !~ /\(xxx\)/ ) {
    # does not contain (xxx)
} else {
    # does contain (xxx)
}

  If you have any problems or questions, please let me know.

 Thanks.

  Wags ;)
David R Wagner
Senior Programmer Analyst
FedEx Freight
1.408.323.4225x2224 TEL
1.408.323.4449   FAX
http://fedex.com/us 

-Original Message-
From: I.B. [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 13, 2006 12:03
To: beginners@perl.org
Subject: grouppin in the regular expressions

Hi nice people,


how to specify using regular expressions: match everything but string
(xxx)

i would do this :

$line =~ /[^(xxx)]+/;

but, as it was mentioned before () inside character class is not
working.
what is solution here?

thank you!

~i





Re: grouppin in the regular expressions

2006-10-13 Thread John W. Krahn
I.B. wrote:
 Hi nice people,

Hello,

 how to specify using regular expressions: match everything but string (xxx)
 
 i would do this :
 
 $line =~ /[^(xxx)]+/;
 
 but, as it was mentioned before () inside character class is not working.
 what is solution here?

Perhaps you want:

$line !~ /xxx/;



John
-- 
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.   -- Larry Wall





Re: grouppin in the regular expressions

2006-10-13 Thread I . B .

sorry, I didn't fraze my question correctly.


example :
$line="abcxabcxxabcxxxabc";

how to match everything before xxx but not xxx itself?
the answer i got is to use lookaheads:

my $line = "abcxxabcxxxabc";
if ($line =~ m{(.*?(?:(?!xxx).))xxx}){
print "matched: $1\n";
}
else{
print "failed\n";
}


very cool,
thanx everyone
~i


On 10/13/06, John W. Krahn [EMAIL PROTECTED] wrote:

 I.B. wrote:
  Hi nice people,

 Hello,

  how to specify using regular expressions: match everything but string
 (xxx)
 
  i would do this :
 
  $line =~ /[^(xxx)]+/;
 
  but, as it was mentioned before () inside character class is not
 working.
  what is solution here?

 Perhaps you want:

 $line !~ /xxx/;



 John
 --
 Perl isn't a toolbox, but a small machine shop where you can
 special-order
 certain sorts of tools at low cost and in short order.   -- Larry
 Wall







Re: grouppin in the regular expressions

2006-10-13 Thread John W. Krahn
I.B. wrote:
 sorry, I didn't fraze my question correctly.
  ^
  phrase

 example :
 $line="abcxabcxxabcxxxabc";

 how to match everything before xxx but not xxx itself?
 the answer i got is to use lookaheads:

 my $line = "abcxxabcxxxabc";
 if ($line =~ m{(.*?(?:(?!xxx).))xxx}){
 print "matched: $1\n";
 }
 else{
 print "failed\n";
 }

Your expression is too complicated:

if ( $line =~ /(.*?)xxx/ ) {

would accomplish the same thing.

$ perl -le'$_ = "abcxabcxxabcxxxabc"; print $1 if /(.*?(?:(?!xxx).))xxx/'
abcxabcxxabc
$ perl -le'$_ = "abcxabcxxabcxxxabc"; print $1 if /(.*?)xxx/'
abcxabcxxabc
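
The same idea can be wrapped in a small function. This is only a sketch: the name before_xxx is made up, and the \A anchor and the /s flag are additions that make "everything before the first xxx" explicit even when the text contains newlines.

```perl
use strict;
use warnings;

# Return everything before the first "xxx", or undef if "xxx" is absent.
# The non-greedy .*? stops at the earliest place where "xxx" can follow.
sub before_xxx {
    my ($line) = @_;
    return $line =~ /\A(.*?)xxx/s ? $1 : undef;
}

my $prefix = before_xxx("abcxabcxxabcxxxabc");
print defined $prefix ? "matched: $prefix\n" : "failed\n";
```

Because a single x or a pair of x characters is not followed by a third, the capture runs up to the first full xxx, just as in the one-liners above.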




John
-- 
Perl isn't a toolbox, but a small machine shop where you can special-order
certain sorts of tools at low cost and in short order.   -- Larry Wall





Regular expressions

2006-04-26 Thread badrinath chitrala
HI

Could somebody help me? If I give ([a-z]+)(.*)([a-z]+) as the input pattern,
the output I get is

$1 is 'silly'
$2 is 'silly'
$3 is 'silly'

This is wrong according to the book I refer to.
Could someone please clarify? The code I used is below.

use strict;
use warnings;

$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';

print "Enter a regular expression:";
my $pattern = <STDIN>;
chomp($pattern);
if(/$pattern/){
print "The text matches the pattern '$pattern'.\n";
print "\$1 is '$1'\n" if defined $1;
print "\$2 is '$1'\n" if defined $2;
print "\$3 is '$1'\n" if defined $3;
print "\$4 is '$1'\n" if defined $4;
print "\$5 is '$1'\n" if defined $5;
}else{
print "'$pattern' was not found.\n";
}


Re: Regular expressions

2006-04-26 Thread Косов Евгений

Sombody help me if i give ([a-z]+)(.*)([a-z]+) as input string output i get
is

$1 is 'silly'
$2 is 'silly'
$3 is 'silly'

this is wrong  according to be book i refer
please somone clarify me code i used is as below


This is correct. The first word that matches ([a-z]+) is 'silly', and every
line of your output prints $1:



print "\$1 is '$1'\n" if defined $1;
print "\$2 is '$1'\n" if defined $2;
print "\$3 is '$1'\n" if defined $3;
print "\$4 is '$1'\n" if defined $4;
print "\$5 is '$1'\n" if defined $5;


Maybe you meant something like this:

print "\$1 is '$1'\n" if defined $1;
print "\$2 is '$2'\n" if defined $2;
print "\$3 is '$3'\n" if defined $3;
print "\$4 is '$4'\n" if defined $4;
print "\$5 is '$5'\n" if defined $5;
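
As a variation that avoids repeating the numbered variables by hand, the groups can be collected in a list. This sketch hard-codes the pattern from the question rather than reading it from STDIN; the array name @groups is an arbitrary choice.

```perl
use strict;
use warnings;

my $text = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';

# In list context a successful match returns the captured groups in order.
my @groups = $text =~ /([a-z]+)(.*)([a-z]+)/;

for my $i (0 .. $#groups) {
    printf "\$%d is '%s'\n", $i + 1, $groups[$i];
}
```

With this pattern the first group is 'silly', the greedy (.*) takes almost all of the rest, and the last group is left with the single letter 'l' from 'useful'.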




regular expressions

2006-04-21 Thread Bowen, Bruce
In perldoc under this topic, s is listed as "Treat string as a single line" and 
m as "Treat string as multiple lines". 

If I have text that has varying spaces at the beginning of each line, and I use 

$string =~ s/^\s+//; it will remove the spaces from in front of the first line 
but not any other lines.  That is clear to me.

However, it does not clear all of the leading spaces from all of the lines if I 
use

$string =~ m/^\s+//;

In fact I'm getting a compile error.  What am I missing here?

Thanks,
Bruce Bowen



Re: regular expressions

2006-04-21 Thread Xavier Noria


On Apr 21, 2006, at 16:10, Bowen, Bruce wrote:

In perldoc under this topic s is listed as Treat string as a  
single line and m as Treat string as multiples lines.


If I have text that has varying spaces at the begging of each line,  
and I use


$string =~ s/^\s+//; It will remove the spaces from in from of the  
first line but not any other lines.  That is clear to me.


However, it does not clear all of the leading spaces from all of  
the lines if I use


$string =~ m/^\s+//;


Modifiers go to the end:

  $string =~ s/^\s+//m;

-- fxn






Re: regular expressions

2006-04-21 Thread John W. Krahn
Bowen, Bruce wrote:
 In perldoc under this topic s is listed as Treat string as a single
 line and m as Treat string as multiples lines. 
 
 If I have text that has varying spaces at the begging of each line,
 and I use 
 
 $string =~ s/^\s+//; It will remove the spaces from in from of the
 first line but not any other lines.  That is clear to me.
 
 However, it does not clear all of the leading spaces from all of the
 lines if I use
 
 $string =~ m/^\s+//;
 
 In fact I'm getting error message compile error.  What am I missing
 here?

perldoc perlop
[snip]
   m/PATTERN/cgimosx
                ^ ^
[snip]
   s/PATTERN/REPLACEMENT/egimosx
                            ^ ^

The /s option affects the behaviour of the . meta-character.  The /m option
affects the behaviour of the ^ and $ meta-characters.

Assuming you have the string:

my $string = "one\n   two\n   three\nfour\n   five\n";

$string =~ s/.+//;

Will produce the string:

\n   two\n   three\nfour\n   five\n

And:

$string =~ s/.+//g;

Will produce the string:

\n\n\n\n\n

While:

$string =~ s/.+//s;

Will produce the empty string.

And:

$string =~ s/^\s+//;

Will produce the string:

one\n   two\n   three\nfour\n   five\n

(It isn't modified.)

While:

$string =~ s/^\s+//m;

Will produce the string:

one\ntwo\n   three\nfour\n   five\n

(Only the first match is changed.)

And:

$string =~ s/^\s+//mg;

Will produce the string:

one\ntwo\nthree\nfour\nfive\n
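
One caveat worth a sketch: \s also matches newlines, so on text that contains blank lines s/^\s+//mg removes those lines as well. The variant below is not from the thread; it uses [ \t] instead so that only spaces and tabs are stripped, and the function name strip_leading is made up for the example.

```perl
use strict;
use warnings;

# Remove leading spaces and tabs from every line of $text.
# [ \t] is used rather than \s so that blank lines are preserved.
sub strip_leading {
    my ($text) = @_;
    $text =~ s/^[ \t]+//mg;
    return $text;
}

# Sample text with a blank line between "two" and "three".
my $string = "one\n   two\n\n   three\n";
print strip_leading($string);
```

For the string used in the examples above, which has no blank lines, both forms give the same result.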



John
-- 
use Perl;
program
fulfillment




