Re: Using regular expressions to populate a variable?
On Sun, 18 Jan 2015 11:49:11 -0500 Mike <ekimduna...@gmail.com> wrote:

> Hey everyone,
>
> I'm trying to find information on how I can use regular expressions to populate a variable. I want to pull text between one set of characters and another set of characters and use that to populate my variable. Can anyone point me in the right direction?
>
> Thanks.

Use parentheses to select the part of the match you want in your variables, and assign the result of the match (here against $string, the text you are searching) to a list:

    my ( $var1, $var2, @rest ) = $string =~ /some characters(populates $var1)more characters(populates $var2) more (populates @rest) more (populates @rest) /;

See `perldoc perlre` and search for /Capture groups/
http://perldoc.perl.org/perlre.html#Capture-groups

For more info:
perldoc perlretut    http://perldoc.perl.org/perlretut.html
perldoc perlrequick  http://perldoc.perl.org/perlrequick.html
perldoc perlre       http://perldoc.perl.org/perlre.html

--
Don't stop where the ink does.
Shawn

--
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
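To make the capture-group advice concrete, here is a minimal sketch. The sample string and the helper name `between` are made up for illustration and are not from the thread; the only technique shown is assigning a match in list context so that each pair of parentheses fills one variable.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text found between two literal markers,
# or undef when the markers are not both present.
sub between {
    my ($text, $start, $end) = @_;
    # \Q...\E quotes any regex metacharacters in the markers.
    my ($inner) = $text =~ /\Q$start\E(.*?)\Q$end\E/s;
    return $inner;
}

# Each pair of parentheses in the pattern fills one variable on the left.
my $line = 'id=[42] name=[Mike]';
my ($id, $name) = $line =~ /id=\[(\d+)\] name=\[(\w+)\]/;

print "id=$id name=$name\n";
print between('<b>bold</b>', '<b>', '</b>'), "\n";
```

If the pattern does not match, the list assignment leaves the variables undefined, so it is worth checking them before use.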
Re: Using regular expressions to populate a variable?
On Jan 18, 2015, at 9:03 AM, Mike <ekimduna...@gmail.com> wrote:

> I was able to find match extraction in the perldoc.
>
> Here is a snippet of what I have.
>
>     my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
>     print "$insult\n";
>
> But $insult is being populated with: 1
> It should be populated with text. Can anyone tell me what I'm doing wrong here?

Your error is assigning the return value of the match in scalar context. In scalar context, a match returns true or false, indicating whether the pattern matched. In list context, however, it returns the captured subexpressions as a list. Try forcing the assignment into list context by putting parentheses around the variable:

    my( $insult ) = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );

You can also use the capture variables $1, $2, $3, etc., which contain the captured subexpressions after a successful match:

    my $insult;
    if( $mech->text =~ m/Insulter\ (.*)\ Taken/ ) {
        $insult = $1;
    }
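The difference between the two contexts can be seen side by side in a small sketch. The page text below is an invented stand-in for whatever `$mech->text` returns, and the sub names are hypothetical.

```perl
use strict;
use warnings;

# Scalar context: the match yields a true/false value (1 on success).
sub match_in_scalar_context {
    my ($text) = @_;
    my $result = ( $text =~ /Insulter (.*) Taken/ );
    return $result;
}

# List context: the parentheses around $result make the match
# return its captures, so $result receives the text of group 1.
sub match_in_list_context {
    my ($text) = @_;
    my ($result) = ( $text =~ /Insulter (.*) Taken/ );
    return $result;
}

my $page = 'Insulter Thou art a boil Taken from somewhere';

print "scalar: ", match_in_scalar_context($page), "\n";
print "list:   ", match_in_list_context($page), "\n";
```

The only difference between the two subs is the pair of parentheses around the variable on the left-hand side of the assignment.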
Re: Using regular expressions to populate a variable?
On Sun, Jan 18, 2015 at 9:28 AM, Jim Gibson <jimsgib...@gmail.com> wrote:

> On Jan 18, 2015, at 9:03 AM, Mike <ekimduna...@gmail.com> wrote:
>
> > I was able to find match extraction in the perldoc.
> >
> > Here is a snippet of what I have.
> >
> >     my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
> >     print "$insult\n";
> >
> > But $insult is being populated with: 1
> > It should be populated with text. Can anyone tell me what I'm doing wrong here?
>
> Your error is assigning the return value of the match in scalar context. In scalar context, a match returns true or false, indicating whether the pattern matched. In list context, however, it returns the captured subexpressions as a list. Try forcing the assignment into list context:
>
>     my( $insult ) = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
>
> ...

For more info, see perldoc perldata. There is a full discussion of list vs. scalar context.

--
Charles DeRykus
Re: Using regular expressions to populate a variable?
I was able to find match extraction in the perldoc.

Here is a snippet of what I have.

    my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
    print "$insult\n";

But $insult is being populated with: 1
It should be populated with text. Can anyone tell me what I'm doing wrong here?

Thanks.

On 1/18/15 11:49 AM, Mike wrote:

> Hey everyone,
>
> I'm trying to find information on how I can use regular expressions to populate a variable. I want to pull text between one set of characters and another set of characters and use that to populate my variable. Can anyone point me in the right direction?
>
> Thanks.
Using regular expressions to populate a variable?
Hey everyone, I'm trying to find information on how I can use regular expressions to populate a variable. I want to pull text between one set of characters and another set of characters and use that to populate my variable. Can anyone point me in the right direction? Thanks.
Re: Using regular expressions to populate a variable?
Thanks. This worked.

On 1/18/15 12:28 PM, Jim Gibson wrote:

> On Jan 18, 2015, at 9:03 AM, Mike <ekimduna...@gmail.com> wrote:
>
> > I was able to find match extraction in the perldoc.
> >
> > Here is a snippet of what I have.
> >
> >     my $insult = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
> >     print "$insult\n";
> >
> > But $insult is being populated with: 1
> > It should be populated with text. Can anyone tell me what I'm doing wrong here?
>
> Your error is assigning the return value of the match in scalar context. In scalar context, a match returns true or false, indicating whether the pattern matched. In list context, however, it returns the captured subexpressions as a list. Try forcing the assignment into list context:
>
>     my( $insult ) = ( $mech->text =~ m/Insulter\ (.*)\ Taken/ );
>
> You can also use the capture variables $1, $2, $3, etc., which contain the captured subexpressions:
>
>     my $insult;
>     if( $mech->text =~ m/Insulter\ (.*)\ Taken/ ) {
>         $insult = $1;
>     }
Re: Help with regular expressions
> > Hasn't someone already fixed this problem? Isn't there a CPAN module to perform standardized bibliographic reference formatting/parsing? I haven't looked at CPAN; did either of you? If a CPAN module doesn't exist, one should!
>
> What standard?
>
> Kalthoff K (2001) Analysis of biological development. McGraw-Hill, NY.
>
> Or
>
> Manning JT, Barley L, Walton J, Lewis-Jones DI, Trivers RL, Singh D, Thornhill R, Rohde P, Bereczkei T, Henzi P, Soler M, Szwed A. (2000) The 2nd:4th digit ratio, sexual dimorphism, population differences, and reproductive success. evidence for sexually antagonistic genes? Evol Hum Behav. 21(3):163-183.
>
> Or
>
> Berger, M., Lawrence, M., Demichelis, F., Drier, Y., Cibulskis, K., Sivachenko, A., Sboner, A., Esgueva, R., Pflueger, D., Sougnez, C., Onofrio, R., Carter, S., Park, K., Habegger, L., Ambrogio, L., Fennell, T., Parkin, M., Saksena, G., Voet, D., Ramos, A., Pugh, T., Wilkinson, J., Fisher, S., Winckler, W., Mahan, S., Ardlie, K., Baldwin, J., Simons, J., Kitabayashi, N., MacDonald, T., Kantoff, P., Chin, L., Gabriel, S., Gerstein, M., Golub, T., Meyerson, M., Tewari, A., Lander, E., Getz, G., Rubin, M., Garraway, L. (2011). The genomic complexity of primary human prostate cancer Nature, 470 (7333), 214-220 DOI: 10.1038/nature09744
>
> ?
>
> If there's a standard, then sure, someone has probably put that into CPAN. The problem is that I don't think that there is, though I'd be glad to be proven wrong.
>
> > What I want to be able to do eventually is parse each name separately and associate that with the title. I am not sure how yet, but I haven't even got there.
>
> That can range from pretty simple to fairly complex, depending on how much you want to squeeze out of that relationship. If you just want to be able to say "Morgan, M.J. wrote an article for X journal, titled Y", then that's just a hash (of hashes), and you need to look no further than this mail. But if you also want to say "Journal X has these authors. One of them is Wilson, C.E., who co-wrote article Y, where Crim, L.W. was also a collaborator, and whose primary author is Morgan, M.J.", then hashes will probably not cut it anymore (a cyclical hash of hashes might do, but that's pretty tough to handle, and _very_ rough on the eyes). You'll probably want an object model there, or some database interaction. But we are getting ahead of ourselves for now :)

I figured that eventually it would be easier to somehow pass the results into MySQL tables, but I left that bridge to be crossed once I get there.

> > It works fine for the first name, but as expected if @entries contain several strings with authors names (I did that by matching the year and storing $` in the @entries) it will match the first author and it will go to the next $entries. Is there a way to match the pattern more than once, but to store each match separately?
>
> You are looking for the /g switch. You can look it up in perlretut[0].

I actually remember reading in the Llama book that the /g modifier could be used with m// also, and not only with s///, and thinking "but when would you need it with m//?" :)

> > For example, would I be able to store Morgan, M.J. as one item in an array and Wilson, C.E. as another one?
>
> Sure. The "my @names = ..." from above will suffice for that. But chances are you want more than that. In general, you have two options. Either you make several small regexes to extract the data piece by piece, or you create a grammar to do the job for you. For the latter, there are two main options: a (?(DEFINE)) pattern, which is pure Perl and in the language since 5.010, or you pull out Regexp::Grammars from CPAN. They are pretty similar, but Regexp::Grammars is much more powerful, letting you access the full parse tree, so what I'll have to do in two steps in the next snippet, R::G would do in one.
>
> Here's my stab at it, using (?(DEFINE))[1], named captures[2], Unicode character properties[3], and a probably unnecessary lookbehind[1] in the split by the end. I made some arbitrary assumptions on the data, like saying that a title can't be longer than 52 characters, or can't have a period in it, or that the journal's name can't have digits in it, which I suppose is a tad disingenuous, but take it as an example, not a solution :P

Thanks! This gives me a lot to read on.

Cheers,
T.

--
Education is not to be used to promote obscurantism. - Theodosius Dobzhansky.

Gracias a la vida que me ha dado tanto
Me ha dado el sonido y el abecedario
Con él, las palabras que pienso y declaro
Madre, amigo, hermano
Y luz alumbrando la ruta del alma del que estoy amando
Gracias a la vida que me ha dado tanto
Me ha dado la marcha de mis pies cansados
Con ellos anduve ciudades y charcos
Playas y desiertos, montañas y llanos
Y la casa tuya, tu calle y tu patio
Violeta Parra - Gracias a la Vida

Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial
Help with regular expressions
Hi List,

I am trying to write a small script to parse bibliographic references like this:

Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.

What I want to be able to do eventually is parse each name separately and associate that with the title. I am not sure how yet, but I haven't even got there. Right now I am just trying to see if I can parse the names, so I came up with this:

    foreach (@entries){
        if (/((\w)*, (([A-Z].)*),){1,}/){
            my $name = $&;
            $name =~ s/\.,/\. /g;
            push @names, $name;
        }
    }

It works fine for the first name, but as expected, if @entries contains several strings with authors' names (I did that by matching the year and storing $` in @entries), it will match the first author and then go on to the next entry. Is there a way to match the pattern more than once, but to store each match separately? For example, would I be able to store Morgan, M.J. as one item in an array and Wilson, C.E. as another one?

As always, any help is much appreciated.

Cheers,
Tiago

--
Tiago S. F. Hori
PhD Candidate - Ocean Science Center-Memorial University of Newfoundland
Re: Help with regular expressions
On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <tiago.h...@gmail.com> wrote:

> I am trying to write a small script to parse bibliographic references like this:
>
> Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
>
> What I want to be able to do eventually is parse each name separately and associate that with the title. I am not sure how yet, but I haven't even got there.

I took a stab at this. It might not be perfect or catch all possible variations. But in any case, unless you have rules for the text in these entries, it is very difficult to catch them all.

    #!/usr/bin/perl
    #
    use strict;
    use warnings;

    my $text = <<END;
    Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
    END

    my @authors = ();

    # Extract authors
    # Assuming each author is composed of one or more matches of:
    #   SPACE* WORD, SPACE* (ALPHABET PERIOD)+
    # Each match returns two captures (the full name and the last initial),
    # so the loop discards every second element.
    if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
        while (@matches) {
            my $match = shift @matches;
            my @comps = map { s/^ +//; s/ +$//; $_ } (split ",", $match);
            push @authors, join " ", @comps[1,0];
            shift @matches;
        }
    }

    # Extract title
    # Everything from the first period followed by a space to the next period.
    # Authors should have periods followed by either a letter or a comma
    # for this to work
    if ($text =~ m/\. (.*?)\./s) {
        my $title = $1;
        $title =~ s/\n/ /g;
        foreach (@authors) {
            print "$title: $_\n";
        }
    }

    $ ./match_2.pl
    The effect of stress on reproduction in Atlantic cod: M.J. Morgan
    The effect of stress on reproduction in Atlantic cod: C.E. Wilson
    The effect of stress on reproduction in Atlantic cod: L.W. Crim

All, please let me know if there is a way to combine both the regexes. I had a brain coredump before I gave up.

Thanks,
Sandip
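One possible way to tie the two steps together is sketched below. It is not a single regex: one match splits the entry at the year, and a second match with /g in list context collects every author. The sub name `parse_reference` is hypothetical, and the sketch assumes the "Surname, Initials., ... YEAR. Title." layout of the sample entry only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (title, author1, author2, ...) for one
# reference, or an empty list when the entry does not fit the layout.
sub parse_reference {
    my ($ref) = @_;

    # One match splits the entry into the author block and the title;
    # the four-digit year followed by a period is the separator.
    my ($author_block, $title) = $ref =~ /^(.*?)\s*\d{4}\.\s+(.*?)\./s
        or return;

    # /g in list context returns every (surname, initials) pair at once.
    my @parts = $author_block =~ /(\w+),\s*((?:\w\.)+)/g;

    my @authors;
    while (my ($surname, $initials) = splice @parts, 0, 2) {
        push @authors, "$initials $surname";
    }
    return ($title, @authors);
}

my ($title, @authors) = parse_reference(
    'Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress '
  . 'on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.'
);
print "$title: $_\n" for @authors;
```

Using a non-capturing group `(?:...)` for the initials keeps the /g result to exactly two elements per author, which avoids the extra `shift` needed when the inner group also captures.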
Re: Help with regular expressions
On Mon, May 9, 2011 at 12:04, Sandip Bhattacharya <sand...@foss-community.com> wrote:

> On Mon, May 9, 2011 at 11:44 PM, Tiago Hori <tiago.h...@gmail.com> wrote:
>
> > I am trying to write a small script to parse bibliographic references like this:
> >
> > Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
> >
> > What I want to be able to do eventually is parse each name separately and associate that with the title. I am not sure how yet, but I haven't even got there.
>
> I took a stab at this. It might not be perfect or catch all possible variations. But in any case, unless you have rules for the text in these entries, it is very difficult to catch them all.
>
>     #!/usr/bin/perl
>     #
>     use strict;
>     use warnings;
>
>     my $text = <<END;
>     Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.
>     END
>
>     my @authors = ();
>
>     # Extract authors
>     # Assuming each author is composed of one or more matches of:
>     #   SPACE* WORD, SPACE* (ALPHABET PERIOD)+
>     if (my @matches = $text =~ m/(\s*\w+,\s*(\w\.)+),/gs) {
>         while (@matches) {
>             my $match = shift @matches;
>             my @comps = map { s/^ +//; s/ +$//; $_ } (split ",", $match);
>             push @authors, join " ", @comps[1,0];
>             shift @matches;
>         }
>     }
>
>     # Extract title
>     # Everything from the first period followed by a space to the next period.
>     # Authors should have periods followed by either a letter or a comma
>     # for this to work
>     if ($text =~ m/\. (.*?)\./s) {
>         my $title = $1;
>         $title =~ s/\n/ /g;
>         foreach (@authors) {
>             print "$title: $_\n";
>         }
>     }
>
>     $ ./match_2.pl
>     The effect of stress on reproduction in Atlantic cod: M.J. Morgan
>     The effect of stress on reproduction in Atlantic cod: C.E. Wilson
>     The effect of stress on reproduction in Atlantic cod: L.W. Crim
>
> All, please let me know if there is a way to combine both the regexes. I had a brain coredump before I gave up.
>
> Thanks,
> Sandip

Hasn't someone already fixed this problem? Isn't there a CPAN module to perform standardized bibliographic reference formatting/parsing? I haven't looked at CPAN; did either of you? If a CPAN module doesn't exist, one should!

Ken Wolcott
Re: Help with regular expressions
On Mon, May 9, 2011 at 6:35 PM, Kenneth Wolcott <kennethwolc...@gmail.com> wrote:

> Hasn't someone already fixed this problem? Isn't there a CPAN module to perform standardized bibliographic reference formatting/parsing? I haven't looked at CPAN; did either of you? If a CPAN module doesn't exist, one should!

What standard?

Kalthoff K (2001) Analysis of biological development. McGraw-Hill, NY.

Or

Manning JT, Barley L, Walton J, Lewis-Jones DI, Trivers RL, Singh D, Thornhill R, Rohde P, Bereczkei T, Henzi P, Soler M, Szwed A. (2000) The 2nd:4th digit ratio, sexual dimorphism, population differences, and reproductive success. evidence for sexually antagonistic genes? Evol Hum Behav. 21(3):163-183.

Or

Berger, M., Lawrence, M., Demichelis, F., Drier, Y., Cibulskis, K., Sivachenko, A., Sboner, A., Esgueva, R., Pflueger, D., Sougnez, C., Onofrio, R., Carter, S., Park, K., Habegger, L., Ambrogio, L., Fennell, T., Parkin, M., Saksena, G., Voet, D., Ramos, A., Pugh, T., Wilkinson, J., Fisher, S., Winckler, W., Mahan, S., Ardlie, K., Baldwin, J., Simons, J., Kitabayashi, N., MacDonald, T., Kantoff, P., Chin, L., Gabriel, S., Gerstein, M., Golub, T., Meyerson, M., Tewari, A., Lander, E., Getz, G., Rubin, M., Garraway, L. (2011). The genomic complexity of primary human prostate cancer Nature, 470 (7333), 214-220 DOI: 10.1038/nature09744

?

If there's a standard, then sure, someone has probably put that into CPAN. The problem is that I don't think that there is, though I'd be glad to be proven wrong.

On Mon, May 9, 2011 at 3:14 PM, Tiago Hori <tiago.h...@gmail.com> wrote:

> Hi List,

Howdy.

> What I want to be able to do eventually is parse each name separately and associate that with the title. I am not sure how yet, but I haven't even got there.

That can range from pretty simple to fairly complex, depending on how much you want to squeeze out of that relationship. If you just want to be able to say "Morgan, M.J. wrote an article for X journal, titled Y", then that's just a hash (of hashes), and you need to look no further than this mail. But if you also want to say "Journal X has these authors. One of them is Wilson, C.E., who co-wrote article Y, where Crim, L.W. was also a collaborator, and whose primary author is Morgan, M.J.", then hashes will probably not cut it anymore (a cyclical hash of hashes might do, but that's pretty tough to handle, and _very_ rough on the eyes). You'll probably want an object model there, or some database interaction. But we are getting ahead of ourselves for now :)

>     foreach (@entries){
>         if (/((\w)*, (([A-Z].)*),){1,}/){

You probably want something like

    my @names = /( \w+, \s* (?: [A-Z] \. )+ ) ,\s*/xg

instead. With /g in list context, the match returns every captured name, one element per author.

>             my $name = $&;

Try not to use $& and $` - there's a program-wide speed penalty if you do. Just using capturing groups should do.

> It works fine for the first name, but as expected if @entries contain several strings with authors names (I did that by matching the year and storing $` in the @entries) it will match the first author and it will go to the next $entries. Is there a way to match the pattern more than once, but to store each match separately?

You are looking for the /g switch. You can look it up in perlretut[0].

> For example, would I be able to store Morgan, M.J. as one item in an array and Wilson, C.E. as another one?

Sure. The "my @names = ..." from above will suffice for that. But chances are you want more than that. In general, you have two options. Either you make several small regexes to extract the data piece by piece, or you create a grammar to do the job for you. For the latter, there are two main options: a (?(DEFINE)) pattern, which is pure Perl and in the language since 5.010, or you pull out Regexp::Grammars from CPAN. They are pretty similar, but Regexp::Grammars is much more powerful, letting you access the full parse tree, so what I'll have to do in two steps in the next snippet, R::G would do in one.

Here's my stab at it, using (?(DEFINE))[1], named captures[2], Unicode character properties[3], and a probably unnecessary lookbehind[1] in the split by the end. I made some arbitrary assumptions on the data, like saying that a title can't be longer than 52 characters, or can't have a period in it, or that the journal's name can't have digits in it, which I suppose is a tad disingenuous, but take it as an example, not a solution :P

    use 5.010;

    $_ = 'Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.';

    /
        (?<all_names> (?&ALL_NAMES) )
        (?<year> (?&YEAR) )\. \s+
        (?<title> (?&TITLE) )\. \s+
        (?<journal> (?&JOURNAL) )\. \s*
        (?<edition> (?&NUM)+ ), \s*
        (?<pages> (?&NUM)+-(?&NUM)+ )\.

        (?(DEFINE)
            (?<ALL_NAMES> ( (?&FULL_NAME), \s+)+ )
            (?<FULL_NAME> (?&SURNAME), \s* (?&INITIALS) )
            (?<SURNAME> \p{Lu}\p{L}* )
            (?<INITIALS> (?:\p{Lu}\.)+ )
            (?<YEAR> \p{PosixDigit}{4} )
            (?<TITLE> [^.]{1,52} ) #Article title
            (?<JOURNAL> \P{PosixDigit}+ ) #Journal name
            (?<NUM> \p{PosixDigit} )
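For readers who want something runnable to experiment with, here is a deliberately simpler sketch of the named-capture idea. It is not the message's (?(DEFINE)) grammar: it uses plain named groups, the /x modifier, and the %+ hash, plus a lookbehind in the final split. The sub name `parse_entry` is hypothetical, the field names follow the snippet above, and the pattern assumes entries laid out exactly like the sample reference.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# Hypothetical helper: returns a hash reference of fields, or nothing
# when the entry does not fit the assumed layout.
sub parse_entry {
    my ($entry) = @_;
    return unless $entry =~ /
        ^ (?<all_names> .+? ) ,? \s+
          (?<year>    \d{4} ) \. \s+
          (?<title>   [^.]+ ) \. \s+
          (?<journal> \D+?  ) \s+
          (?<edition> \d+   ) , \s*
          (?<pages>   \d+ - \d+ ) \.
    /x;

    my %fields = %+;    # copy now: %+ changes with the next successful match

    # Split the author block at each comma that follows a period.
    $fields{authors} = [ split /(?<=\.),\s*/, $fields{all_names} ];
    return \%fields;
}

my $entry = parse_entry(
    'Morgan, M.J., Wilson, C.E., Crim, L.W., 1999. The effect of stress '
  . 'on reproduction in Atlantic cod. J. Fish Biol. 54, 477-488.'
);
say "$_ wrote '$entry->{title}' ($entry->{year})" for @{ $entry->{authors} };
```

The resulting hash reference is one way to get the "hash of hashes" shape mentioned earlier in the message, for example by storing each entry under its title.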
Re: Regular Expressions Question
On Apr 10, 11:03 pm, jwkr...@shaw.ca (John W. Krahn) wrote:

> cityuk wrote:
> > Dear All,
>
> Hello,
>
> > This is more of a generic question on regular expressions as my program is working fine but I was just curious.
> >
> > Say you have the following URLs:
> >
> > http://www.test.com/image.gif
> > http://www.test.com/?src=image.gif?width=12
> >
> > I want to get the type of the image, i.e. the string gif. For the first URL the regular expression .*\.([a-z]{3}) will do the trick while for the second one I am using .*=\([a-z]{3})?.*. Ignoring the fact that the REs can be written better my question is: If I put them together, that is write them as .*\.([a-z]{3})|.*=\([a-z]{3})?.* perl thinks that the or only applies to the characters immediately surrounding it (in this case ) and .).
>
> No. The alternation applies to the complete pattern '.*\.([a-z]{3})' OR '.*=\([a-z]{3})?.*'.
>
> John
> --
> Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein

OK. So if I understood you correctly, given the following (actual) URLs

http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg
http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186

the following pattern

^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$

should match them both. Am I correct?

Regards,
George
Re: Regular Expressions Question
On Apr 11, 7:21 am, gklc...@googlemail.com (gkl) wrote:

> On Apr 10, 11:03 pm, jwkr...@shaw.ca (John W. Krahn) wrote:
> > This is more of a generic question on regular expressions as my program is working fine but I was just curious.
> >
> > Say you have the following URLs:
> >
> > http://www.test.com/image.gif
> > http://www.test.com/?src=image.gif?width=12
>
> OK. So if I understood you correctly, given the following (actual) URLs
>
> http://beta.images.theglobeandmail.com/archive/01258/election_heads__...
> http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun
>
> the following pattern
>
> ^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$
>
> should match them both. Am I correct?

No, there is at least one problem. In your first alternative, the '.*' will also match the literal '?' which the second alternative is matching.

See 'perldoc perlretut' for a review.

[ The URI module which was mentioned will be a quicker solution and will work in all cases. ]

--
Charles DeRykus
Re: Regular Expressions Question
On 11/04/2011 15:21, gkl wrote:

> OK. So if I understood you correctly, given the following (actual) URLs
>
> http://beta.images.theglobeandmail.com/archive/01258/election_heads__1258993cl-3.jpg
> http://storage.canoe.ca/v1/dynamic_resize/?src=http://www.torontosun.com/news/decision2011/2011/04/06/300_harper_boring.jpg&size=248x186
>
> the following pattern
>
> ^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$
>
> should match them both. Am I correct?

First of all I notice that the src parameter in your second URL's query is now an absolute URL, whereas your first post had just a file name. Since we cannot anticipate how far and in which direction your problem may grow, it is your responsibility to present the entirety of the possibilities as you know them. Otherwise you will be engaging the world in a goose chase of the wildest sort.

If you mean

    /^\s*.*\.([a-zA-z]{3})$ | ^\S*\?\S*\.([a-zA-z]{3}).*$/

then you must apply the /x modifier, otherwise the spaces at the end of the first option and at the beginning of the second form part of the expressions.

As far as I can think,

    /^\s*.*\.([a-zA-z]{3})$/

is exactly equivalent to

    /\.([a-zA-z]{3})$/

which, presumably as you intend, will match the first URL and capture 'jpg'. It will fail to match the second URL.

While the first option seemed to be considering the possibility of irrelevant leading spaces, the second

    /^\S*\?\S*\.([a-zA-z]{3}).*$/

is insisting on a sequence of non-spaces from the beginning of the string up to the last possible question mark. Then another sequence of non-spaces up to the last possible dot, followed by three alphas and an ampersand. The subsequent /.*$/ does nothing.

I suggest to you that simply

    /.*\.([a-z]+)/i

will match all of the four URLs you have posted so far, and capture from them exactly what you expect.

Only you can know the full extent of your problem, and why you refuse the advice you have been offered. I will continue to try to help you.

Rob
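The point about /x is easy to check with a short sketch. The two subs below (hypothetical names) use the same alternation, simplified from the thread's pattern to `[a-z]+` with /i, and differ only in the /x modifier. The first two URLs are the test URLs from the original question, with the ampersand form of the second. As an aside that is not raised in the thread, the range `A-z` in `[a-zA-z]` also covers the ASCII punctuation between 'Z' and 'a', which the last line demonstrates.

```perl
use strict;
use warnings;

# Without /x the spaces around | are literal characters in the pattern,
# so neither branch can match a URL that contains no spaces.
sub ext_without_x {
    my ($url) = @_;
    my ($first, $second) = $url =~ /\.([a-z]+)$ | ^\S*\?\S*\.([a-z]+)/i;
    return defined $first ? $first : $second;
}

# With /x the whitespace in the pattern is ignored.
sub ext_with_x {
    my ($url) = @_;
    my ($first, $second) = $url =~ / \.([a-z]+)$ | ^\S*\?\S*\.([a-z]+) /xi;
    return defined $first ? $first : $second;
}

for my $url ('http://www.test.com/image.gif',
             'http://www.test.com/?src=image.gif&width=12') {
    my $plain = ext_without_x($url);
    print "$url\n";
    print "  without /x: ", (defined $plain ? $plain : 'no match'), "\n";
    print "  with /x:    ", ext_with_x($url), "\n";
}

# [A-z] is wider than it looks: it includes [ \ ] ^ _ and the backtick.
print "underscore matches [A-z]\n" if '_' =~ /^[A-z]$/;
```

Because each branch has its own capture group, the result arrives in either the first or the second capture, which is why both subs check which one is defined.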
Re: Regular Expressions Question
On 11/04/2011 06:43, Shlomi Fish wrote:

> On Sunday 10 Apr 2011 14:05:49 cityuk wrote:
> >
> > This is more of a generic question on regular expressions as my program is working fine but I was just curious.
> >
> > Say you have the following URLs:
> >
> > http://www.test.com/image.gif
> > http://www.test.com/?src=image.gif?width=12
>
> Don't use regular expressions to parse URLs - instead use URI.pm:
>
> http://cpan.uwinnipeg.ca/dist/URI

I agree. The program below shows a subroutine which will extract the file type from either form of URL. It first checks to see if there is a 'src' option in the query, using this for the file name if so; otherwise it uses the last segment of the URL path. The file type is extracted by capturing all trailing non-dot characters from the file name.

(I assume your second address should read

http://www.test.com/?src=image.gif&width=12

with an ampersand instead of a second question mark?)

HTH,

Rob

    use strict;
    use warnings;

    use URI;

    sub filetype_from_url {

        my $url = URI->new($_[0]);

        # Query parameters as a hash, e.g. ( src => 'image.gif', width => 12 )
        my %form = $url->query_form;

        # Prefer the 'src' parameter; fall back to the last path segment
        my $file = $form{src} || ($url->path_segments)[-1];

        # Everything after the last dot
        return $file =~ /([^.]+)\z/;
    }

    print filetype_from_url('http://www.test.com/image.gif'), "\n";
    print filetype_from_url('http://www.test.com/?src=image.gif&width=12'), "\n";
Regular Expressions Question
Dear All,

This is more of a generic question on regular expressions as my program is working fine but I was just curious.

Say you have the following URLs:

http://www.test.com/image.gif
http://www.test.com/?src=image.gif?width=12

I want to get the type of the image, i.e. the string "gif". For the first URL the regular expression .*\.([a-z]{3}) will do the trick while for the second one I am using .*=\([a-z]{3})?.*. Ignoring the fact that the REs can be written better my question is: If I put them together, that is write them as .*\.([a-z]{3})|.*=\([a-z]{3})?.* perl thinks that the "or" only applies to the characters immediately surrounding it (in this case ")" and "."). Is there a way to say "here is a whole RE, here is another" and match the first or the second?

Regards,
George
Re: Regular Expressions Question
cityuk wrote:

> Dear All,

Hello,

> This is more of a generic question on regular expressions as my program is working fine but I was just curious.
>
> Say you have the following URLs:
>
> http://www.test.com/image.gif
> http://www.test.com/?src=image.gif?width=12
>
> I want to get the type of the image, i.e. the string "gif". For the first URL the regular expression .*\.([a-z]{3}) will do the trick while for the second one I am using .*=\([a-z]{3})?.*. Ignoring the fact that the REs can be written better my question is: If I put them together, that is write them as .*\.([a-z]{3})|.*=\([a-z]{3})?.* perl thinks that the "or" only applies to the characters immediately surrounding it (in this case ")" and ".").

No. The alternation applies to the complete pattern '.*\.([a-z]{3})' OR '.*=\([a-z]{3})?.*'.

John
--
Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction. -- Albert Einstein
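A small sketch may help show that `|` splits the whole pattern and not just its neighbouring characters. The patterns here are my own simplified stand-ins for the ones in the question, the sub name `image_type` is hypothetical, and the branch-reset group `(?|...)` is an addition that needs Perl 5.10 or later: it numbers the captures of each branch from 1, so the extension ends up in the same place whichever alternative matched.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

sub image_type {
    my ($url) = @_;
    my ($type) = $url =~ /
        (?|
            = [^&?]* \. ([a-z]{3,4}) (?= [&?] | \z )   # extension inside a query value
          |
            \. ([a-z]{3,4}) \z                         # extension at the very end
        )
    /xi;
    return $type;
}

say image_type('http://www.test.com/image.gif');
say image_type('http://www.test.com/?src=image.gif?width=12');
```

Without the branch reset, the two alternatives would fill $1 and $2 respectively, and the caller would have to test which one is defined.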
Re: Regular Expressions Question
On 04/10/2011 04:05 AM, cityuk wrote:

> Is there a way to say "here is a whole RE, here is another" and match the first or the second?

Jeffrey E.F. Friedl, 2006, "Mastering Regular Expressions", 3 e., O'Reilly Media, ISBN 978-0-596-52812-6.

http://oreilly.com/catalog/9780596528126/

HTH,

David
Re: Regular Expressions Question
On Sunday 10 Apr 2011 14:05:49 cityuk wrote:

> Dear All,
>
> This is more of a generic question on regular expressions as my program is working fine but I was just curious.
>
> Say you have the following URLs:
>
> http://www.test.com/image.gif
> http://www.test.com/?src=image.gif?width=12

Don't use regular expressions to parse URLs - instead use URI.pm:

http://cpan.uwinnipeg.ca/dist/URI

Regards,

Shlomi Fish

--
Shlomi Fish http://www.shlomifish.org/
http://www.shlomifish.org/humour/ways_to_do_it.html

Electrical Engineering studies. In the Technion. Been there. Done that. Forgot a lot. Remember too much.

Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: Regular expressions question
> > Can anyone tell me how to write a regular expression which matches anything _except_ a literal string?
>
> One could also use a zero-width negative look-ahead assertion:
>
>     #!/usr/bin/perl -w
>     use strict;
>
>     while( my $line = <DATA> ){
>         if( $line =~ m/^(?!Nomatch)/ ){
>             print "match: $line";
>         }
>     }

Thanks a lot for the reply, that worked perfectly in my application.

David
Regular expressions question
Hi,

Can anyone tell me how to write a regular expression which matches anything _except_ a literal string?

For instance, I want to match any line which does not begin with Nomatch. So in the following:

Line1
Line2
Nomatch
Line3
Line 4

I would match every line except the one containing Nomatch.

Many thanks,
David
Re: Regular expressions question
2009/11/17 mangled...@yahoo.com <mangled...@yahoo.com>:

> Hi,

Hello,

> Can anyone tell me how to write a regular expression which matches anything _except_ a literal string?
>
> For instance, I want to match any line which does not begin with Nomatch. So in the following:
>
> Line1
> Line2
> Nomatch
> Line3
> Line 4
>
> I would match every line except the one containing Nomatch.

You would negate the pattern. Something like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    while (<DATA>) {
        print if ! /^Nomatch/;
    }

    __DATA__
    Line1
    Line2
    Nomatch
    Line3
    Line 4

Output:

    Line1
    Line2
    Line3
    Line 4

See perldoc perlop (#Logical-Not), perldoc perlsyn and, of course, perldoc perlrequick.

HTH,
Dp.
AW: Regular expressions question
Hi,

Dermot paik...@googlemail.com suggested:
> 2009/11/17 mangled...@yahoo.com:
>> Can anyone tell me how to write a regular expression which matches anything _except_ a literal string?
>>
>> For instance, I want to match any line which does not begin with Nomatch. So in the following:
>
> You would negate the pattern. Something like this:
>
>     #!/usr/bin/perl
>     use strict;
>     use warnings;
>
>     while (<DATA>) {
>         print if ! /^Nomatch/;
>     }
>
>     __DATA__
>     Line1
>     Line2
>     Nomatch
>     Line3
>     Line 4

One could also use a zero-width negative look-ahead assertion:

    #!/usr/bin/perl -w
    use strict;

    while ( my $line = <DATA> ) {
        if ( $line =~ m/^(?!Nomatch)/ ) {
            print "match: $line";
        }
    }

    __DATA__
    Line1
    Line2
    Nomatch
    Line3
    Line 4

Cheers,
Thomas
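The two suggestions in this thread (negating the match, and a zero-width negative look-ahead) can be compared side by side. The following is only a minimal sketch: the helper names keep_by_negation and keep_by_lookahead are hypothetical and not from the thread, and \Q...\E is added on the assumption that the excluded text should be treated as a literal string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep the lines that do NOT start with $prefix,
# by negating an ordinary anchored match.
sub keep_by_negation {
    my ($prefix, @lines) = @_;
    return grep { !/^\Q$prefix\E/ } @lines;
}

# Hypothetical helper: the same filter written as a positive match that
# contains a zero-width negative look-ahead.
sub keep_by_lookahead {
    my ($prefix, @lines) = @_;
    return grep { /^(?!\Q$prefix\E)/ } @lines;
}

my @input = ("Line1", "Line2", "Nomatch", "Line3", "Line 4");
print "$_\n" for keep_by_lookahead("Nomatch", @input);
```

Both helpers should return the same lines; the look-ahead form is mainly useful when the exclusion has to live inside a larger pattern.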
Re: Regular expressions question
On Wed, Nov 18, 2009 at 5:05 PM, Thomas Bätzler t.baetz...@bringe.com wrote:
> Hi,
>
> Dermot paik...@googlemail.com suggested:
>> 2009/11/17 mangled...@yahoo.com:
>>> Can anyone tell me how to write a regular expression which matches anything _except_ a literal string?
>>>
>>> For instance, I want to match any line which does not begin with Nomatch. So in the following:
>>
>> You would negate the pattern. Something like this:
>>
>>     #!/usr/bin/perl
>>     use strict;
>>     use warnings;
>>
>>     while (<DATA>) {
>>         print if ! /^Nomatch/;
>>     }
>>
>>     __DATA__
>>     Line1
>>     Line2
>>     Nomatch
>>     Line3
>>     Line 4
>
> One could also use a zero-width negative look-ahead assertion:
>
>     #!/usr/bin/perl -w
>     use strict;
>
>     while ( my $line = <DATA> ) {
>         if ( $line =~ m/^(?!Nomatch)/ ) {
>             print "match: $line";
>         }
>     }
>
>     __DATA__
>     Line1
>     Line2
>     Nomatch
>     Line3
>     Line 4
>
> Cheers,
> Thomas

Look-ahead notation works only on relatively recent versions of Perl. If your environment contains things like HP-UX, which ships with a decades-old version of Perl (5.005, I believe, depending on the version of HP-UX of course), you might get into trouble. I would therefore either not use it, or make the script explicitly require 5.6 or higher just in case.

Regards,

Rob
Re: Regular Expressions with Incremented Variable Embedded
Raabe, Wesley wrote:
> I am using regular expressions to alter a text file. Where my original file has three spaces to start a paragraph, I want to replace each instance of three spaces with a bracketed paragraph number, with a counter for paragraph numbers: <pgf 1>, <pgf 2>, <pgf 3>, etc. [...] The while loop that I've crafted is like this:
>
>     while (<IN>) {
>         chomp;
>         s/\ \ \ /\<pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\>/gi; # Replace three spaces with <pgf XX>
>         print OUT "$_\n";
>     }
>
> I'm trying to embed the Perl code based on the Perl tutorial (http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression), which is noted as an experimental feature. But it doesn't work (using Mac OS X). The output in my text file is <pgf (?{my = 1; ++;){print ;}})> at the start of each paragraph.
>
> Is there a way to do this with an auto-increment variable and a for loop outside the regular expression, in which the value is inserted inside the regular expression? My earlier attempts to do it that way always resulted in no change in the value, just <pgf 1> on every paragraph.

I don't understand your g-modifier. Why is it there? I assume that you only want to make the substitution at the start of a line.

    #!/usr/bin/perl -w
    use strict;

    my $fname_inp = "test.inp";
    my $fname_oup = "test.oup";

    {
        open my $fh_inp, "<", $fname_inp or die "'$fname_inp': ", $!;
        open my $fh_oup, ">", $fname_oup or die "'$fname_oup': ", $!;

        my $pgf = 1;

        while ( <$fh_inp> ) {
            s/^[ ]{3}/<pgf $pgf>/ and $pgf++;
            print $fh_oup $_;
        }

        close $fh_oup or die "'$fname_oup': ", $!;
    }

    __END__

--
Ruud
Re: Regular Expressions with Incremented Variable Embedded
Raabe, Wesley wrote:
> I am using regular expressions to alter a text file. Where my original file has three spaces to start a paragraph, I want to replace each instance of three spaces with a bracketed paragraph number, with a counter for paragraph numbers: <pgf 1>, <pgf 2>, <pgf 3>, etc. The Perl program that I'm using is modeled on the answer to chapter 9, question 3 in the Learning Perl book (4th ed.). The while loop that I've crafted is like this:
>
>     while (<IN>) {
>         chomp;
>         s/\ \ \ /\<pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\>/gi; # Replace three spaces with <pgf XX>
>         print OUT "$_\n";
>     }
>
> I'm trying to embed the Perl code based on the Perl tutorial (http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression), which is noted as an experimental feature. But it doesn't work (using Mac OS X). The output in my text file is <pgf (?{my = 1; ++;){print ;}})> at the start of each paragraph.
>
> Is there a way to do this with an auto-increment variable and a for loop outside the regular expression, in which the value is inserted inside the regular expression? My earlier attempts to do it that way always resulted in no change in the value, just <pgf 1> on every paragraph.

    my $para_num;
    while ( <IN> ) {
        s/   /<pgf @{[ ++$para_num ]}>/g;
        print OUT;
    }

John
--
Those people who think they know everything are a great annoyance to those of us who do. -- Isaac Asimov
Regular Expressions with Incremented Variable Embedded
I am using regular expressions to alter a text file. Where my original file has three spaces to start a paragraph, I want to replace each instance of three spaces with a bracketed paragraph number, with a counter for paragraph numbers: <pgf 1>, <pgf 2>, <pgf 3>, etc. The Perl program that I'm using is modeled on the answer to chapter 9, question 3 in the Learning Perl book (4th ed.). The while loop that I've crafted is like this:

    while (<IN>) {
        chomp;
        s/\ \ \ /\<pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\>/gi; # Replace three spaces with <pgf XX>
        print OUT "$_\n";
    }

I'm trying to embed the Perl code based on the Perl tutorial (http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression), which is noted as an experimental feature. But it doesn't work (using Mac OS X). The output in my text file is <pgf (?{my = 1; ++;){print ;}})> at the start of each paragraph.

Is there a way to do this with an auto-increment variable and a for loop outside the regular expression, in which the value is inserted inside the regular expression? My earlier attempts to do it that way always resulted in no change in the value, just <pgf 1> on every paragraph.

Thanks,

Wesley Raabe
wra...@kent.edu
Assistant Professor
Textual Editing and American Literature
Kent State University
Re: Regular Expressions with Incremented Variable Embedded
On Sat, May 30, 2009 at 23:32, Raabe, Wesley wra...@kent.edu wrote:
> I am using regular expressions to alter a text file. Where my original file has three spaces to start a paragraph, I want to replace each instance of three spaces with a bracketed paragraph number, with a counter for paragraph numbers: <pgf 1>, <pgf 2>, <pgf 3>, etc. The Perl program that I'm using is modeled on the answer to chapter 9, question 3 in the Learning Perl book (4th ed.). The while loop that I've crafted is like this:
>
>     while (<IN>) {
>         chomp;
>         s/\ \ \ /\<pgf\ (?{my $para_num = 1; $para_num++;){print $para_num;}})\>/gi; # Replace three spaces with <pgf XX>
>         print OUT "$_\n";
>     }
>
> I'm trying to embed the Perl code based on the Perl tutorial (http://perldoc.perl.org/perlretut.html#A-bit-of-magic%3a-executing-Perl-code-in-a-regular-expression), which is noted as an experimental feature. But it doesn't work (using Mac OS X). The output in my text file is <pgf (?{my = 1; ++;){print ;}})> at the start of each paragraph.
>
> Is there a way to do this with an auto-increment variable and a for loop outside the regular expression, in which the value is inserted inside the regular expression? My earlier attempts to do it that way always resulted in no change in the value, just <pgf 1> on every paragraph.
<snip>

That would be because the second part of an s/// is not a regex; it is a double-quoted string. What you want is the /e option, which interprets the second part as Perl code instead:

    my $i = 0;
    while (<IN>) {
        s/[ ]{3}/"<pgf " . ++$i . ">"/ge;   # pre-increment, so numbering starts at 1
        print;
    }

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
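To make the /e idea concrete without needing input and output files, here is a small self-contained sketch. It rests on a few assumptions: the helper name number_paragraphs is hypothetical, the marker text "<pgf N>" is a guess at the intended format, and only an indent at the start of a line is replaced.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace a leading three-space indent on each line
# with a running paragraph marker. The "<pgf N>" text is an assumed format.
sub number_paragraphs {
    my @lines = @_;          # work on a copy of the caller's lines
    my $n = 0;
    for my $line (@lines) {
        # /e evaluates the replacement as Perl code for each match,
        # so the counter is incremented only when a substitution happens.
        $line =~ s/^[ ]{3}/'<pgf ' . ++$n . '>'/e;
    }
    return @lines;
}

print "$_\n" for number_paragraphs("   First paragraph.", "continued", "   Second paragraph.");
```

Keeping the counter outside the substitution is what lets it survive from one line to the next; a counter declared inside the replacement would be reset on every match.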
Re: \w regular expressions unicode
On Wed, Apr 22, 2009 at 6:12 PM, Chas. Owens chas.ow...@gmail.com wrote:
> On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>> Chas. Owens wrote:
>>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>>> <snip>
>>> The utf8 pragma affects the whole file,
>>
>> Well, only the part of the file that is parsed after the use utf8; statement, right?
> <snip>
>
> Hmm, I don't think it would reparse the whole file, but it does run in a BEGIN block...hmm, I must test it.

It runs in a BEGIN block, but it is still lexically scoped.

Pragmata are very special cases of modules that provide modifications of compile-time behavior, and many of them perform sleight of hand behind the scenes. Here, the sleight of hand is that use utf8 simply adds a bit mask to $^H and relies on the behavior of the compiler hints.

The important thing to remember about a BEGIN block is that it is run as soon as it is defined, where it is defined. Just because it is executed early in the compile-optimize-run cycle does not mean that it is magically transported to an earlier position in the file. Generally, you want the behavior introduced by a module to have file scope, which is why use statements normally appear early in the file.

See perlpragma and the description of $^H in perlvar for details.

HTH

-- j
--
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential
daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org

values of β will give rise to dom!
Re: \w regular expressions unicode
2009/4/24 Jay Savage daggerqu...@gmail.com:
<snip>
>> Hmm, I don't think it would reparse the whole file, but it does run in a BEGIN block...hmm, I must test it.
>
> It runs in a BEGIN block, but it is still lexically scoped.
>
> Pragmata are very special cases of modules that provide modifications of compile-time behavior, and many of them perform sleight of hand behind the scenes. Here, the sleight of hand is that use utf8 simply adds a bit mask to $^H and relies on the behavior of the compiler hints.
>
> The important thing to remember about a BEGIN block is that it is run as soon as it is defined, where it is defined. Just because it is executed early in the compile-optimize-run cycle does not mean that it is magically transported to an earlier position in the file. Generally, you want the behavior introduced by a module to have file scope, which is why use statements normally appear early in the file.
>
> See perlpragma and the description of $^H in perlvar for details.
<snip>

All of this is good information, but for one thing: not all pragmas are lexically scoped. Hence the need to test and/or read the docs. For instance, the re pragma[1] is only partially lexical:

    #!/usr/bin/perl

    use strict;
    use warnings;

    "foo" =~ /(o+)/; # re 'debug' still affects this line

    use re 'debug';

1. http://perldoc.perl.org/re.html

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
Re: \w regular expressions unicode
On Fri, Apr 24, 2009 at 15:53, Chas. Owens chas.ow...@gmail.com wrote:
<snip>
> All of this is good information, but for one thing: not all pragmas are lexically scoped. Hence the need to test and/or read the docs. For instance, the re pragma[1] is only partially lexical:
>
>     #!/usr/bin/perl
>
>     use strict;
>     use warnings;
>
>     "foo" =~ /(o+)/; # re 'debug' still affects this line
>
>     use re 'debug';
>
> 1. http://perldoc.perl.org/re.html
<snip>

The sigtrap pragma is another example of a pragma that is not lexically scoped. The docs don't say one way or the other, but a quick test proves that it isn't:

    #!/usr/bin/perl

    use strict;
    use warnings;

    kill 2, $$;

    sub not_even_called {
        use sigtrap die => "INT";
    }

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
Re: \w regular expressions unicode
On Fri, Apr 24, 2009 at 3:53 PM, Chas. Owens chas.ow...@gmail.com wrote:
> 2009/4/24 Jay Savage daggerqu...@gmail.com:
> <snip>
>>> Hmm, I don't think it would reparse the whole file, but it does run in a BEGIN block...hmm, I must test it.
>>
>> It runs in a BEGIN block, but it is still lexically scoped.
>>
>> Pragmata are very special cases of modules that provide modifications of compile-time behavior, and many of them perform sleight of hand behind the scenes. Here, the sleight of hand is that use utf8 simply adds a bit mask to $^H and relies on the behavior of the compiler hints.
>>
>> The important thing to remember about a BEGIN block is that it is run as soon as it is defined, where it is defined. Just because it is executed early in the compile-optimize-run cycle does not mean that it is magically transported to an earlier position in the file. Generally, you want the behavior introduced by a module to have file scope, which is why use statements normally appear early in the file.
>>
>> See perlpragma and the description of $^H in perlvar for details.
> <snip>
>
> All of this is good information, but for one thing: not all pragmas are lexically scoped. Hence the need to test and/or read the docs. For instance, the re pragma[1] is only partially lexical:
>
>     #!/usr/bin/perl
>
>     use strict;
>     use warnings;
>
>     "foo" =~ /(o+)/; # re 'debug' still affects this line
>
>     use re 'debug';
>
> 1. http://perldoc.perl.org/re.html

Agreed, absolutely. My point was that just because something's wrapped in a BEGIN block doesn't mean one should assume it affects the entire program, or be surprised when it doesn't.

--j
Re: \w regular expressions unicode
Gunnar Hjalmarsson wrote:
> Stanisław T. Findeisen wrote:
>> Hi, how do I write regular expressions matching against Unicode (e.g., UTF-8) strings? For instance, in my regexp:
>>
>>     qr/^([.@ \w])*$/
>
> Decode the UTF-8 encoded strings before applying the regex on them.
>
>     $ perl -MEncode -le '
>     $utf8_encoded = "smörgåsbord";
>     $s = decode "UTF-8", $utf8_encoded;
>     print "Match" if $s =~ /^\w+$/;
>     '
>     Match
>     $

Thanks, decode helped with this. But can I ask you one more question?

What assumptions does Perl make regarding the encoding of the input file (i.e., the program/script file)? Is it so that string literals in Perl are in fact byte arrays? What you type is what you get?

STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA D63F DBF5 8AA8 3B31 FE8A
===
Re: \w regular expressions unicode
Stanisław T. Findeisen wrote:
> Gunnar Hjalmarsson wrote:
>> Stanisław T. Findeisen wrote:
>>> Hi, how do I write regular expressions matching against Unicode (e.g., UTF-8) strings? For instance, in my regexp:
>>>
>>>     qr/^([.@ \w])*$/
>>
>> Decode the UTF-8 encoded strings before applying the regex on them.
>>
>>     $ perl -MEncode -le '
>>     $utf8_encoded = "smörgåsbord";
>>     $s = decode "UTF-8", $utf8_encoded;
>>     print "Match" if $s =~ /^\w+$/;
>>     '
>>     Match
>>     $
>
> Thanks, decode helped with this. But can I ask you one more question?
>
> What assumptions does Perl make regarding the encoding of the input file (i.e., the program/script file)?

AFAIK, it just converts the bytes into Perl's internal format, but it does not assume anything (at least not by default) with respect to the character encoding.

> Is it so that string literals in Perl are in fact byte arrays?

String literals in a Perl script are byte *strings* until decoded.

> What you type is what you get?

Not sure what you mean by that.

You may find http://perldoc.perl.org/perlunitut.html helpful.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: \w regular expressions unicode
Gunnar Hjalmarsson wrote:
>> What assumptions does Perl make regarding input file (i.e., the program/script file) encoding?
>
> AFAIK, it just converts the bytes into Perl's internal format, but it does not assume anything (at least not by default) with respect to the character encoding.
>
>> Is it so that string literals in Perl are byte arrays in fact?
>
> String literals in a Perl script are byte *strings* until decoded.

Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded), provided they are valid UTF-8.

It's all about the UTF8 flag: http://perldoc.perl.org/Encode.html#The-UTF8-flag .

STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA D63F DBF5 8AA8 3B31 FE8A
===
Re: \w regular expressions unicode
Stanisław T. Findeisen wrote:
> Gunnar Hjalmarsson wrote:
>>> What assumptions does Perl make regarding input file (i.e., the program/script file) encoding?
>>
>> AFAIK, it just converts the bytes into Perl's internal format, but it does not assume anything (at least not by default) with respect to the character encoding.
>>
>>> Is it so that string literals in Perl are byte arrays in fact?
>>
>> String literals in a Perl script are byte *strings* until decoded.
>
> Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded), provided they are valid UTF-8.

No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.

    $ perl -MEncode -le '
    $s = "smörgåsbord";
    print length $s;
    use utf8;
    print length $s;
    $s = decode "UTF-8", $s;
    print length $s;
    '
    13
    13
    11
    $

> It's all about the UTF8 flag: http://perldoc.perl.org/Encode.html#The-UTF8-flag .

Maybe... That's above my head right now, I'm afraid.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: \w regular expressions unicode
Gunnar Hjalmarsson wrote:
> Stanisław T. Findeisen wrote:
>> With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded) (provided they are valid UTF-8).
>
> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.

Or did you possibly mean the utf8::decode() function?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
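For reference, the built-in utf8::decode() function mentioned here converts a byte string to a character string in place, which is what changes the reported length. A minimal sketch follows; the helper name lengths_before_and_after is hypothetical, and the sample word is written with explicit byte escapes so the result does not depend on how the script file itself is encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the length of a UTF-8 byte string before and
# after utf8::decode(), plus whether the result carries the UTF8 flag.
sub lengths_before_and_after {
    my ($string) = @_;
    my $before = length $string;       # counts bytes
    utf8::decode($string);             # in place; returns false on invalid UTF-8
    my $after = length $string;        # counts characters
    return ($before, $after, utf8::is_utf8($string) ? 1 : 0);
}

# "smörgåsbord" as raw UTF-8 bytes: ö is C3 B6, å is C3 A5
my ($before, $after, $flag) = lengths_before_and_after("sm\xC3\xB6rg\xC3\xA5sbord");
print "$before $after $flag\n";
```

The utf8:: functions are available without loading the utf8 pragma, so this does not change how the rest of the source file is parsed.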
Re: \w regular expressions unicode
Gunnar Hjalmarsson wrote:
> Or did you possibly mean the utf8::decode() function?

I mean this:

    #!/usr/bin/perl

    use warnings;
    use strict;
    # use utf8;
    use Encode;

    my $utf8_encoded = "smörgåsbord";
    print('is_utf8: ' . (Encode::is_utf8($utf8_encoded) ? 'TRUE' : 'FALSE') . "\n");

This outputs FALSE here, but uncomment use utf8 and it becomes TRUE. It looks like with use utf8 those string literals aren't ordinary byte strings anymore. Perhaps they are as if Encode::decode had been applied to them?

STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA D63F DBF5 8AA8 3B31 FE8A
===
Re: \w regular expressions unicode
On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
<snip>
>> Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded) (provided they are valid UTF-8).
>
> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.
<snip>

From perldoc utf8[1]:

    Bytes in the source text that have their high-bit set will be treated as being part of a literal UTF-X sequence. This includes most literals such as identifier names, string constants, and constant regular expression patterns.

The utf8 pragma affects the whole file, not just variable and subroutine names.

1. http://perldoc.perl.org/utf8.html

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
Re: \w regular expressions unicode
Stanisław T. Findeisen wrote:
> I mean this:
>
>     #!/usr/bin/perl
>
>     use warnings;
>     use strict;
>     # use utf8;
>     use Encode;
>
>     my $utf8_encoded = "smörgåsbord";
>     print('is_utf8: ' . (Encode::is_utf8($utf8_encoded) ? 'TRUE' : 'FALSE') . "\n");
>
> This outputs FALSE here, but uncomment use utf8 and it becomes TRUE. It looks like with use utf8 those string literals aren't ordinary byte strings anymore. Perhaps they are as if Encode::decode had been applied to them?

Yes, it seems to be so. Please also see my reply to Chas.'s post.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: \w regular expressions unicode
Chas. Owens wrote:
> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
> <snip>
>>> Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded) (provided they are valid UTF-8).
>>
>> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.
> <snip>
>
> From perldoc utf8[1]:
>
>     Bytes in the source text that have their high-bit set will be treated as being part of a literal UTF-X sequence. This includes most literals such as identifier names, string constants, and constant regular expression patterns.
>
> The utf8 pragma affects the whole file,

Well, only the part of the file that is parsed after the use utf8; statement, right?

> not just variable and subroutine names.

Yes, I agree on that now. Thanks for the correction.

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: \w regular expressions unicode
On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
> Chas. Owens wrote:
>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>> <snip>
>>>> Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded) (provided they are valid UTF-8).
>>>
>>> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.
>> <snip>
>>
>> From perldoc utf8[1]:
>>
>>     Bytes in the source text that have their high-bit set will be treated as being part of a literal UTF-X sequence. This includes most literals such as identifier names, string constants, and constant regular expression patterns.
>>
>> The utf8 pragma affects the whole file,
>
> Well, only the part of the file that is parsed after the use utf8; statement, right?
<snip>

Hmm, I don't think it would reparse the whole file, but it does run in a BEGIN block...hmm, I must test it.

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
Re: \w regular expressions unicode
On Wed, Apr 22, 2009 at 18:12, Chas. Owens chas.ow...@gmail.com wrote:
> On Wed, Apr 22, 2009 at 17:54, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>> Chas. Owens wrote:
>>> On Wed, Apr 22, 2009 at 15:25, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>>> <snip>
>>>>> Yeah, it looks so. With use utf8 (http://perldoc.perl.org/utf8.html) one can however make them parsed (decoded) (provided they are valid UTF-8).
>>>>
>>>> No. The utf8 pragma is about allowing UTF-8 encoded *symbols*, e.g. variable names or subroutine names.
>>> <snip>
>>>
>>> From perldoc utf8[1]:
>>>
>>>     Bytes in the source text that have their high-bit set will be treated as being part of a literal UTF-X sequence. This includes most literals such as identifier names, string constants, and constant regular expression patterns.
>>>
>>> The utf8 pragma affects the whole file,
>>
>> Well, only the part of the file that is parsed after the use utf8; statement, right?
> <snip>
>
> Hmm, I don't think it would reparse the whole file, but it does run in a BEGIN block...hmm, I must test it.
<snip>

    #!/usr/bin/perl

    use strict;

    my $first;
    BEGIN { $first = "é" };
    my $next = "é";

    use utf8;

    my $last = "é";

    print "first is ", utf8::is_utf8($first) ? "" : "not ", "UTF-8\n";
    print "next is ",  utf8::is_utf8($next)  ? "" : "not ", "UTF-8\n";
    print "last is ",  utf8::is_utf8($last)  ? "" : "not ", "UTF-8\n";

gives me

    first is not UTF-8
    next is not UTF-8
    last is UTF-8

So I would say that it only takes effect for lines after it is used.

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
\w regular expressions unicode
Hi,

How do I write regular expressions matching against Unicode (e.g., UTF-8) strings?

For instance, in my regexp:

    qr/^([.@ \w])*$/

I am using \w because here: http://perldoc.perl.org/perlretut.html it says:

===
\w matches a word character (alphanumeric or _), not just [0-9a-zA-Z_] but also digits and characters from non-roman scripts
===

Unfortunately, this doesn't seem to work with non-ASCII. :-/ Is this a configuration issue?

Thanks!

STF

===
http://eisenbits.homelinux.net/~stf/
OpenPGP: DFD9 0146 3794 9CF6 17EA D63F DBF5 8AA8 3B31 FE8A
===
Re: \w regular expressions unicode
Stanisław T. Findeisen wrote:
> Hi, how do I write regular expressions matching against Unicode (e.g., UTF-8) strings? For instance, in my regexp:
>
>     qr/^([.@ \w])*$/

Decode the UTF-8 encoded strings before applying the regex on them.

    $ perl -MEncode -le '
    $utf8_encoded = "smörgåsbord";
    $s = decode "UTF-8", $utf8_encoded;
    print "Match" if $s =~ /^\w+$/;
    '
    Match
    $

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
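The same decode-then-match idea can be written as a small script rather than a one-liner. This is only a sketch under stated assumptions: the helper name is_all_word_chars is hypothetical, and the sample word is spelled with explicit UTF-8 byte escapes so that the example does not depend on the encoding of the script file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode a UTF-8 byte string into characters first,
# then test whether it consists only of word characters.
sub is_all_word_chars {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);
    return $chars =~ /^\w+$/ ? 1 : 0;
}

# "smörgåsbord" as raw UTF-8 bytes: ö is C3 B6, å is C3 A5
my $bytes = "sm\xC3\xB6rg\xC3\xA5sbord";
print is_all_word_chars($bytes) ? "Match\n" : "No match\n";
```

In a real program the bytes would usually come from a file or socket; decoding at that boundary keeps the regex working on characters rather than on encoded bytes.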
Re: Regular Expressions
Chas. Owens wrote:
> On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>> TMTOWTDI
>>
>>     use Time::Local;
>>
>>     while (<DATA>) {
>>         s{,(.+?),}{
>>             my ($d, $m, $y) = split /\//, $1;
>>             my $t = timelocal 0, 0, 0, $d, $m-1, $y;
>>             ($d, $m, $y) = (localtime $t)[3..5];
>>             sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
>>         }e;
>>     }
> <snip>
>
> And this would be the confusing, fragile mess I spoke of.

Sorry, but I fail to see how using the s/// operator to extract the date field would be so much more confusing and fragile compared to split() + join().

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
Re: Regular Expressions
On Sun, Feb 8, 2009 at 03:49, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
> Chas. Owens wrote:
>> On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>>> TMTOWTDI
>>>
>>>     use Time::Local;
>>>
>>>     while (<DATA>) {
>>>         s{,(.+?),}{
>>>             my ($d, $m, $y) = split /\//, $1;
>>>             my $t = timelocal 0, 0, 0, $d, $m-1, $y;
>>>             ($d, $m, $y) = (localtime $t)[3..5];
>>>             sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
>>>         }e;
>>>     }
>> <snip>
>>
>> And this would be the confusing, fragile mess I spoke of.
>
> Sorry, but I fail to see how using the s/// operator to extract the date field would be so much more confusing and fragile compared to split() + join().
<snip>

You are calling three functions (one of which is split) and assigning returns three times inside the replacement. Add on top of that the fact that the regex only works for the second field. Compare all of that to calling two much simpler functions, a simple substitution, and one assignment.

Try to imagine what happens six months from now when you need to go back and perform a transformation on the fifth field. Are you going to extend the regex to try to capture that value? Or are you just going to rewrite the code to use a split like you should have in the first place?

Also, there may be a need to handle commas in the fields at some point in the future. This will entail using a module like Text::CSV. With the split code you can just replace the split with the proper parsing function from the module. With the giant substitution code you pretty much have to rewrite the whole thing.

I am all for using advanced features of Perl when it makes the code clearer or more concise, but this code is longer than the split version, involves more functions (including the confusing* localtime and timelocal functions), and doesn't even do error checking on the data.

On an unrelated topic, why are you using timelocal? A much better solution is to use the strftime function from the POSIX module:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use POSIX;

    while (<DATA>) {
        s{,([^,]+),}{
            my ($m, $d, $y) = $1 =~ m{^([0-9]+)/([0-9]+)/([0-9]+)$}
                or die "$. has an invalid date format";
            strftime ",%Y%m%d,", 0, 0, 0, $d, $m - 1, $y - 1900;
        }e;
        print;
    }

    __DATA__
    1,1/1/2009,optional,foo
    2,1/2/2009,,bar
    3,1/3/2009,,baz

Note how the split from your code has been changed to a regex. This is because split is indiscriminate. This was good in my code because it acted as future proofing against more fields being added to the record** (which is unlikely to affect the meaning of earlier fields), but bad here because we know the expected format of the date, and the chances of it not being that format and the code still being correct at some point in the future are small.

* localtime pretty much only makes sense when you know the C-based tm structure it came from, and timelocal, besides being a word play that is too clever by half, is worse because it violates that structure***.

** Also, if we wanted to throw an error because there were too few or too many fields, it would be easily achieved by asking the array how many elements it held.

*** http://perldoc.perl.org/Time/Local.html#Year-Value-Interpretation

--
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
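The strftime approach can also be pulled out into a small function that validates the date and reports failure instead of dying. The sketch below is not the code from the thread: the helper name reformat_date is hypothetical, and it assumes month/day/four-digit-year input as in the sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: convert "m/d/yyyy" to "yyyymmdd".
# Returns nothing (undef in scalar context) if the input does not look like a date.
sub reformat_date {
    my ($date) = @_;
    my ($m, $d, $y) = $date =~ m{^([0-9]+)/([0-9]+)/([0-9]+)$}
        or return;
    # strftime takes (format, sec, min, hour, mday, mon, year) with
    # mon counted from 0 and year counted from 1900.
    return strftime("%Y%m%d", 0, 0, 0, $d, $m - 1, $y - 1900);
}

print reformat_date("1/2/2009"), "\n";
```

Note that this only checks the shape of the input, not whether the day and month are in range; a stricter check would be a separate step.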
Re: Regular Expressions
Chas. Owens wrote:
> On Sun, Feb 8, 2009 at 03:49, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
>> Sorry, but I fail to see how using the s/// operator to extract the date field would be so much more confusing and fragile compared to split() + join().
>
> You are calling three functions (one of which is split) and assigning returns three times inside the replacement. Add on top of that the fact that the regex only works for the second field. Compare all of that to calling two much simpler functions, a simple substitution, and one assignment.

I think you are comparing apples and oranges now. Since we don't know what kind of conversion the OP wants to do, I thought we were only discussing the date extracting part of the problem. To clarify, I rewrote my code:

    use Time::Local;

    while (<DATA>) {
        s{(?<=,)(.+?)(?=,)}{ dateconvert($1) }e;
        print;
    }

    sub dateconvert {
        my ($d, $m, $y) = split /\//, shift;
        my $t = timelocal 0, 0, 0, $d, $m-1, $y;
        ($d, $m, $y) = (localtime $t)[3..5];
        sprintf '%d-%02d-%02d', $y+1900, $m+1, $d;
    }

    __DATA__
    TICKER,06/02/09,OPEN,HIGH,LOW,CLOSE,VOLUME,OI
    TICKER,07/02/09,OPEN,HIGH,LOW,CLOSE,VOLUME,OI
    TICKER,08/02/97,OPEN,HIGH,LOW,CLOSE,VOLUME,OI

In other words, if we are to compare each other's code, I believe that

    s{(?<=,)(.+?)(?=,)}{ dateconvert($1) }e;
    print;

ought to be compared with

    my @record = split /,/, $_;
    $record[1] = dateconvert( $record[1] );
    print join ",", @record;

> Try to imagine what happens six months from now when you need to go back and perform a transformation on the fifth field. Are you going to extend the regex to try to capture that value? Or are you just going to rewrite the code to use a split like you should have in the first place?

Didn't think about that. Maybe I will use split + join. Not a big deal, IMO.

> I am all for using advanced features of Perl when it makes the code clearer or more concise, but this code is longer than the split version, involves more functions (including the confusing* localtime and timelocal functions),

My use of localtime and timelocal is totally unrelated to whether I use the split version or not.

> and doesn't even do error checking on the data.

Not true. timelocal() does error checking.

> On an unrelated topic, why are you using timelocal?

Because of its built-in error checking? ;-) Or maybe because I wanted to use its Year Value Interpretation feature. (Note that I assumed conversion from dd/mm/yy to yyyy-mm-dd, and that a date from the 90's is included in my sample data.)

> A much better solution is to use the strftime function from the POSIX module:

Maybe. Somehow I tend to believe that date conversion code becomes more robust if you go to epoch seconds and back. Isn't that what most date and time related modules do behind the scenes, btw?

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
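The "epoch seconds and back" idea can be sketched by combining the two suggestions in this thread: timelocal for range checking and strftime for formatting. This is only an illustration, not code from either poster. The helper name convert_date is made up, and it assumes four-digit years so that timelocal's two-digit year interpretation does not come into play.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);
use Time::Local qw(timelocal);

# Hypothetical helper (not from the thread): convert dd/mm/yyyy to
# yyyy-mm-dd by going through epoch seconds.
sub convert_date {
    my ($date) = @_;
    my ($d, $m, $y) = $date =~ m{\A(\d{1,2})/(\d{1,2})/(\d{4})\z}
        or die "invalid date format: $date\n";
    # timelocal croaks on out-of-range values such as day 31 of February.
    # Noon is used so that a DST change at midnight cannot shift the date.
    my $epoch = timelocal(0, 0, 12, $d, $m - 1, $y);
    return strftime '%Y-%m-%d', localtime $epoch;
}

print convert_date("06/02/2009"), "\n";
```

Wrapping the call in eval lets a caller report a bad line and carry on instead of aborting the whole run.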
Regular Expressions
Hi All, I am a noob in Perl and hence would like some help to what I am sure is a very easy problem. I have got a text file in csv format The format is: TICKER,DATE,OPEN,HIGH,LOW,CLOSE,VOLUME,OI Now my objective is to change the format of the date, and rename the whole file as a .csv So, my strategy is: I want to read the content between the first and second comma, take it in a variable and do the slicing and dicing and write it back. Because I need some real life practice in REGEX, how do you suggest I read the contents between the first and the second comma? Soham
Re: Regular Expressions
On Sat, Feb 7, 2009 at 08:45, Soham Das soham...@yahoo.co.in wrote:
snip
> Because I need some real life practice in REGEX, how do you suggest I read the contents between the first and the second comma?
snip

This isn't a job for a regex; it is a job for split:

    my @record = split ",", $record;
    $record[1] =~ s{(..)/(..)/(....)}{$3$1$2}
        or die "line $. has an invalid date format";
    print join ",", @record;

You could say

    $record =~ s{(.*?),(..)/(..)/(....),}{$1,$4$2$3,}
        or die "line $. has an invalid date format";
    print $record;

but the next person to maintain your code may be a little upset at you, especially in the more complicated versions of this type of substitution.

--
Chas. Owens
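A slightly more defensive take on the split approach might look like the sketch below. The helper name reformat_date_field and the mm/dd/yyyy input layout are assumptions made for illustration; the original poster never said which date format the file uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): rewrite field $index of a
# comma-separated record from mm/dd/yyyy to yyyymmdd.
sub reformat_date_field {
    my ($record, $index) = @_;
    chomp $record;
    # The -1 limit keeps trailing empty fields such as "a,b,,".
    my @fields = split /,/, $record, -1;
    die "record has no field $index\n" if $index > $#fields;
    # Anchored pattern: the whole field must be a date, nothing more.
    $fields[$index] =~ s{\A(\d{1,2})/(\d{1,2})/(\d{4})\z}
                        {sprintf '%04d%02d%02d', $3, $1, $2}e
        or die "invalid date format: $fields[$index]\n";
    return join ',', @fields;
}

print reformat_date_field($_, 1), "\n"
    for "1,1/2/2009,optional,foo", "3,1/3/2009,,baz";
```

Because the field index is a parameter, transforming the fifth field later means changing one argument rather than rewriting a pattern.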
Re: Regular Expressions
Chas. Owens wrote:
> This isn't a job for a regex; it is a job for split:

whose first argument is a regex pattern... ;-)

--
Gunnar Hjalmarsson
Re: Regular Expressions
On Sat, Feb 7, 2009 at 16:09, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
> Chas. Owens wrote:
>> This isn't a job for a regex; it is a job for split:
>
> whose first argument is a regex pattern... ;-)
snip

Yes, and a regex follows in the substitution, but the whole thing isn't being done with a regex. Trying to do it with one regex can lead to a confusing and fragile mess.

--
Chas. Owens
Re: Regular Expressions
Chas. Owens wrote:
snip
> Yes and a regex follows in the substitute, but the whole things isn't being done with a regex. Trying to do it with one regex can lead to a confusing and fragile mess.

TMTOWTDI

    use Time::Local;

    while (<DATA>) {
        s{,(.+?),}{
            my ($d, $m, $y) = split /\//, $1;
            my $t = timelocal 0, 0, 0, $d, $m-1, $y;
            ($d, $m, $y) = (localtime $t)[3..5];
            sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
        }e;
    }

--
Gunnar Hjalmarsson
Re: Regular Expressions
Chas. Owens wrote:
snip
> You could say
>
>     $record =~ s{(.*?),(..)/(..)/(....),}{$1,$4$2$3,}
>         or die "line $. has an invalid date format";
>     print $record;
>
> but the next person to maintain your code may be a little upset at you, especially in the more complicated versions of this type of substitution.

    $record =~ s|,(..)/(..)/(....),|,$3$1$2,| or die "Data problem";

Rob
Re: Regular Expressions
On Sat, Feb 7, 2009 at 19:21, Rob Dixon rob.di...@gmx.com wrote:
snip
>     $record =~ s|,(..)/(..)/(....),|,$3$1$2,| or die "Data problem";
snip

Yes, but how would you handle it if this weren't the second field? It is better to have a general solution.

--
Chas. Owens
Re: Regular Expressions
On Sat, Feb 7, 2009 at 19:11, Gunnar Hjalmarsson nore...@gunnar.cc wrote:
snip
> TMTOWTDI
>
>     use Time::Local;
>
>     while (<DATA>) {
>         s{,(.+?),}{
>             my ($d, $m, $y) = split /\//, $1;
>             my $t = timelocal 0, 0, 0, $d, $m-1, $y;
>             ($d, $m, $y) = (localtime $t)[3..5];
>             sprintf ',%d-%02d-%02d,', $y+1900, $m+1, $d;
>         }e;
>     }
snip

And this would be the confusing, fragile mess I spoke of.

--
Chas. Owens
Re: Comparing files with regular expressions
> Given just the idea of the data, can you improve on that?

I bet I could! It's interesting how my instinct, when trying to develop a programming solution, is to wrestle with the problem inside the context of the language. As a result, the solutions I come up with tend to be shaped by my limited understanding of that language. I think you're right that this is a case of fluency, that I am fluent in English and my best problem solving skills are most likely in that context. Trying to solve the problem in Perl, I'm likely not using my best skills and thus come up with a poor solution.

I also take from your advice, whether you meant it or not, that I should approach my code as if it would be scalable. My solution is probably adequate for a small scale problem but its silliness would quickly be exposed as soon as the data scaled up.

Thanks for the advice and inspiration.

On Sat, May 3, 2008 at 8:08 PM, Rob Dixon [EMAIL PROTECTED] wrote:
snip
Re: Comparing files with regular expressions
Aaron Rubinstein wrote: Given just the idea of the data, can you improve on that? I bet I could! I bet you could too :) It's interesting how my instinct, when trying to develop a programming solution, is to wrestle with the problem inside the context of the language. As a result, the solutions I come up with tend to be shaped by my limited understanding of that language. I think you're right that this is a case of fluency, that I am fluent in English and my best problem solving skills are most likely in that context. Trying to solve the problem in Perl, I'm likely not using my best skills and thus come up with a poor solution. It's a frequent assumption that when you working with a tool of any sort, whether it's a knife and fork or a golf club, that you should work with that tool until you are proficient. But unless those tools are prescribed by the rules of the game in play then you should consider alternatives. I often eat from a ladle or wooden spoon when I am cooking, but etiquette says that I may not do the same at table; and getting a ball into a hole half a mile away by hitting it with a stick is not a good solution by any standards. More often than not, a programming language restricts what you can do over what you can describe using English, and while you can always get more out of any language by becoming familiar with it, you are usually becoming familiar with what is impossible or difficult rather than getting used to new exciting possibilities. I also take from your advice, whether you meant it or not, that I should approach my code as if it would be scalable. My solution is probably adequate for a small scale problem but its silliness would quickly be exposed as soon as the data scaled up. Never write off your solution as silly. If it works then it is a solution, and final solutions are almost never the best ones possible. I meant quite the opposite about scalability. 
My intention was to emphasize that the amount of data changes what is a good solution. It is a useful exercise to imagine that the data is printed on sheets of paper and that you have to solve the problem manually given just an aircraft hangar full of filing cabinets. If you have only a couple of sheets of paper with a single line printed on each, then you can just sit at your desk and write the output. But if you have several stacks of paper then you might want to start using the filing system. Thanks for the advice and inspiration. You're more than welcome. Remember that the best way to solve a problem, whether it's a programming problem or any other sort, is to think about whether it's comparable to any situation you have already come across. It's called abstraction and it's your friend :) Rob -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Comparing files with regular expressions
From: Chas. Owens [EMAIL PROTECTED]
> On Fri, May 2, 2008 at 10:44 AM, rubinsta [EMAIL PROTECTED] wrote:
> snip
>> Any thoughts as to why some of the matches are getting missed?
> snip
>
> Not off hand. I will extract your code and do some tests. Can you send me your data or is it sensitive?
>
> snip
>> Just out of beginner curiosity, why did you suggest I use the 3 argument filehandle instead of:
>>
>>     open(EX, "exclude1.txt") or die $!
> snip
>
> Because the three argument version of open is safer. It doesn't matter in the code you wrote because you used a literal string, but if you say
>
>     open FH, $file or die "could not open $file: $!";
>
> expecting FH to be a read filehandle and $file contains the filename ">important", you will wind up with a write filehandle.

And that means you were lucky. If the $file contained something like "|rm -rf /" or "rm -rf / |" ...

Jenda
= [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =
When it comes to wine, women and song, wizards are allowed to get drunk and croon as much as they like.
-- Terry Pratchett in Sourcery
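The difference can be tried safely in a scratch directory. In this sketch the helper open_for_reading and the file name "important" are made up for illustration; it only shows that with the mode given as a separate argument, a leading ">" in the name is treated as part of the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): open a file for reading
# only, whatever the name looks like.
sub open_for_reading {
    my ($file) = @_;
    # Three-argument form: the mode is fixed to "<", so characters such
    # as ">" or "|" in $file are just part of the file name.
    open my $fh, '<', $file or die "could not open $file: $!\n";
    return $fh;
}

# Create a file worth protecting inside a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/important";
open my $out, '>', $path or die "could not create $path: $!\n";
print {$out} "precious data\n";
close $out or die "could not close $path: $!\n";

# Two-argument open would read this name as "open for writing" and
# truncate the file; here it is only a name that does not exist.
my $tricky = ">$path";
my $fh = eval { open_for_reading($tricky) };
print(defined($fh) ? "opened\n" : "refused: $@");
print "file still holds ", (-s $path), " bytes\n";
```

The same reasoning applies to names that begin or end with "|", which the two-argument form would run as commands.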
Re: Comparing files with regular expressions
On Sat, May 3, 2008 at 4:42 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
snip
> [stuff about how two arg open is more dangerous than three arg open]
>
> And that means you were lucky. If the $file contained something like "|rm -rf /" or "rm -rf / |" ...
snip

Nah, you would be lucky if that were the case: / isn't a valid POSIX filename character. "| rm -rf ." or "rm -rf . |" on the other hand is much more dangerous. Not only is . a valid filename character, you also tend to actually have permission to modify the current working directory.

--
Chas. Owens
Re: Comparing files with regular expressions
From: Chas. Owens [EMAIL PROTECTED]
> On Sat, May 3, 2008 at 4:42 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
> snip
>> And that means you were lucky. If the $file contained something like "|rm -rf /" or "rm -rf / |" ...
> snip
>
> Nah, you would be lucky if that were the case: / isn't a valid POSIX filename character.

Why do you think it matters? And | is a valid POSIX filename character?

> "| rm -rf ." or "rm -rf . |" on the other hand is much more dangerous. Not only is . a valid filename character, you also tend to actually have permission to modify the current working directory.

You are right about the last issue though.

Jenda
Re: Comparing files with regular expressions
rubinsta wrote: Hello, I'm a Perl uber-novice and I'm trying to compare two files in order to exclude items listed on one file from the complete list on the other file. What I have so far prints out a third file listing everything that matches the exclude file from the complete file (which I'm hoping will be a duplicate of the exclude file) just so I can make sure that the comparison script is working. The files are lists of numbers separated by newlines. The exclude file has 333 numbers and the complete file has 9000 numbers. Here's what I have so far: #!/usr/bin/perl use strict; use warnings; open(ALL, all.txt) or die $!; open(EX, exclude.txt) or die $!; open(OUT,'exTest.txt') or die $!; my @ex_lines = EX; my @all_lines = ALL; foreach $all (@all_lines){ foreach $ex (@ex_lines){ if ($ex =~ /(^$all)/){ The lines you have read from the object files are unchomped (include the trailing newline character) and there is no allowance for leading or trailing whitespace. Are you sure of your input data? The regex has an unnecessary capture (parentheses) and isn't tied at the end of the string, although leaving the record separator at the end of $ex and $all has a similar effect. It should really be simply if ($ex eq $all) print OUT $1; The two strings are equal, so print OUT $all; } } } close(ALL); close(EX); Explicit closures are pointless unless the status is verified. All open filehandles will be closed by Perl when it finishes processing the script. (Even if an input file doesn't close cleanly, the damage has already been done when an earlier read failed. If a volume is dismounted while the program is running, for example, without explicit handling of read errors the file will simply appear to be shorter than its true length.) close(OUT); There's no need to close output files unless you're in a fragile environment, or if it is vital that the output information is complete. 
For instance it may be useful to write close $output or die $!; unlink 'input.txt'; so that the object data was discarded only if the target data was safely written and secured. I realize the nested foreach loops are ugly but I don't know enough to navigate the filehandles, which as I understand, can only be assigned to variables in their entirety as an array. Any thoughts on how this might be done? You should try to solve the problem instead of solving the data. Nearly all of your code is about opening, reading, and closing files. Your solution amounts to: if any of the lines in ALL match any of the lines in EX then print (it) Given just the idea of the data, can you improve on that? For instance, if one or both of the object files are sorted then you may not need to reassess all of the lines for each comparison. Or if the lines could occur more than once in either or both files, then it may be an idea to maintain a record of what comparisons had already been made. Those ideas are independent of Perl, or indeed of any programming language. After that, the line blurs. Programming languages are useful thinking tools for imagining programming solutions, just as natural languages are useful for life's challenges. An idea expressed in Latin can be impossible to recreate intact in French, just a solution in Forth can be inexpressible in C++. But despite its blurriness the line is narrow, so have courage and dash cross it into the implementation, where all languages have ways to open, close, read and write files; ways to handle numbers and strings; conveniences for arrays and constants and, God forbid, error handling. But I encourage you to start at the beginning, and if common sense is more familiar to you than Perl or any other programming language then use that. Your imagination is your best tool. If you were given two piles of line printer paper and were told to find the differences: - what questions would you ask about the problem? - how would you go about it? 
- what would you want to know about the contents? Once you know the answers, you have a solution. Then you can code it, given knowledge of the language at hand. Many things will change the solution, just as you would do things differently if you had only two sheets of paper to compare, or a two-inch-thick stack. Whether you had to do it every day or it was somebody else's turn in ten years' time. Whether it was obvious that all of the lines on one stack of paper were the same except for a few changes. You get the idea? But unless it is easier for you to formulate solutions in Perl or any other language, then imagine a real-world equivalent and use common sense. Then just code it, and we will help. HTH, Rob -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
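To make the remark about sorted input concrete: if both lists are already in ascending numeric order, one pass over each is enough, because an exclude entry that is smaller than the current item can never match a later one. The sketch below is one possible reading of that idea, not code from the thread; the name exclude_sorted and the sample numbers are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given two array refs of numbers, each already
# sorted ascending, return the entries of @$all not found in @$exclude.
sub exclude_sorted {
    my ($all, $exclude) = @_;
    my @kept;
    my $j = 0;    # current position in the exclude list
    for my $item (@$all) {
        # Skip exclude entries smaller than the current item; they can
        # never match this item or any later one.
        $j++ while $j < @$exclude && $exclude->[$j] < $item;
        push @kept, $item
            unless $j < @$exclude && $exclude->[$j] == $item;
    }
    return \@kept;
}

my $kept = exclude_sorted([1, 2, 10, 20, 20, 30], [2, 20, 25]);
print "@$kept\n";
```

This does the work in a number of steps proportional to the combined length of the two lists, but only if the sorted-input assumption really holds.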
Re: Comparing files with regular expressions
On Sat, May 3, 2008 at 5:57 PM, Jenda Krynicky [EMAIL PROTECTED] wrote:
snip
>> Nah, you would be lucky if that were the case: / isn't a valid POSIX filename character.
>
> Why do you think it matters? And | is a valid POSIX filename
snip

Hmm, you are right. I was an idiot there. I was assuming the file was coming off disk (but named in a way to cause problems for the processing program), but $file could come from anywhere.

--
Chas. Owens
Comparing files with regular expressions
Hello, I'm a Perl uber-novice and I'm trying to compare two files in order to exclude items listed on one file from the complete list on the other file. What I have so far prints out a third file listing everything that matches the exclude file from the complete file (which I'm hoping will be a duplicate of the exclude file) just so I can make sure that the comparison script is working. The files are lists of numbers separated by newlines. The exclude file has 333 numbers and the complete file has 9000 numbers. Here's what I have so far:

    #!/usr/bin/perl
    use strict;
    use warnings;

    open(ALL, "all.txt") or die $!;
    open(EX, "exclude.txt") or die $!;
    open(OUT,'>exTest.txt') or die $!;

    my @ex_lines = <EX>;
    my @all_lines = <ALL>;

    foreach $all (@all_lines){
        foreach $ex (@ex_lines){
            if ($ex =~ /(^$all)/){
                print OUT $1;
            }
        }
    }
    close(ALL);
    close(EX);
    close(OUT);

I realize the nested foreach loops are ugly but I don't know enough to navigate the filehandles, which as I understand, can only be assigned to variables in their entirety as an array. Any thoughts on how this might be done? Thanks!
Re: Comparing files with regular expressions
On Thu, May 1, 2008 at 4:09 PM, rubinsta [EMAIL PROTECTED] wrote:
snip
> Here's what I have so far:
>
>     #!/usr/bin/perl
>     use strict;
>     use warnings;
>
>     open(ALL, "all.txt") or die $!;
>     open(EX, "exclude.txt") or die $!;
>     open(OUT,'>exTest.txt') or die $!;
snip

Use the three argument version of open and lexical filehandles:

    open my $ex, "<", "exclude.txt"
        or die "could not open exclude.txt: $!";

snip
>     my @ex_lines = <EX>;
>     my @all_lines = <ALL>;
snip

Using filehandles in list context is a bad idea. It may work now when the files are small, but data almost always grows. Unless you are certain that the file will remain small you should not do this. Use a while loop instead.

snip
>     foreach $all (@all_lines){
>         foreach $ex (@ex_lines){
>             if ($ex =~ /(^$all)/){

This is testing to see if there are any lines in the exclude file that start with what was in the complete file. That is, if the complete file was

    1
    2

and the exclude file was

    10
    20

then all lines would be excluded. Is this really what you want? Also, given that you have not surrounded $all with \Q and \E (like /^\Q$all\E/), any metacharacters in $all (like *, ., ?, etc.) will be treated as metacharacters instead of normal characters. Unless the lines in complete are known to be regexes this could be bad. And by bad I mean everything from mismatches to the dreaded (?{system qq(rm -rf $ENV{HOME})}).

If you don't have regexes in the complete file but do want to check for its entries as prefixes in the exclude file, you are better off using a prefix tree (aka a trie*). It is an O(m log n)** algorithm, as opposed to the O(n*m) algorithm you are using now. There is at least one Perl implementation: Tree::Trie***.

If you don't have regexes in the complete file and do not want to check for entries as prefixes in the exclude file you are better off using a hash set***** to test for existence (roughly an O(m+n) solution). Luckily in Perl a hash set is easy to build, you just use a hash variable with the keys being your data and the values all being either undef or 1 depending on your style (I tend to use 1 for simplicity's sake, but I think undef might be smaller). Using a hash also gives you the freedom to use something like DB_File****** if the files get very large (thus saving memory without having to add much code).

snip
>                 print OUT $1;
>             }
>         }
>     }
>     close(ALL);
>     close(EX);
>     close(OUT);
snip

These calls to close at the end of the script are unnecessary. Only call close explicitly if you need to close a file before the filehandle goes out of scope.

Another simple tip is to treat STDIN/files on the command line as your complete file and STDOUT as your output file. This form of Perl script is called a filter and is very easy to write and use.

What follows is my implementation of the hash set version:

    #!/usr/bin/perl

    use strict;
    use warnings;

    #this is a hack to make the script runnable
    #without external data files, in a normal
    #script you would open a real exclude file
    #here
    my $exclude = "1\n2\n3\n";
    open my $ex, "<", \$exclude
        or die "could not open the scalar \$exclude as a file: $!";

    my %exists;
    $exists{$_} = 1 while <$ex>;

    #this is also a hack, in a normal script
    #you would say
    #while (my $line = <>) {
    #to get a loop over STDIN or files specified
    #on the commandline
    while (my $line = <DATA>) {
        print $line unless $exists{$line};
    }

    __DATA__
    1
    2
    10
    20

* http://en.wikipedia.org/wiki/Trie

** This is big O notation, basically it measures the order of magnitude of the number of steps needed to complete the algorithm. So, if you had 1,000 lines in exclude and 10,000 lines in complete it would take roughly 10,000,000 steps to complete the algorithm you are using now and only 13,287 with the trie.

*** http://search.cpan.org/~avif/Tree-Trie-1.5/Trie.pm

**** http://en.wikipedia.org/wiki/Big_O_notation

***** basically a hash with no values used for testing of existence of values

****** http://perldoc.perl.org/DB_File.html

--
Chas. Owens
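The prefix and metacharacter points above are easy to check on a few strings. The three helper names below are invented for this illustration, and the sample values are made up; nothing here comes from the poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Unanchored-at-the-end match, as in the original loop: "10" starts
# with "1". Unquoted, the "." in "1.5" is a wildcard, so "125" matches.
sub prefix_hit_raw {
    my ($all, $ex) = @_;
    return $ex =~ /^$all/ ? 1 : 0;
}

# \Q...\E quotes metacharacters, but this is still only a prefix test.
sub prefix_hit_quoted {
    my ($all, $ex) = @_;
    return $ex =~ /^\Q$all\E/ ? 1 : 0;
}

# Whole-string comparison needs no regex at all.
sub exact_hit {
    my ($all, $ex) = @_;
    return $ex eq $all ? 1 : 0;
}

for my $pair (["1", "10"], ["1.5", "125"], ["20", "20"]) {
    my ($all, $ex) = @$pair;
    printf "%-4s vs %-4s raw=%d quoted=%d exact=%d\n", $all, $ex,
        prefix_hit_raw($all, $ex),
        prefix_hit_quoted($all, $ex),
        exact_hit($all, $ex);
}
```

If whole lines are meant to be compared, eq (or a hash lookup) sidesteps both problems at once.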
Re: Comparing files with regular expressions
Many thanks, Chas. These are all very helpful (and educational!) suggestions. I adapted your example like so (specifying the all.txt on the command-line): #!/usr/bin/perl use strict; use warnings; open my $ex, , exclude.txt or die $!; open my $out, , exTest.txt or die $!; my %exists; $exists{$_} = 1 while $ex; ## I changed the unless to if so I could easily ## compare the output of the script to the ## original exclude.txt file while (my $line = ){ print $out $line if $exists{$line}; } The problem is the exlude.txt and exTest.txt do not match. Everything in the exTest.txt file is also in the exclude.txt file but there are a number of lines that appear in the all.txt and the exclude.txt that do not end up in exTest.txt. The numbers are EANs and are thus all exactly the same format, e.g. 9780657007423. Any thoughts as to why some of the matches are getting missed? Just out of beginner curiosity, why did you suggest I use the 3 argument filehandle instead of: open(EX, exclude1.txt) or die $! Thanks again for all your help! On May 2, 7:41 am, [EMAIL PROTECTED] (Chas. Owens) wrote: On Thu, May 1, 2008 at 4:09 PM, rubinsta [EMAIL PROTECTED] wrote: Hello, I'm a Perl uber-novice and I'm trying to compare two files in order to exclude items listed on one file from the complete list on the other file. What I have so far prints out a third file listing everything that matches the exclude file from the complete file (which I'm hoping will be a duplicate of the exclude file) just so I can make sure that the comparison script is working. The files are lists of numbers separated by newlines. The exclude file has 333 numbers and the complete file has 9000 numbers. 
>> Here's what I have so far:
>>
>>     #!/usr/bin/perl
>>     use strict;
>>     use warnings;
>>
>>     open(ALL, "all.txt") or die $!;
>>     open(EX, "exclude.txt") or die $!;
>>     open(OUT, '>exTest.txt') or die $!;
> snip
>
> Use the three argument version of open and lexical filehandles:
>
>     open my $ex, "<", "exclude.txt"
>         or die "could not open exclude.txt: $!";
>
> snip
>>     my @ex_lines = <EX>;
>>     my @all_lines = <ALL>;
> snip
>
> Using filehandles in list context is a bad idea. It may work now when the files are small, but data almost always grows. Unless you are certain that the file will remain small, you should not do this. Use a while loop instead.
>
> snip
>>     foreach $all (@all_lines){
>>         foreach $ex (@ex_lines){
>>             if ($ex =~ /(^$all)/){
>
> This is testing to see if there are any lines in the exclude file that start with what was in the complete file. That is, if the complete file was
>
>     1
>     2
>
> and the exclude file was
>
>     10
>     20
>
> then all lines would be excluded. Is this really what you want? Also, given that you have not surrounded $all with \Q and \E (like /^\Q$all\E/), any metacharacters in $all (like *, ., ?, etc.) will be treated as metacharacters instead of normal characters. Unless the lines in complete are known to be regexes this could be bad. And by bad I mean everything from mismatches to the dreaded (?{system qq(rm -rf $ENV{HOME})}).
>
> If you don't have regexes in the complete file but do want to check for its entries as prefixes in the exclude file, you are better off using a prefix tree (aka a trie*). It is an O(m log n)** algorithm, as opposed to the O(n*m) algorithm you are using now. There is at least one Perl implementation: Tree::Trie***.
>
> If you don't have regexes in the complete file and do not want to check for entries as prefixes in the exclude file, you are better off using a hash set* to test for existence (roughly an O(m+n) solution).
> Luckily, in Perl a hash set is easy to build: you just use a hash variable with the keys being your data and the values all being either undef or 1 depending on your style (I tend to use 1 for simplicity's sake, but I think undef might be smaller). Using a hash also gives you the freedom to use something like DB_File** if the files get very large (thus saving memory without having to add much code).
>
> snip
>>                 print OUT $1;
>>             }
>>         }
>>     }
>>     close(ALL);
>>     close(EX);
>>     close(OUT);
> snip
>
> These calls to close at the end of the script are unnecessary. Only call close explicitly if you need to close a file before the filehandle goes out of scope.
>
> Another simple tip is to treat STDIN/files on the command line as your complete file and STDOUT as your output file. This form of Perl script is called a filter and is very easy to write and use.
Re: Comparing files with regular expressions
On Fri, May 2, 2008 at 10:44 AM, rubinsta [EMAIL PROTECTED] wrote:
snip
> Any thoughts as to why some of the matches are getting missed?
snip

Not offhand. I will extract your code and do some tests. Can you send me your data, or is it sensitive?

snip
> Just out of beginner curiosity, why did you suggest I use the 3 argument filehandle instead of:
>
>     open(EX, "exclude1.txt") or die $!
snip

Because the three argument version of open is safer. It doesn't matter in the code you wrote because you used a literal string, but if you say

    open FH, $file or die "could not open $file: $!";

expecting FH to be a read filehandle and $file contains the filename ">important", you will wind up with a write filehandle. Specifying the type of filehandle you want separately from the file is an important safety feature. Using the old version of open is a bad habit you should not develop. You should know it exists (like many of the other bad habits left over from earlier versions of the language) in case you run into code that uses it, but you shouldn't use it yourself. I would also strongly recommend using lexical filehandles instead of the old bareword style for similar reasons.

-- 
Chas. Owens
wonkden.net
The most important skill a programmer can have is the ability to read.
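The safety argument can be tried out without risking a real file. The following is only a sketch: it works inside a temporary directory, and the file name "important" and its contents are made up. It shows that with the mode passed as its own argument, a name that starts with ">" is treated as a (nonexistent) file name rather than as a request to open for writing.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory that is removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/important";

# Create a small file we would not want to lose.
open my $out, '>', $path or die "could not create $path: $!";
print {$out} "precious data\n";
close $out or die "could not close $path: $!";

# A "file name" that carries its own mode character, as in the
# explanation above.
my $file = ">$path";

# Three-argument open: the mode is fixed to reading, so Perl looks
# for a file literally named ">...important" and simply fails.
my $opened = open(my $fh, '<', $file);
print "three-argument open failed safely: $!\n" unless $opened;

# The original file still has its contents.
my $size = -s $path;
print "size of $path is still $size bytes\n";
```

With the two-argument form, open(FH, $file), the same value of $file would open the path for writing and truncate it, which is the failure described above.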
problem using backslash on brackets in regular expressions
Hi, I have files which contain sentences, where some lines have extra information inside brackets and parentheses. I would like to delete everything contained within brackets or parentheses, including the brackets. I know that I am supposed to use the backslash to turn off the metacharacter properties of brackets and parentheses in a regular expression. I am trying to use the s/// operator to remove it, by doing this:

    while (<INPUT>) {
        $_ =~ s/\[*\]//;
        $_ =~ s/\(*\)//;
        print $_;
    }

so if the input is:

    *MOT: I'm gonna first [//] first I wanna use em all up .

then the output I'd like to get is:

    *MOT: I'm gonna first first I wanna use em all up .

but instead what I get is:

    *MOT: I'm gonna first [// first I wanna use em all up .

It only deletes the last piece, the ] bracket. How can I erase the whole thing? Thanks.
RE: problem using backslash on brackets in regular expressions
-----Original Message-----
From: Daniel McClory [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 22, 2008 16:06
To: beginners@perl.org
Subject: problem using backslash on brackets in regular expressions

> Hi, I have files which contain sentences, where some lines have extra information inside brackets and parentheses. I would like to delete everything contained within brackets or parentheses, including the brackets. I know that I am supposed to use the backslash to turn off the metacharacter properties of brackets and parentheses in a regular expression. I am trying to use the s/// operator to remove it, by doing this:
>
>     while (<INPUT>) {
>         $_ =~ s/\[*\]//;

What you are saying here is: zero or more occurrences of [, followed by a ]. That is what you are seeing in your output: the / before the ] is not a [, which is fine because zero occurrences are allowed, so only the ] matches and is replaced with nothing. Use:

    s/\[[^\]]+]//;

>         $_ =~ s/\(*\)//;

    s/\([^\)]+\)//;

There is no reason to write $_ =~, as by default that is what is going to be done anyway.

Wags ;)

>         print $_;
>     }
>
> so if the input is:
>
>     *MOT: I'm gonna first [//] first I wanna use em all up .
>
> then the output I'd like to get is:
>
>     *MOT: I'm gonna first first I wanna use em all up .
>
> but instead what I get is:
>
>     *MOT: I'm gonna first [// first I wanna use em all up .
>
> It only deletes the last piece, the ] bracket. How can I erase the whole thing? Thanks.
Re: problem using backslash on brackets in regular expressions
Daniel McClory wrote:

>     while (<INPUT>) {
>         $_ =~ s/\[*\]//;
>         $_ =~ s/\(*\)//;
>         print $_;
>     }

    while ( <INPUT> ) {
        s/\[.*?\]//;
        s/\(.*?\)//;
        print;
    }

-- 
Affijn, Ruud

"Gewoon is een tijger."
RE: problem using backslash on brackets in regular expressions
Hello,

(snip)
> I am trying to use the s/// operator to remove it, by doing this:
>
>     while (<INPUT>) {
>         $_ =~ s/\[*\]//;
>         $_ =~ s/\(*\)//;
>         print $_;
>     }
(snip)

The method used is incorrect.

    $_ =~ s/\[*\]//;

This says that the search is for an opening bracket (zero or more occurrences of it, since a '*' follows '[') followed by a closing bracket.

(snip)
> so if the input is:
>
>     *MOT: I'm gonna first [//] first I wanna use em all up .
(snip)

In this input, given your search pattern, the closing bracket ']' is found. The character before it is '/', which is not '[' (zero occurrences of it). Thus, it is a valid match, and only ']' is removed.

The correct search pattern ought to be:

    $_ =~ s/\[.*\]//;

This searches for an opening bracket followed by zero or more characters (.*) followed by a closing bracket. So if the input is

    *MOT: I'm gonna first [//] first I wanna use em all up .

then the output will be:

    *MOT: I'm gonna first first I wanna use em all up .

[!] HOWEVER, if there is more than one pair of '[]', another problem occurs. E.g.:

    Input:  I'm gonna first [//] second [//] third I wanna use em all up.
    Output: I'm gonna first third I wanna use em all up.

As you can see, the word 'second' is missing. This is because of the greediness of Perl's quantifiers, which try to match as much of the string as possible. To solve this, we use the '?' modifier. Thus, the correct search pattern is

    $_ =~ s/\[.*?\]//;

This will give the output:

    I'm gonna first second [//] third I wanna use em all up.

To remove all such occurrences, use the global search:

    $_ =~ s/\[.*?\]//g;

Use a similar approach for '()'.

Regards,
Adarsh
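The three variants discussed above can be compared side by side. This is a small sketch using the two-bracket sample sentence from this message; the variable names are made up, and the final space-squeezing step is an extra that nobody in the thread suggested.

```perl
use strict;
use warnings;

my $input = "I'm gonna first [//] second [//] third I wanna use em all up.";

# Greedy: .* runs on to the LAST ], so everything from the first [
# to the last ] disappears, including the word "second".
(my $greedy = $input) =~ s/\[.*\]//;

# Non-greedy: .*? stops at the FIRST ], so only the first pair goes.
(my $lazy = $input) =~ s/\[.*?\]//;

# Non-greedy plus /g: every bracketed group is removed.
(my $global = $input) =~ s/\[.*?\]//g;

# Optional extra (an assumption about the desired output): squeeze
# the doubled spaces that the removals leave behind.
(my $tidy = $global) =~ s/ {2,}/ /g;

print "$_\n" for $greedy, $lazy, $global, $tidy;
```

The same three patterns work for parentheses once \[ and \] are replaced by \( and \).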
Can regular expressions be used as subroutine arguments?
Hello Folks, I need to make a substitution in place for each element of an array, and I need to do this to two arrays. Currently the relevant code fragment (without pragmas) is: foreach my $element (@cddb_artist) { $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/; } foreach my $element (@cddb_track) { $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/; } The above fragment seems to be a good candidate for generalizing into a subroutine. I have two questions regarding this: 1. Can this particular regular expression, involving as it does, matched sub-pattern variables like $1, be used as a subroutine argument, and if so, how? 2. Can arbitrary regular expressions, including /PATTERN/REPLACEMENT/ versions for substitutions, be used as subroutine arguments, and if so, how? TIA. Chandra -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Can regular expressions be used as subroutine arguments?
R (Chandra) Chandrasekhar wrote:

>     foreach my $element (@cddb_artist) {
>         $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
>     }
>     foreach my $element (@cddb_track) {
>         $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/;
>     }

You can write all that as this single line:

    s/^.*?([0-9,a-f]{8}):.*$/$1/ for @cddb_artist, @cddb_track;

Do you really want the comma inside the character class?

> 1. Can this particular regular expression, involving as it does, matched sub-pattern variables like $1, be used as a subroutine argument, and if so, how?

Only the first part of the substitution is a regular expression.

    my $re_hex8 = qr/[[:xdigit:]]{8}/;
    s/^.*?($re_hex8):.*$/$1/ for @cddb_artist, @cddb_track;

Alternative:

    perl -wle'
        my @cddb_artist = ("xyz 12345678: abc");
        my @cddb_track  = ("abc fedcba09: xyz");
        my $re_hex8 = qr/[[:xdigit:]]{8}/;
        ($_) = m/($re_hex8)(?=:)/ for @cddb_artist, @cddb_track;
        print for @cddb_artist, @cddb_track;
    '
    12345678
    fedcba09

> 2. Can arbitrary regular expressions, including /PATTERN/REPLACEMENT/ versions for substitutions, be used as subroutine arguments, and if so, how?

Store the parts in variables.

-- 
Affijn, Ruud

"Gewoon is een tijger."
Re: Can regular expressions be used as subroutine arguments?
On Sat, Mar 8, 2008 at 9:59 AM, Dr.Ruud [EMAIL PROTECTED] wrote: snip 2. Can arbitrary regular expressions, including /PATTERN/REPLACEMENT/ versions for substitutions, be used as subroutine arguments, and if so, how? Store the parts in variables. snip Specifically, use the qr// operator to create precompiled regexes that can be stored in a scalar: my $regex = qr/^.*?([0-9,a-f]{8}):.*$/; $string =~ s/$regex/$1/; -- Chas. Owens wonkden.net The most important skill a programmer can have is the ability to read. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Can regular expressions be used as subroutine arguments?
R (Chandra) Chandrasekhar wrote: Hello Folks, I need to make a substitution in place for each element of an array, and I need to do this to two arrays. Currently the relevant code fragment (without pragmas) is: foreach my $element (@cddb_artist) { $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/; } foreach my $element (@cddb_track) { $element =~ s/^.*?([0-9,a-f]{8}):.*$/$1/; } As Dr.Ruud said that could be written as: s/^.*?([0-9,a-f]{8}):.*$/$1/ for @cddb_artist, @cddb_track; But you don't really need the anchors so: s/.*?([0-9,a-f]{8}):.*/$1/ for @cddb_artist, @cddb_track; And if you are not worried about preserving the newline at the end you could do it like this: ($_) = /([0-9,a-f]{8}):/ for @cddb_artist, @cddb_track; The above fragment seems to be a good candidate for generalizing into a subroutine. sub my_sub_something { my $regex = shift; ( $_ ) = /$regex/ for @_; } And call it like this: my_sub_something( qr/([0-9,a-f]{8}):/, @cddb_artist, @cddb_track ); But it would probably be simpler just to use the for statement above. John -- Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order.-- Larry Wall -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
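For the second question (passing both the pattern and the replacement), one possible approach is to pass the pattern as a qr// object and the replacement as a code reference. This is a sketch only: the name substitute_all is made up, only the first capture group is handed to the callback, and the arrays are passed by reference so the sub can modify them in place.

```perl
use strict;
use warnings;

# Hypothetical helper: apply s/$regex/.../ in place to every element
# of every array passed by reference. The replacement text comes from
# a callback that receives the first capture group.
sub substitute_all {
    my ($regex, $replacement, @arrays) = @_;
    for my $array (@arrays) {
        s/$regex/$replacement->($1)/e for @$array;
    }
    return;
}

# Sample data borrowed from the one-liner earlier in the thread.
my @cddb_artist = ("xyz 12345678: abc");
my @cddb_track  = ("abc fedcba09: xyz");

substitute_all(
    qr/^.*?([[:xdigit:]]{8}):.*$/,       # the pattern
    sub { my ($id) = @_; return $id },   # the replacement
    \@cddb_artist, \@cddb_track,
);

print "$_\n" for @cddb_artist, @cddb_track;
```

For a fixed replacement like $1, the single for statement shown above is still simpler. The callback only earns its keep when different callers need different replacement logic.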
Re: entering regular expressions from the keyboard
Jay Savage schreef: Dr.Ruud: Christopher Spears: #print $regexp; Make that print qr/$regexp/; Not sure where your headed with this. My headed? :) It was an alternative for the commented debug line. First, OP wants to print the input back to the user. And I presume that it is more a developer directed print statement. -- Affijn, Ruud Gewoon is een tijger. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: entering regular expressions from the keyboard
On 8/21/07, Dr.Ruud [EMAIL PROTECTED] wrote:
> Jeff Pang wrote:
>> Christopher Spears:
>>>     print "Enter regular expression: ";
>>>     chomp(my $regexp = <STDIN>);
>>
>>     $regexp = quotemeta($regexp);
>
> Since it specifically asks for a regular expression, I would definitely not do quotemeta().

Exactly. quotemeta() defeats the whole purpose here. We *want* the user to be able to input metacharacters for the match.

>>>     #print $regexp;
>
> Make that
>
>     print qr/$regexp/;

Not sure where you're headed with this. First, OP wants to print the input back to the user. It makes sense to do this unmodified, for the most part. Also, qr// doesn't modify the variable; it returns the compiled expression, which is just being thrown away after the print. That means the regex is actually being compiled twice. It probably doesn't, though, make sense to compile the regex before entering the loop, so perhaps something like:

    chomp(my $regexp = <STDIN>);
    print $regexp, "\n";
    $regexp = qr/$regexp/;
    ...

One additional note to Chris: in any case, '$_ =~ \$regexp' is almost certainly not what you're looking for. Since $regexp is a simple scalar and not a reference, your current code is trying to match against something like /SCALAR(0x18231cc)/.

HTH,

--jay
--
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential
daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org
Re: entering regular expressions from the keyboard
On 8/23/07, Jay Savage [EMAIL PROTECTED] wrote:
> That means the regex is actually being compiled twice. It probably doesn't, though, make sense to compile the regex before entering the loop, so perhaps something like:

Make that *does* make sense.

-- j
--
This email and attachment(s): [ ] blogable; [ x ] ask first; [ ] private and confidential
daggerquill [at] gmail [dot] com
http://www.tuaw.com http://www.downloadsquad.com http://www.engatiki.org
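Compiling the user's pattern once before the loop also gives a natural place to catch patterns that do not compile. What follows is a sketch under a few assumptions: the helper name compile_pattern is made up, the pattern is hard-coded instead of read from STDIN so the example runs unattended, and the file list is a stand-in for the readdir results.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a pattern string once. Returns a qr//
# object, or undef if the pattern is not a valid regular expression.
sub compile_pattern {
    my ($text) = @_;
    my $regexp = eval { qr/$text/ };
    return $regexp;
}

# Stand-in for the directory listing from the exercise.
my @allfiles = qw(main.cpp util.c notes.txt);

# In the real script this string would come from <STDIN> via chomp.
my $regexp = compile_pattern('\.c(pp)?$')
    or die "invalid regular expression\n";

# The compiled pattern is reused for every file name.
my @matches = grep { /$regexp/ } @allfiles;
print "$_\n" for @matches;

# A pattern with an unbalanced parenthesis is rejected up front
# instead of killing the script inside the loop.
print "rejected: (\n" unless defined compile_pattern('(');
```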
Re: entering regular expressions from the keyboard
On Aug 20, 11:28 pm, [EMAIL PROTECTED] (Christopher Spears) wrote:
> I'm working on the second exercise of the second chapter. I'm supposed to write a program that asks the user to type a regular expression. The program then uses the regular expression to try to find a match in the directory that I hard coded into the program. Here is what I have so far:
>
>     #!/usr/bin/perl -w
>     use strict;
>
>     print "Enter regular expression: ";
>     chomp(my $regexp = <STDIN>);
>     #print $regexp;
>
>     opendir(CPPDIR,"/home/io/chris_cpp/") or die "Could not open directory: $!";
>     my @allfiles = readdir CPPDIR;
>     closedir CPPDIR;
>
>     foreach $_(@allfiles){
>         if ($_ =~ \$regexp){
>             print $_."\n";
>         }
>     }
>
> My problem lies with the matching part. I'm not sure how to use the string that I stored in the $regexp variable as a regular expression. Any hints?

Shawn and Jeff each gave you half of the answer. Jeff pointed out that when your pattern is contained in a variable, you should use quotemeta(). This will backslash any metacharacters the variable might contain, so that they match themselves rather than being special in the pattern match (so any periods match periods rather than any character, plus signs match plus signs rather than meaning "one or more of the previous", etc.):

    $regexp = quotemeta($regexp);

And Shawn pointed out that the proper syntax for a pattern match is:

    $_ =~ /$regexp/

Those two lines should be combined:

    $regexp = quotemeta($regexp);
    foreach $_(@allfiles){
        if ($_ =~ /$regexp/){
            print $_."\n";
        }
    }

Or, instead of calling quotemeta() explicitly, you can use the \Q and \E escape sequences to do the backslashing within the pattern match itself:

    foreach $_ (@allfiles) {
        if ($_ =~ /\Q$regexp\E/) {
            print $_ . "\n";
        }
    }

Also note that an experienced Perl programmer would either eliminate the $_ whenever it's not needed:

    foreach (@allfiles) {
        if (/\Q$regexp\E/) {
            print "$_\n";
        }
    }

Or would use a better variable name as the loop iterator:

    foreach my $file (@allfiles) {
        if ($file =~ /\Q$regexp\E/) {
            print "$file\n";
        }
    }

Hope that helps,
Paul Lalli
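Whether to quote the input depends on whether it is meant as a pattern or as literal text, and the difference is easy to see on a small list. A sketch with made-up file names and a made-up input string:

```perl
use strict;
use warnings;

# Hypothetical file names and user input.
my @allfiles = qw(a+b.cpp aab.cpp notes.txt);
my $input    = 'a+b';

# Interpolated as a pattern: "a+" means one or more "a", so this
# finds names containing a run of "a" followed by "b".
my @as_regex = grep { /$input/ } @allfiles;

# Quoted with \Q...\E (the same effect as quotemeta): the "+" is
# matched literally, so only names containing the text "a+b" match.
my @as_literal = grep { /\Q$input\E/ } @allfiles;

print "as regex:   @as_regex\n";
print "as literal: @as_literal\n";
```

For an exercise that asks the user for a regular expression, the first form is the one that honours what was typed. The second suits plain search strings.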
Re: entering regular expressions from the keyboard
Jeff Pang schreef: Christopher Spears: print Enter regular expression: ; chomp(my $regexp = STDIN); $regexp = quotemeta($regexp); Since it specifically asks for a regular expression, I would definitely not do quotemeta(). #print $regexp; Make that print qr/$regexp/; -- Affijn, Ruud Gewoon is een tijger. -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
entering regular expressions from the keyboard
Hi! I'm trying to get back into Perl again by working through Intermediate Perl. Unfortunately, the Perl part of my brain has atrophied! I'm working on the second exercise of the second chapter. I'm supposed to write a program that asks the user to type a regular expression. The program then uses the regular expression to try to find a match in the directory that I hard coded into the program. Here is what I have so far: #!/usr/bin/perl -w use strict; print Enter regular expression: ; chomp(my $regexp = STDIN); #print $regexp; opendir(CPPDIR,/home/io/chris_cpp/) or die Could not open directory: $!; my @allfiles = readdir CPPDIR; closedir CPPDIR; foreach $_(@allfiles){ if ($_ =~ \$regexp){ print $_.\n; } } My problem lies with the matching part. I'm not sure how to use the string that I stored in the $regexp variable as a regular expression. Any hints? I'm the last person to pretend that I'm a radio. I'd rather go out and be a color television set. -David Bowie Who dares wins -British military motto I generally know what I'm doing. -Buster Keaton -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: entering regular expressions from the keyboard
-Original Message- From: Christopher Spears [EMAIL PROTECTED] Sent: Aug 21, 2007 11:28 AM To: beginners@perl.org Subject: entering regular expressions from the keyboard Hi! I'm trying to get back into Perl again by working through Intermediate Perl. Unfortunately, the Perl part of my brain has atrophied! I'm working on the second exercise of the second chapter. I'm supposed to write a program that asks the user to type a regular expression. The program then uses the regular expression to try to find a match in the directory that I hard coded into the program. Here is what I have so far: #!/usr/bin/perl -w use strict; print Enter regular expression: ; chomp(my $regexp = STDIN); #print $regexp; $regexp = quotemeta($regexp); See also perldoc -f quotemeta. -- Jeff Pang - [EMAIL PROTECTED] http://home.arcor.de/jeffpang/ -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: entering regular expressions from the keyboard
Christopher Spears wrote:
> Hi! I'm trying to get back into Perl again by working through Intermediate Perl. Unfortunately, the Perl part of my brain has atrophied! I'm working on the second exercise of the second chapter. I'm supposed to write a program that asks the user to type a regular expression. The program then uses the regular expression to try to find a match in the directory that I hard coded into the program. Here is what I have so far:
>
>     #!/usr/bin/perl -w
>     use strict;
>
>     print "Enter regular expression: ";
>     chomp(my $regexp = <STDIN>);
>     #print $regexp;

    # from here

>     opendir(CPPDIR,"/home/io/chris_cpp/") or die "Could not open directory: $!";
>     my @allfiles = readdir CPPDIR;
>     closedir CPPDIR;

    # try: my @allfiles = glob( '*' );

>     foreach $_(@allfiles){
>         if ($_ =~ \$regexp){

    # bad regular expression. try: if( /$regexp/ ){

>             print $_."\n";
>         }
>     }
>
> My problem lies with the matching part. I'm not sure how to use the string that I stored in the $regexp variable as a regular expression. Any hints?
>
> "I'm the last person to pretend that I'm a radio. I'd rather go out and be a color television set." -David Bowie
> "Who dares wins" -British military motto

"Of course we'll win; we're British" - another British military motto

> "I generally know what I'm doing." -Buster Keaton

-- 
Just my 0.0002 million dollars worth,
Shawn

"For the things we have to learn before we can do them, we learn by doing them." Aristotle
regular expressions issue
I created a file called data.txt which contains a bunch of junk, including some IPs. I want $line to be stored in $ip. It works, except for the regular expression, which should find only IPs. If I use the regular expression with the grep command in a terminal I get only the IPs. Here in Perl I don't get any output.

    #!/usr/bin/perl

    @input = `cat ~/ip.txt`;

    foreach $line (@input){
        if($line =~ /[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}/){
            $ip = $line;
            print $ip;
        }
    }

Any ideas? It's breaking my head.

Amichai
Re: regular expressions issue
On 6/27/07, Amichai Teumim [EMAIL PROTECTED] wrote: If I use the regular expression with the grep command in terminal I get only the IPs. Here in Perl I don't get any output. The grep command uses grep's regular expressions, but Perl uses Perl's regular expressions. Alas, everybody's regular expressions are different. Perl's are usually better, of course. But the syntax is always different. @input = `cat ~/ip.txt`; I hope that this is _supposed_ to be a quick-and-dirty program. This works, although it's slower than using a filehandle would be, and it probably uses more memory. Although if you're using the tilde to open a file in the user's home directory, well, that's maybe the best way to do it. /[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}/){ I think in Perl that pattern might be this: /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ But do you really want to match 999.999.999.999? You don't have to. Have you heard of Regexp::Common? Regexp::Common::net seems to have what you want. /^$RE{net}{IPv4}$/ http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common.pm http://search.cpan.org/dist/Regexp-Common/lib/Regexp/Common/net.pm Even if you don't want to install the module to get just one pattern, you could use the pattern that it supplies, which is sure to be at least as good as anything you would write on your own. Good luck with it! --Tom Phoenix Stonehenge Perl Training -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: regular expressions issue
Amichai Teumim wrote:
> I created a file called data.txt which contains a bunch of junk, including some IPs. I want $line to be stored in $ip. It works, except for the regular expression, which should find only IPs. If I use the regular expression with the grep command in a terminal I get only the IPs. Here in Perl I don't get any output.
>
>     #!/usr/bin/perl
>
>     @input = `cat ~/ip.txt`;
>
>     foreach $line (@input){
>         if($line =~ /[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}\.[[:digit:]]\{1,3\}/){
>             $ip = $line;
>             print $ip;
>         }
>     }
>
> Any ideas? It's breaking my head.

Perl doesn't require the braces to be escaped. As written, the regex is looking for literal braces, which don't exist in the string. Try this:

    if ($line =~ /[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}/) {
        :
    }

and, by the way, [0-9] is more concise than [[:digit:]].

HTH,

Rob
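The unescaped-brace pattern still accepts strings such as 999.999.999.999, as pointed out earlier in the thread. If installing Regexp::Common is not an option, a range check on each captured octet is one way to tighten it. The sketch below assumes a hypothetical helper name, extract_ipv4, and a made-up sample line. It is a rough filter and not a full validator.

```perl
use strict;
use warnings;

# Hypothetical helper: return every dotted quad in a line whose four
# octets are all in the range 0..255.
sub extract_ipv4 {
    my ($line) = @_;
    my @found;
    while ($line =~ /\b(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\b/g) {
        # Skip candidates such as 999.1.1.1.
        push @found, "$1.$2.$3.$4"
            unless grep { $_ > 255 } $1, $2, $3, $4;
    }
    return @found;
}

# Sample line only; in the original script the lines come from the file.
my @ips = extract_ipv4("junk 192.168.1.10 more 999.1.1.1 end 10.0.0.1");
print "$_\n" for @ips;
```

Returning just the address rather than the whole line also addresses the original goal of storing only the IP in $ip.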
Re: Using regular expressions with delimitaters
Hi,

The "8.1.8" =~ /[\d $versao \s]/ will always return true because the square brackets ([]) match the string against any one of the characters inside. In this case the \d (digit) matches because the string contains a digit.

In your code you wrote "8.1.8" =~ /$version/. This takes $version and treats it as a regular expression. I don't think that this is what you want. You actually want something like $version =~ /8\.1\.8/.

Yaron Kahanovitch

----- Original Message -----
From: Rodrigo Tavares [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, April 11, 2007 4:30:58 PM (GMT+0200) Auto-Detected
Subject: Using regular expressions with delimitaters

> Hello, I need to use the delimiter " " (one blank space). I read perldoc, and I tried to use this:
>
>     if ( "8.1.8" =~ /[\d $versao \s]/)
>
> But the expression is always true. Where is the error? My code:
>
>     #!/usr/bin/perl
>     $version=`/usr/local/pgsql/bin/pg_ctl --version`;
>     print $version;
>     if ( "8.1.8" =~ /$version/) {
>         print "$version\n";
>     } else {
>         print "Wrong version !\n";
>     }
>
> Output of the program:
>
>     pg_ctl (PostgreSQL) 8.1.8
>     Wrong version
>
> Best regards,
> Rodrigo Faria
Using regular expressions with delimitaters
Hello, I need to use the delimiter , (one blank space). I read perdoc, i try to use this : if ( 8.1.8 =~ /[\d $versao \s]/) But the expression is always true. Where is the error ? my code : #!/usr/bin/perl $version=`/usr/local/pgsql/bin/pg_ctl --version`; print $version; if ( 8.1.8 =~ /$version/) { print $version\n; } else { print Wrong version !\n; } Output, about program: pg_ctl (PostgreSQL) 8.1.8 Wrong version Best regards, Rodrigo Faria __ Fale com seus amigos de graça com o novo Yahoo! Messenger http://br.messenger.yahoo.com/ -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
RE: Using regular expressions with delimitaters
From: Rodrigo Tavares [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 11, 2007 9:31 AM To: beginners@perl.org Subject: Using regular expressions with delimitaters Hello, I need to use the delimiter , (one blank space). I read perdoc, i try to use this : if ( 8.1.8 =~ /[\d $versao \s]/) But the expression is always true. Where is the error ? my code : #!/usr/bin/perl $version=`/usr/local/pgsql/bin/pg_ctl --version`; print $version; if ( 8.1.8 =~ /$version/) { print $version\n; } else { print Wrong version !\n; } Output, about program: pg_ctl (PostgreSQL) 8.1.8 Wrong version Best regards, Rodrigo Faria [] Maybe you are making this too hard... perl -e '$date=`date`; print Is Apr\n if $date =~ /Apr/;' As an example... Hope this helps... [] jwm -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/
Re: Using regular expressions with delimitaters
On 4/11/07, Rodrigo Tavares [EMAIL PROTECTED] wrote:
snip
>     if ( "8.1.8" =~ /$version/)
snip

You are using the operators incorrectly. It should look like this:

    if ($version =~ /8\.1\.8/)

The form is: variable, binding operator, regex. Note that the periods need to be escaped; otherwise they will be interpreted as any-character by the regex.
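A slightly stricter variant of the same idea is to capture the version number from the command output and compare it as a string, so that 8.1.80 is not mistaken for 8.1.8. This is a sketch: the helper name version_matches is made up, and the output is a hard-coded sample string rather than the result of running pg_ctl.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the first dotted version number out of a
# command's output and compare it with the wanted version.
sub version_matches {
    my ($output, $wanted) = @_;
    my ($found) = $output =~ /(\d+(?:\.\d+)+)/;
    return defined $found && $found eq $wanted;
}

# Sample text; in the original script this comes from
# `/usr/local/pgsql/bin/pg_ctl --version`.
my $version = "pg_ctl (PostgreSQL) 8.1.8\n";

if (version_matches($version, '8.1.8')) {
    print "version ok\n";
} else {
    print "Wrong version !\n";
}
```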
grouppin in the regular expressions
Hi nice people, how to specify using regular expressions: match everything but string (xxx) i would do this : $line =~ /[^(xxx)]+/; but, as it was mentioned before () inside character class is not working. what is solution here? thank you! ~i
RE: grouppin in the regular expressions
Use !~ instead of =~; it means "does not match". So:

    if ( $line !~ /\(xxx\)/ ) {
        # does not contain (xxx)
    } else {
        # does contain (xxx)
    }

If you have any problems or questions, please let me know. Thanks.

Wags ;)
David R Wagner
Senior Programmer Analyst
FedEx Freight
1.408.323.4225x2224 TEL
1.408.323.4449 FAX
http://fedex.com/us

-----Original Message-----
From: I.B. [mailto:[EMAIL PROTECTED]
Sent: Friday, October 13, 2006 12:03
To: beginners@perl.org
Subject: grouppin in the regular expressions

> Hi nice people, how to specify using regular expressions: match everything but string (xxx)
>
> i would do this:
>
>     $line =~ /[^(xxx)]+/;
>
> but, as it was mentioned before, () inside a character class is not working. what is solution here? thank you! ~i
Re: grouppin in the regular expressions
I.B. wrote: Hi nice people, Hello, how to specify using regular expressions: match everything but string (xxx) i would do this : $line =~ /[^(xxx)]+/; but, as it was mentioned before () inside character class is not working. what is solution here? Perhaps you want: $line !~ /xxx/; John -- Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] http://learn.perl.org/ http://learn.perl.org/first-response
Re: grouppin in the regular expressions
Sorry, I didn't phrase my question correctly. Example:

    $line = "abcxabcxxabcxxxabc";

How to match everything before xxx but not xxx itself? The answer I got is to use lookaheads:

    my $line = "abcxxabcxxxabc";
    if ($line =~ m{(.*?(?:(?!xxx).))xxx}){
        print "matched: $1\n";
    } else {
        print "failed\n";
    }

Very cool, thanks everyone.

~i

On 10/13/06, John W. Krahn [EMAIL PROTECTED] wrote:
> I.B. wrote:
>> Hi nice people,
>
> Hello,
>
>> how to specify using regular expressions: match everything but string (xxx)
>> i would do this:
>>
>>     $line =~ /[^(xxx)]+/;
>>
>> but, as it was mentioned before, () inside a character class is not working. what is solution here?
>
> Perhaps you want:
>
>     $line !~ /xxx/;
>
> John
> --
> Perl isn't a toolbox, but a small machine shop where you can special-order certain sorts of tools at low cost and in short order. -- Larry Wall
Re: grouppin in the regular expressions
I.B. wrote:
> sorry, I didn't fraze my question correctly.
                   ^ phrase

> example:
>
>     $line = "abcxabcxxabcxxxabc";
>
> how to match everything before xxx but not xxx itself?
>
> the answer i got is to use lookaheads:
>
>     my $line = "abcxxabcxxxabc";
>     if ( $line =~ m{(.*?(?:(?!xxx).))xxx} ) {
>         print "matched: $1\n";
>     }
>     else {
>         print "failed\n";
>     }

Your expression is too complicated:

    if ( $line =~ /(.*?)xxx/ ) {

would accomplish the same thing.

    $ perl -le'$_ = "abcxabcxxabcxxxabc"; print $1 if /(.*?(?:(?!xxx).))xxx/'
    abcxabcxxabc
    $ perl -le'$_ = "abcxabcxxabcxxxabc"; print $1 if /(.*?)xxx/'
    abcxabcxxabc

John
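To compare the two patterns side by side, here is a small sketch. The helper capture_before and the second input 'xxxabc' are assumptions added for illustration; only the first string comes from the thread. On the thread's sample both patterns capture the same text, but they differ when xxx is at the very start, because the lookahead version requires at least one character before xxx.

```perl
use strict;
use warnings;

# Hypothetical helper: return $1 for a pattern, or undef when it does not match.
sub capture_before {
    my ( $pattern, $string ) = @_;
    return $string =~ $pattern ? $1 : undef;
}

my $lookahead = qr/(.*?(?:(?!xxx).))xxx/;    # pattern from the question
my $simple    = qr/(.*?)xxx/;                # simpler pattern from the reply

for my $string ( 'abcxabcxxabcxxxabc', 'xxxabc' ) {
    for my $pattern ( $lookahead, $simple ) {
        my $got = capture_before( $pattern, $string );
        printf "%-30s on %-20s => %s\n", $pattern, $string,
            defined $got ? "'$got'" : 'no match';
    }
}
```

For 'xxxabc' the simple pattern matches with an empty capture, while the lookahead pattern does not match at all.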
Regular expressions
Hi,

Can somebody help me? If I give ([a-z]+)(.*)([a-z]+) as input, the output I get is

    $1 is 'silly'
    $2 is 'silly'
    $3 is 'silly'

This is wrong according to the book I am referring to. Could someone please clarify? The code I used is below.

    use strict;
    use warnings;

    $_ = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';
    print "Enter a regular expression:";
    my $pattern = <STDIN>;
    chomp($pattern);
    if (/$pattern/) {
        print "The text matches the pattern '$pattern'.\n";
        print "\$1 is '$1'\n" if defined $1;
        print "\$2 is '$1'\n" if defined $2;
        print "\$3 is '$1'\n" if defined $3;
        print "\$4 is '$1'\n" if defined $4;
        print "\$5 is '$1'\n" if defined $5;
    }
    else {
        print "'$pattern' was not found.\n";
    }
Re: Regular expressions
> Can somebody help me? If I give ([a-z]+)(.*)([a-z]+) as input, the output I get is
>
>     $1 is 'silly'
>     $2 is 'silly'
>     $3 is 'silly'
>
> This is wrong according to the book I am referring to. Could someone
> please clarify? The code I used is below.

This is correct. The first word that matches ([a-z]+) is 'silly'.

>     print "\$1 is '$1'\n" if defined $1;
>     print "\$2 is '$1'\n" if defined $2;
>     print "\$3 is '$1'\n" if defined $3;
>     print "\$4 is '$1'\n" if defined $4;
>     print "\$5 is '$1'\n" if defined $5;

Maybe you meant something like this, with each line printing its own capture variable:

    print "\$1 is '$1'\n" if defined $1;
    print "\$2 is '$2'\n" if defined $2;
    print "\$3 is '$3'\n" if defined $3;
    print "\$4 is '$4'\n" if defined $4;
    print "\$5 is '$5'\n" if defined $5;
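One way to avoid copy-and-paste slips like the repeated $1 is to collect the captures in a list and loop over them. This is only a sketch of that idea, using the text and pattern from the question; the variable names $text, $pattern and @groups are made up here.

```perl
use strict;
use warnings;

my $text    = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';
my $pattern = '([a-z]+)(.*)([a-z]+)';

# In list context a successful match returns the captured groups in order.
my @groups = $text =~ /$pattern/;

# Print each group next to the capture variable it corresponds to.
for my $i ( 0 .. $#groups ) {
    printf "\$%d is '%s'\n", $i + 1, $groups[$i] if defined $groups[$i];
}
```

With this input, the first group is 'silly', the greedy (.*) takes everything up to the last lowercase letter, and the third group is the single letter 'l' from "useful".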
regular expressions
In perldoc under this topic, s is listed as "Treat string as a single line" and m as "Treat string as multiple lines".

If I have text that has varying spaces at the beginning of each line, and I use

    $string =~ s/^\s+//;

it will remove the spaces from in front of the first line but not any other lines. That is clear to me.

However, it does not clear all of the leading spaces from all of the lines if I use

    $string =~ m/^\s+//;

In fact I'm getting a compile error. What am I missing here?

Thanks,
Bruce Bowen
Re: regular expressions
On Apr 21, 2006, at 16:10, Bowen, Bruce wrote:

> In perldoc under this topic, s is listed as "Treat string as a single
> line" and m as "Treat string as multiple lines".
>
> If I have text that has varying spaces at the beginning of each line,
> and I use
>
>     $string =~ s/^\s+//;
>
> it will remove the spaces from in front of the first line but not any
> other lines. That is clear to me.
>
> However, it does not clear all of the leading spaces from all of the
> lines if I use
>
>     $string =~ m/^\s+//;

Modifiers go to the end:

    $string =~ s/^\s+//m;

-- fxn
Re: regular expressions
Bowen, Bruce wrote:
> In perldoc under this topic, s is listed as "Treat string as a single
> line" and m as "Treat string as multiple lines".
>
> If I have text that has varying spaces at the beginning of each line,
> and I use
>
>     $string =~ s/^\s+//;
>
> it will remove the spaces from in front of the first line but not any
> other lines. That is clear to me.
>
> However, it does not clear all of the leading spaces from all of the
> lines if I use
>
>     $string =~ m/^\s+//;
>
> In fact I'm getting a compile error. What am I missing here?

perldoc perlop
[snip]
    m/PATTERN/cgimosx
                 ^^ ^
[snip]
    s/PATTERN/REPLACEMENT/egimosx
                            ^^ ^

The /s option affects the behaviour of the . meta-character.
The /m option affects the behaviour of the ^ and $ meta-characters.

Assuming you have the string:

    my $string = "one\n two\n three\nfour\n five\n";

    $string =~ s/.+//;

Will produce the string:

    "\n two\n three\nfour\n five\n"

And:

    $string =~ s/.+//g;

Will produce the string:

    "\n\n\n\n\n"

While:

    $string =~ s/.+//s;

Will produce the string:

    ""

And:

    $string =~ s/^\s+//;

Will produce the string:

    "one\n two\n three\nfour\n five\n"

(It isn't modified.)

While:

    $string =~ s/^\s+//m;

Will produce the string:

    "one\ntwo\n three\nfour\n five\n"

(Only the first match is changed.)

And:

    $string =~ s/^\s+//mg;

Will produce the string:

    "one\ntwo\nthree\nfour\nfive\n"

John
-- 
use Perl;
program
fulfillment
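The cases above can be checked in one run with a short script along these lines. It is a sketch, not part of the original reply: it assumes a single leading space on the indented lines (the archive may have collapsed the original whitespace), and it uses the /r modifier, which needs perl 5.14 or later, so that each substitution works on a fresh copy of the string.

```perl
use strict;
use warnings;

# Sample string; one leading space per indented line is an assumption.
my $original = "one\n two\n three\nfour\n five\n";

# /r returns the modified copy and leaves $original untouched (perl 5.14+).
my @cases = (
    [ 's/.+//',     $original =~ s/.+//r ],
    [ 's/.+//g',    $original =~ s/.+//gr ],
    [ 's/.+//s',    $original =~ s/.+//sr ],
    [ 's/^\s+//',   $original =~ s/^\s+//r ],
    [ 's/^\s+//m',  $original =~ s/^\s+//mr ],
    [ 's/^\s+//mg', $original =~ s/^\s+//mgr ],
);

for my $case (@cases) {
    my ( $label, $got ) = @$case;
    ( my $shown = $got ) =~ s/\n/\\n/g;    # make newlines visible
    printf "%-12s => \"%s\"\n", $label, $shown;
}
```

The last two lines of output show the point of the thread: /m lets ^ match at the start of every line, but /g is still needed to strip the leading whitespace from all of them.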