Re: regex matching statements
Hi,

Yes, they are the same. I like to use $_ only when the data comes in $_ naturally, like in a for loop:

    for (qw< abc >) {
        if ( !/\w+\d+/ ) {
            print "not matched";
        }
    }

Otherwise, if I have to write $_ explicitly, I prefer to give the variable a descriptive name instead. That makes the code more readable and maintainable down the road.

-L

On Wednesday, June 19th, 2024 at 3:55 AM, Jeff Peng via beginners wrote:
> Hello list,
>
> are these statements the same in perl?
>
> $ perl -le '$_="abc";if (!/\w+\d+/){print "not matched"}'
> not matched
>
> $ perl -le '$_="abc";if ($_ !~ /\w+\d+/){print "not matched"}'
> not matched
>
> or which is the better one?
>
> Thanks.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
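A small sketch of the two forms from Jeff's question side by side, using only core Perl. The sub names `implicit_form` and `explicit_form` are hypothetical; they exist only to show that `!/.../` (which reads `$_`) and `$var !~ /.../` give the same answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: each returns true when the string does NOT match /\w+\d+/.
sub implicit_form {
    local $_ = shift;          # put the data in $_ ...
    return !/\w+\d+/;          # ... and match against it implicitly
}

sub explicit_form {
    my ($str) = @_;            # a descriptive name reads better
    return $str !~ /\w+\d+/;   # !~ is the negated binding operator
}

for my $word (qw< abc abc1 >) {
    printf "%-5s implicit=%s explicit=%s\n", $word,
        implicit_form($word) ? 'not matched' : 'matched',
        explicit_form($word) ? 'not matched' : 'matched';
}
```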
Re: regex
Mike: > I stand properly scolded. I didn't mean to scold anyone; it seems I expressed myself badly. Sorry for that. Regards, /Karl Hammar
Re: regex
I stand properly scolded. Mike On 1/23/24 07:01, k...@aspodata.se wrote: Please stop using my mail address when replying, I'm on the list and don't want two copies of the same mail (it's not about you Mike).
Re: regex
Please stop using my mail address when replying; I'm on the list and don't want two copies of the same mail (it's not about you, Mike).

Mike
> Why is my Perl not working on that command?
>
> $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> Unescaped left brace in regex is illegal here in regex; marked by <--
> HERE in m/a{ <-- HERE ,2}/ at -e line 1.
> $
>
> But this works:
> $ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'
> $
>
> $ echo $?
> 10
> $

On an old Debian woody box I get:

$ perl -v | grep v5
This is perl, v5.6.1 built for i386-linux
$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'; echo $?
0
$ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'; echo $?
10
$ man perlre
...
       The following standard quantifiers are recognized:

           *      Match 0 or more times
           +      Match 1 or more times
           ?      Match 1 or 0 times
           {n}    Match exactly n times
           {n,}   Match at least n times
           {n,m}  Match at least n but not more than m times
...

So, old perl versions don't have the {,m} quantifier; check your documentation for that. The easy way out is to always use {0,m} instead of {,m}, which means the same thing in modern perl. Actually, there is never any need to use the {,m} quantifier.

I don't know why I don't get a perl error message above, maybe a bug.

///

On a more up-to-date system I get:

$ perl -v | grep v5
This is perl 5, version 34, subversion 1 (v5.34.1) built for x86_64-linux-thread-multi
$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'; echo $?
10
$ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'; echo $?
10

///

If you are interested in the syntax rules, check under "Simple statements" in:

(perl 5.6.1)
$ man perlsyn
       Any simple statement may optionally be followed by a SINGLE
       modifier, just before the terminating semicolon (or block
       ending). The possible modifiers are:

           if EXPR
           unless EXPR
           while EXPR
           until EXPR
           foreach EXPR
...

(perl 5.34.1)
$ man perlsyn
...
   Statement Modifiers
       Any simple statement may optionally be followed by a SINGLE
       modifier, just before the terminating semicolon (or block
       ending). The possible modifiers are:

           if EXPR
           unless EXPR
           while EXPR
           until EXPR
           for LIST
           foreach LIST
           when EXPR
...

So, modern perl also has "for" and "when".

///

Also note that in a compound statement you have to put parentheses around the EXPR, as in

    if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK

whereas for the modifier you don't need to:

    STATEMENT if EXPR;

I prefer to always use () around the expression, since it makes it easier to convert between the two forms.

Regards,
/Karl Hammar
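A short sketch of the portable spelling. As far as I know, an empty lower bound (`{,n}` meaning `{0,n}`) was only accepted starting with Perl 5.34; older perls apparently treated `a{,2}` as literal text (which would explain the exit status 0 from 5.6.1 above), and 5.30 rejects it outright. The sub `matches_up_to_two` is a made-up name; the sketch also shows the statement-modifier and compound `if` forms side by side.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# {0,2} is understood by every perl; {,2} only by 5.34 and later.
sub matches_up_to_two {
    my ($str) = @_;
    return $str =~ /a{0,2}/ ? 1 : 0;
}

# Statement modifier: no parentheses needed around the condition ...
print "modifier: matched\n" if matches_up_to_two("aaa");

# ... but the compound form requires them (and a block).
if (matches_up_to_two("aaa")) {
    print "compound: matched\n";
}

# Zero repetitions are allowed, so even a string without any 'a' matches.
print "empty string: ", matches_up_to_two(""), "\n";
```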
Re: regex
Why is my Perl not working on that command?

$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/a{ <-- HERE ,2}/ at -e line 1.
$

But this works:

$ perl -e 'exit(10) if "aaa"=~/a{0,2}/;'
$
$ echo $?
10
$

It sure surprised me that the first one did not work for me. Do I need to upgrade my Perl?

$ perl -v
This is perl 5, version 30, subversion 0 (v5.30.0) built for x86_64-linux
(with 1 registered patch, see perl -V for more detail)
[snip]
$

I just went through my Perl documentation and none of it allows {,2}. Learning Perl, Second Edition (July 1997) says:

"If you leave off the second number, as in /x{5,}/, it means "that many or more" (five or more in this case), and if you leave off the comma, as in /x{5}/, it means "exactly this many" (five x's). To get five or less x's, you must put the zero in, as in /x{0,5}/."

Mike

On 1/22/24 06:23, Jorge Almeida wrote:
Please help me to understand this:
$ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
$ echo $?
$ 10
Thanks
Jorge Almeida
Re: regex
Hi, Sometimes the long path is the shortest one. Go through the Perl tutorial on regular expressions and you will solve your questions, and you will learn a lot. There are two points of view about regular expressions. The first one says that you must learn and use them. The other one is: if you have a problem and you say "I will solve it with regular expressions", then you have two problems. Keep it up! Regards From: Claude Brown via beginners Sent: Monday, January 22, 2024 10:49:50 PM To: k...@aspodata.se ; beginners@perl.org Subject: RE: regex Jorge, Expanding on Karl's answer (and somewhat labouring his point) consider these examples: $a =~ /Jorge/ $a =~ /^Jorge/ $a =~ /Jorge$/ $a =~ /^Jorge$/ This shows that regex providing four different capabilities: - detect "Jorge" anywhere in the string - detect "Jorge" at the start of a string (by adding ^) - detect "Jorge" at the end of a string (by adding $) - detect that the string is exactly "Jorge" (both ^ and $) Replace "Jorge" with your pattern, and the result is the same. Cheers, Claude.
RE: regex
Jorge,

Expanding on Karl's answer (and somewhat labouring his point), consider these examples:

    $a =~ /Jorge/
    $a =~ /^Jorge/
    $a =~ /Jorge$/
    $a =~ /^Jorge$/

This shows that a regex can provide four different capabilities:

- detect "Jorge" anywhere in the string
- detect "Jorge" at the start of a string (by adding ^)
- detect "Jorge" at the end of a string (by adding $)
- detect that the string is exactly "Jorge" (both ^ and $)

Replace "Jorge" with your pattern, and the result is the same.

Cheers,
Claude.
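Claude's four cases as a runnable sketch. The test strings are made up. One detail worth knowing: `$` also matches just before a trailing newline, so `"Jorge\n"` still satisfies `/^Jorge$/`; use `\z` if that must be rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The four variants from the message above, compiled once with qr//.
my %patterns = (
    anywhere => qr/Jorge/,
    start    => qr/^Jorge/,
    end      => qr/Jorge$/,     # note: also matches before a final "\n"
    exact    => qr/^Jorge$/,
);

# Hypothetical test strings, chosen so each pattern behaves differently.
my @strings = ('Jorge', 'Jorge Almeida', 'Hi Jorge', 'Hi Jorge!');

for my $s (@strings) {
    my @hits = grep { $s =~ $patterns{$_} } sort keys %patterns;
    printf "%-15s => %s\n", "'$s'", join(', ', @hits) || 'none';
}
```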
Re: regex
I agree that this is confusing, and I think the many resources that describe regexes in unhelpful ways are partly to blame: descriptions like "a pattern that matches against a string" and similar. This implies that a regex has to match the whole string, but this is not the case. A regex does not have to match the string; instead, the string has to satisfy the regex. "aaa" satisfies /a{,2}/ because it contains everything the regex requires. Thinking of regexes in this way has been a help to me, at least 😊 -L Original Message On 22. jan. 2024, 13:23, Jorge Almeida wrote: > Please help me to understand this: > $ perl -e 'exit(10) if "aaa"=~/a{,2}/;' > $ echo $? > $ 10 > > Thanks > > Jorge Almeida
Re: regex
Yes, the {} RE quantifier has the canonical form {a,b}, where a and b are numbers; it makes the character before it match from a to b times, e.g. A{1,3} matches one, two or three As. If you leave out the first number, zero is presumed. Hmm, perl 5.30:

% perl -E 'say(10) if "aaa"=~/a{,2}/;'
Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/a{ <-- HERE ,2}/ at -e line 1.

and

% perldoc perlre

says

   Quantifiers
       Quantifiers are used when a particular portion of a pattern needs
       to match a certain number (or numbers) of times. If there isn't a
       quantifier the number of times to match is exactly one. The
       following standard quantifiers are recognized:

           *      Match 0 or more times
           +      Match 1 or more times
           ?      Match 1 or 0 times
           {n}    Match exactly n times
           {n,}   Match at least n times
           {n,m}  Match at least n but not more than m times

       (If a non-escaped curly bracket occurs in a context other than one
       of the quantifiers listed above, where it does not form part of a
       backslashed sequence like "\x{...}", it is either a fatal syntax
       error, or treated as a regular character, generally with a
       deprecation warning raised. To escape it, you can precede it with a
       backslash ("\{") or enclose it within square brackets ("[{]"). This
       change will allow for future syntax extensions (like making the
       lower bound of a quantifier optional), and better error checking of
       quantifiers).

On Mon, Jan 22, 2024 at 6:59 AM wrote:
> Jorge Almeida:
> > Please help me to understand this:
> > $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> > $ echo $?
> > $ 10
>
> In man perlre, under "Regular Expressions" it says:
>
>     {,n}   Match at most n times
>
> So /a{,2}/ matches "", "a", and "aa" and is ignorant about what
> comes before and after (basically). That "aa" is followed by a
> "a" isn't something the expression prohibits. If you want that
> try /^a{,2}$/ instead.
>
> Regards,
> /Karl Hammar

-- 
a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
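Following the perlre text quoted above, a literal `{` can be matched either by escaping it (`\{`) or by putting it in a bracketed character class (`[{]`). A minimal sketch; the variable names are only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = 'a{,2}';

# Both patterns match the five literal characters  a { , 2 }
my $escaped   = $text =~ /a\{,2\}/     ? 1 : 0;
my $bracketed = $text =~ /a[{],2[}]/   ? 1 : 0;

# Written as a real quantifier, {0,2} means "zero to two a's" instead,
# and anchoring makes the whole of 'aaa' fail.
my $quantifier = 'aaa' =~ /^a{0,2}$/ ? 1 : 0;

print "escaped=$escaped bracketed=$bracketed quantifier=$quantifier\n";
```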
Re: regex
Jorge Almeida:
> On Mon, 22 Jan 2024 at 13:00, wrote:
> > Jorge Almeida:
> > > $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
...
> > {,n}   Match at most n times
...
> Yes, I read it (several times). I still don't understand it (I understand
> what you're saying, and I trust you're right, I just don't understand how
> this behaviour matches the description above--- "at most", really?)

Just think of it like this: on the table there are three diamonds; can you find zero, one, or preferably two diamonds there?

...
> Now, in
> perl -e 'print $1,"\n" if "aaa"=~/(a{,2})/;'
> $ aa
> this is understandable. More or less. Maybe the semantics of /a{,2}/ should
> be described as "match any number of consecutive 'a' whatsoever and capture
> at most 2 'a' characters...

No, it just looks at the first two a's and finds a match. There is still one "a" left, but who cares; you have already got your match.

Regards,
/Karl Hammar
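To see Karl's point in action, a sketch that reports where the match starts (`$-[0]`, from the `@-` array documented in perlvar) and what was captured. The helper `describe_match` is hypothetical. Note the `'baa'` case: the engine accepts the first position where the pattern can match, and zero a's at offset 0 already satisfies `a{0,2}`, so it never looks at the later "aa".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report what /(a{0,2})/ matched and where.
sub describe_match {
    my ($str) = @_;
    return unless $str =~ /(a{0,2})/;
    # $-[0] is the offset where the whole match starts;
    # $1 is what the greedy quantifier consumed at that position.
    return ($-[0], $1);
}

for my $s ('aaa', 'baa', '') {
    my ($start, $got) = describe_match($s);
    print "'$s': match at offset $start, captured '$got'\n";
}
```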
Re: regex
Jorge Almeida:
> Please help me to understand this:
> $ perl -e 'exit(10) if "aaa"=~/a{,2}/;'
> $ echo $?
> $ 10

In man perlre, under "Regular Expressions" it says:

    {,n}   Match at most n times

So /a{,2}/ matches "", "a", and "aa" and is ignorant about what comes before and after (basically). That "aa" is followed by an "a" isn't something the expression prohibits. If you want to prohibit that, try /^a{,2}$/ instead.

Regards,
/Karl Hammar
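A sketch of the anchored version, using `{0,2}` so it also runs on perls older than 5.34, and `\A`/`\z` instead of `^`/`$` so that a trailing newline is not accepted either. The sub name `at_most_two_as` is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# True only if the ENTIRE string is zero, one, or two 'a' characters.
sub at_most_two_as {
    my ($str) = @_;
    return $str =~ /\A a{0,2} \z/x ? 1 : 0;   # /x lets us space it out
}

for my $s ('', 'a', 'aa', 'aaa', "aa\n") {
    (my $shown = $s) =~ s/\n/\\n/;            # make the newline visible
    print "'$shown' => ", at_most_two_as($s), "\n";
}
```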
Re: Regex to detect natural language fragment
Thanks very much. @Chankey Pathak, which of those libraries do you recommend for this task? Best regards, Julius On Tue, Sep 14, 2021 at 2:33 AM Ken Peng wrote: > Or use GPT-3 who has a free online API. > https://openai.com/blog/openai-api/ > > regards > > On Mon, Sep 13, 2021 at 11:42 PM Chankey Pathak > wrote: > >> You can look into NLP https://metacpan.org/search?q=nlp >> >> On Mon, 13 Sept 2021 at 21:04, Julius Hamilton < >> juliushamilton...@gmail.com> wrote: >> >>> Hey, >>> >>> I'm not sure if this is possible, and if it's not, I'll explore a better >>> way to do this. >>> >>> I would like to write a script which analyzes if a line of text is >>> (likely) a broken natural language sentence, i.e., it is probably part of a >>> sentence, even if the start or end is not present, rather than it being a >>> fully "complete" linguistic entity, for example, a header of a section, >>> which does not have a period at the end and is not really a sentence, yet >>> is in a complete and unbroken form. >>> >>> I'm pretty sure in principle this will require some kind of syntax >>> parsing. I think I read somewhere regular expressions for some mathematical >>> reason cannot parse tree / nested structures, for example HTML. >>> >>> Does anyone know what some next most ubiquitous, standard tool is for >>> analyzing nested linguistic structures? Is that an XML parser? >>> >>> Thanks very much, >>> Julius >>> >>
Re: Regex to detect natural language fragment
Or use GPT-3, which has a free online API. https://openai.com/blog/openai-api/ regards On Mon, Sep 13, 2021 at 11:42 PM Chankey Pathak wrote: > You can look into NLP https://metacpan.org/search?q=nlp > > On Mon, 13 Sept 2021 at 21:04, Julius Hamilton < > juliushamilton...@gmail.com> wrote: > >> Hey, >> >> I'm not sure if this is possible, and if it's not, I'll explore a better >> way to do this. >> >> I would like to write a script which analyzes if a line of text is >> (likely) a broken natural language sentence, i.e., it is probably part of a >> sentence, even if the start or end is not present, rather than it being a >> fully "complete" linguistic entity, for example, a header of a section, >> which does not have a period at the end and is not really a sentence, yet >> is in a complete and unbroken form. >> >> I'm pretty sure in principle this will require some kind of syntax >> parsing. I think I read somewhere regular expressions for some mathematical >> reason cannot parse tree / nested structures, for example HTML. >> >> Does anyone know what some next most ubiquitous, standard tool is for >> analyzing nested linguistic structures? Is that an XML parser? >> >> Thanks very much, >> Julius >> >
Re: Regex to detect natural language fragment
You can look into NLP https://metacpan.org/search?q=nlp On Mon, 13 Sept 2021 at 21:04, Julius Hamilton wrote: > Hey, > > I'm not sure if this is possible, and if it's not, I'll explore a better > way to do this. > > I would like to write a script which analyzes if a line of text is > (likely) a broken natural language sentence, i.e., it is probably part of a > sentence, even if the start or end is not present, rather than it being a > fully "complete" linguistic entity, for example, a header of a section, > which does not have a period at the end and is not really a sentence, yet > is in a complete and unbroken form. > > I'm pretty sure in principle this will require some kind of syntax > parsing. I think I read somewhere regular expressions for some mathematical > reason cannot parse tree / nested structures, for example HTML. > > Does anyone know what some next most ubiquitous, standard tool is for > analyzing nested linguistic structures? Is that an XML parser? > > Thanks very much, > Julius >
Re: regex help - only one value returned
In your original example:

    print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
    print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);

the interior parentheses in example one terminate the alternation, so the last alternative is 'sir'. In example two, the alternation is not terminated until the first ')', so the last alternative is 'sir .{5,}?'. The group is then followed in the regular expression by the "\n" character. Since 'miss' in $T is not followed by a "\n", the match fails.

Vlado has explained how to group and terminate the alternation without capturing the match result.

> On Dec 2, 2020, at 6:08 AM, Gary Stainburn wrote:
>
> On 02/12/2020 13:56, Vlado Keselj wrote:
>> Well, it seems that the first one is what you want, but you just need to
>> use $1 and ignore $2.
>>
>> You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not
>> want for them to be captured in $2, you can use:
>> '(?:mr|mrs|miss|dr|prof|sir)'. For example:
>>
>> print "match3='$1' '$2'\n" if
>> ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
>>
>> would give output:
>>
>> match3='Miss Jayne Doe' ''
> Perfect, thank you.
>
> I can't ignore $2 as it's in a loop with other regex that genuinely returns
> multiple matches. The amendment to the REGEX worked perfectly.

It is always best to save the results of a capturing match in other variables. The capture variables $1, $2, etc. are not reassigned if a match fails, so if you use them after a failed match, they will hold the values left over from a previous match. So do this (in example one, $1 is the whole name and $2 the title):

    my $name = $1;
    my $salutation = $2;

If you don't want a possible undefined value, do this instead:

    my $salutation = $2 || '';

Jim Gibson
j...@gibson.org
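Jim's warning can be reproduced in a few lines. perlre describes capture variables as staying available until the end of the enclosing block or the next successful match, so a failed match leaves `$1` as it was. A sketch using lines from Gary's sample data; the variable names are mine:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $first  = 'Miss Jayne Doe';
my $second = '19 Their Street';

# Risky: relying on $1 without checking whether the match succeeded.
$first  =~ /^(mr|mrs|miss|dr|prof|sir)\b/i;   # succeeds, $1 is 'Miss'
$second =~ /^(mr|mrs|miss|dr|prof|sir)\b/i;   # fails, $1 is left alone
my $stale = $1;                                # still 'Miss'

# Safer: in list context a match returns its captures, or an empty
# list on failure, so the variable simply stays undef.
my ($title) = $second =~ /^(mr|mrs|miss|dr|prof|sir)\b/i;
$title //= '';

print "stale='$stale' title='$title'\n";
```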
Re: regex help - only one value returned
On 02/12/2020 13:56, Vlado Keselj wrote:
> Well, it seems that the first one is what you want, but you just need to use $1 and ignore $2.
>
> You do need parentheses in '(mr|mrs|miss|dr|prof|sir)' but if you do not want for them to be captured in $2, you can use: '(?:mr|mrs|miss|dr|prof|sir)'. For example:
>
>     print "match3='$1' '$2'\n" if ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
>
> would give output:
>
>     match3='Miss Jayne Doe' ''

Perfect, thank you.

I can't ignore $2 as it's in a loop with other regexes that genuinely return multiple matches. The amendment to the regex worked perfectly.

Gary
Re: regex help - only one value returned
Well, it seems that the first one is what you want, but you just need to use $1 and ignore $2.

You do need parentheses in '(mr|mrs|miss|dr|prof|sir)', but if you do not want them to be captured in $2, you can use '(?:mr|mrs|miss|dr|prof|sir)'. For example:

    print "match3='$1' '$2'\n" if
        ($T=~/^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);

would give output:

    match3='Miss Jayne Doe' ''

On Wed, 2 Dec 2020, Gary Stainburn wrote:
> I have an array of regex expressions that I apply to text returned from
> tesseract.
>
> Each match that I get then gets stored for future processing. However, I'm
> struggling with one regex.
>
> The problem is that:
>
> 1) with brackets round the titles it returns two matches.
> 2) without brackets, it returns nothing.
>
> Can anyone point me at the correct syntax please.
>
> Gary
>
> [root@dev dev]# ./t
> match1='Miss Jayne Doe' 'Miss'
> [root@dev dev]# cat t
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> my $T=<<EOF;
> Customer name and address
> Miss Jayne Doe
> 19 Their Street
> Somewehere
> In Yorkshire
> IN1 3YY
> EOF
>
> print "match1='$1' '$2'\n" if ($T=~/^((mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi);
> print "match2='$1' '$2'\n" if ($T=~/^(mr|mrs|miss|dr|prof|sir .{5,}?)\n/smi);
> [root@dev dev]#
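Vlado's fix dropped into a complete script, as a sketch. It assumes Gary's sample address is read from a here-doc, and the sub name `customer_name` is hypothetical. A list-context match with the same pattern returns exactly one element, confirming that only the outer group captures.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $T = <<'EOF';
Customer name and address
Miss Jayne Doe
19 Their Street
Somewehere
In Yorkshire
IN1 3YY
EOF

# (?:...) groups the alternation without creating $2;
# the outer (...) is the only capture, so it lands in $1.
sub customer_name {
    my ($text) = @_;
    my ($name) = $text =~ /^((?:mr|mrs|miss|dr|prof|sir) .{5,}?)\n/smi;
    return $name;
}

my $name = customer_name($T);
print defined $name ? "name='$name'\n" : "no match\n";
```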
Re: Regex for date
"Asad" wrote in message news:cag3lskh4dphjg18c-jxmo8bcqfd+vix5tep1ytsp4_6pd6z...@mail.gmail.com... Hi All , I need a regex to match the date : Sat Aug 25 08:41:03 2018 and covert into a format :'%m/%d/%Y %H:%M:%S' Thanks, -- Asad Hasan +91 9582111698 Hello Asad, You could use Time::Piece to do this. (Although given a choice, I would use ‘%Y/%m/%d %H:%M:%S’ which sorts naturally in a sorting situation) #!/usr/bin/perl use strict; use warnings; use Time::Piece; my $d = 'Sat Aug 25 08:41:03 2018'; my $dt = Time::Piece->strptime($d, '%a %b %d %H:%M:%S %Y'); say $dt->strftime('%m/%d/%Y %H:%M:%S');
Re: Regex for date
Many Perl modules have been written to parse and manipulate dates and times. Some come with Perl; others are available at www.cpan.org. Check out the Date::Manip, Date::Parse, or DateTime modules. > On Aug 25, 2018, at 4:06 AM, Asad wrote: > > Hi All , > > I need a regex to match the date : Sat Aug 25 08:41:03 2018 and > covert into a format : '%m/%d/%Y %H:%M:%S' > > Thanks, > > -- > Asad Hasan > +91 9582111698 Jim Gibson
Re: Regex for date
Thanks, I'll check them out. On Sat, Aug 25, 2018 at 4:53 PM Home Linux Info wrote: > > Hello, > > Maybe not the most beautiful regex out there, hey I'm a noob, but it does > the job right: > ([A-Z][a-z]{2}\s)|([0-9]{2}\s[0-2][0-9](:[0-5][0-9]){2}\s[0-9]{4}) > You can start from here and find a nicer form of this regex. > > On 8/25/18 2:06 PM, Asad wrote: > > Hi All , > > I need a regex to match the date : Sat Aug 25 08:41:03 2018 and > covert into a format : '%m/%d/%Y %H:%M:%S' > > Thanks, > > -- > Asad Hasan > +91 9582111698 > > > -- Asad Hasan +91 9582111698
Re: Regex for date
Hello, Maybe not the most beautiful regex out there, hey I'm a noob, but it does the job right: ([A-Z][a-z]{2}\s)|([0-9]{2}\s[0-2][0-9](:[0-5][0-9]){2}\s[0-9]{4}) You can start from here and find a nicer form of this regex. On 8/25/18 2:06 PM, Asad wrote: Hi All , I need a regex to match the date : Sat Aug 25 08:41:03 2018 and covert into a format : '%m/%d/%Y %H:%M:%S' Thanks, -- Asad Hasan +91 9582111698
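Since Asad asked for a regex specifically, here is a sketch that captures each field of the ctime-style date and reorders them with `sprintf`. Note that the pattern above has a top-level `|`, so a single match only ever returns one of the two branches (on the example date, the first match is just `Sat `). The month table and the `to_mdy` helper are my own assumptions; for anything beyond this one format, the Time::Piece and DateTime suggestions in this thread are the safer route.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %month_num = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: 'Sat Aug 25 08:41:03 2018' -> '08/25/2018 08:41:03'
sub to_mdy {
    my ($date) = @_;
    my ($mon, $day, $time, $year) = $date =~ m{
        ^ [A-Z][a-z]{2} \s+       # weekday name, not needed in the output
        ([A-Z][a-z]{2}) \s+       # month name
        (\d{1,2})       \s+       # day of month
        (\d{2}:\d{2}:\d{2}) \s+   # time
        (\d{4}) $                 # year
    }x or return;
    my $m = $month_num{$mon} or return;   # unknown month name
    return sprintf '%02d/%02d/%04d %s', $m, $day, $year, $time;
}

print to_mdy('Sat Aug 25 08:41:03 2018'), "\n";
```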
Re: Regex for date
Really, no attempt to do it yourself? Mike On 8/25/2018 6:06 AM, beginners-digest-h...@perl.org wrote: Hi All , I need a regex to match the date : Sat Aug 25 08:41:03 2018 and covert into a format : '%m/%d/%Y %H:%M:%S' Thanks, -- Asad Hasan
Re: regex to get the rpm name version
You can put your separators in there as literals to keep them out of the captures:

$ cat /tmp/ver.pl
#!perl
while (<DATA>) {
    chomp;
    if ( /([\w+-]{3,})-([.\d-]+)\./ ) {
        print "$1 - $2\n";
    }
    print "$_\n";
}
__END__
binutils-2.23.52.0.1-12.el7.x86_64
compat-libcap1-1.10-3.el7.x86_64
compat-libstdc++-33-3.2.3-71.el7.i686

$ perl /tmp/ver.pl
binutils - 2.23.52.0.1-12
binutils-2.23.52.0.1-12.el7.x86_64
compat-libcap1 - 1.10-3
compat-libcap1-1.10-3.el7.x86_64
compat-libstdc++-33 - 3.2.3-71
compat-libstdc++-33-3.2.3-71.el7.i686

But you may want to look at the options for rpm listing. There are many, and they can specifically list the package version; you can create your own format for the listings.

man rpm
    --qf|--queryformat QUERYFMT
        option, followed by the QUERYFMT format string. Query formats are
        modified versions of the standard printf(3) formatting. The format
        is made up of static strings (which may include standard C
        character escapes for newlines, tabs, and other special
        characters) and printf(3) type formatters. As rpm already knows
        the type to print, the type specifier must be omitted however,
        and replaced by the name of the header tag to be printed,
        enclosed by {} characters. Tag names are case insensitive, and the
        leading RPMTAG_ portion of the tag name may be omitted as well.

On Thu, Aug 9, 2018 at 4:32 PM, Home Linux Info wrote:
>
> Hello,
>
> You can begin with "[a-zA-Z_+-]{3,}[0-9]" to get the package name, it
> needs a little more work for right now it gets the last dash and first
> digit of package version. Then you can try "([^a-zA-Z_+-]{3,})(.\d{1,})".
> The first regex gives the following result:
> binutils-2
> compat-libcap1
> compat-libstdc++-3
> Which is almost what you need while the second one is more exact as it
> gives you:
> 2.23.52.0.1-12
> 1.10-3
> 3.2.3-71
> And that looks like exactly what you need.
>
> I'm no expert in regex but I like to experiment with it to see if I can
> extract some parts from a text / string using it.
>
> Jimmy (bash, perl and python total noob but trying to learn stuff).
>
> On 27.07.2018 15:54, Asad wrote:
>
> Hi All ,
>
> I want to get a regex to actually get the rpm name and version
> for comparison :
>
> binutils-2.23.52.0.1-12.el7.x86_64",
> compat-libcap1-1.10-3.el7.x86_64"
> compat-libstdc++-33-3.2.3-71.el7.i686
>
> (^[a-zA-Z0-9\-]*)\-\d'
>
> First part of the regular expression is ^[a-zA-Z0-9\-]
> which means search for anything that begins with a letter
> (lower or upper) or a number up until you reach an
> hyphen sign (‘-‘).
> but it fails to match
> compat-libstdc++-33-3.2.3-71.el7.i686
> Please let me know what regex should i use to extract all 3
> rpms.
> Also let me know if there are web tools to build regex
> Good websites for regex tutorials.
>
> Thanks,
>
> --
> Asad Hasan
> +91 9582111698

-- 
a

Andy Bach,
afb...@gmail.com
608 658-1890 cell
608 261-5738 wk
Re: regex to get the rpm name version
Hello,

You can begin with "[a-zA-Z_+-]{3,}[0-9]" to get the package name. It needs a little more work, because right now it also gets the last dash and the first digit of the package version. Then you can try "([^a-zA-Z_+-]{3,})(.\d{1,})".

The first regex gives the following result:

    binutils-2
    compat-libcap1
    compat-libstdc++-3

which is almost what you need, while the second one is more exact, as it gives you:

    2.23.52.0.1-12
    1.10-3
    3.2.3-71

And that looks like exactly what you need.

I'm no expert in regex, but I like to experiment with it to see if I can extract some parts from a text / string using it.

Jimmy (bash, perl and python total noob but trying to learn stuff).

On 27.07.2018 15:54, Asad wrote:
Hi All ,

I want to get a regex to actually get the rpm name and version for comparison :

binutils-2.23.52.0.1-12.el7.x86_64",
compat-libcap1-1.10-3.el7.x86_64"
compat-libstdc++-33-3.2.3-71.el7.i686

(^[a-zA-Z0-9\-]*)\-\d'

First part of the regular expression is ^[a-zA-Z0-9\-]
which means search for anything that begins with a letter
(lower or upper) or a number up until you reach an
hyphen sign (‘-‘).
but it fails to match
compat-libstdc++-33-3.2.3-71.el7.i686
Please let me know what regex should i use to extract all 3
rpms.
Also let me know if there are web tools to build regex
Good websites for regex tutorials.

Thanks,
-- 
Asad Hasan
+91 9582111698
Re: regex to get the rpm name version
Hi Asad, On Fri, 27 Jul 2018 18:24:39 +0530 Asad wrote: > Hi All , > > I want to get a regex to actually get the rpm name and version for > comparison : > > > binutils-2.23.52.0.1-12.el7.x86_64", > compat-libcap1-1.10-3.el7.x86_64" > compat-libstdc++-33-3.2.3-71.el7.i686 > > (^[a-zA-Z0-9\-]*)\-\d' > > First part of the regular expression is ^[a-zA-Z0-9\-] > > which means search for anything that begins with a letter > > (lower or upper) or a number up until you reach an > > hyphen sign (‘-‘). > > but it fails to match > > compat-libstdc++-33-3.2.3-71.el7.i686 > > Please let me know what regex should i use to extract all 3 > > rpms. > > Also let me know if there are web tools to build regex > > Good websites for regex tutorials. > for that, see: * http://perl-begin.org/topics/regular-expressions/ * https://github.com/aloisdg/awesome-regex * https://www.regular-expressions.info/ > > > > Thanks, > > > > > -- - Shlomi Fish http://www.shlomifish.org/ https://youtu.be/GoEn1YfYTBM - Tiffany Alvord - “Fall Together” C++ is complex, complexifying and complexified. (With apologies to the Oxford English Dictionary). — http://www.shlomifish.org/humour.html Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: regex to get the rpm name version
But if you have to use a regex, I suggest using the /x modifier to make the regex easier to read and maintain:

    #!/usr/bin/perl

    use strict;
    use warnings;

    for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
                  compat-libcap1-1.10-3.el7.x86_64
                  compat-libstdc++-33-3.2.3-71.el7.i686/) {
        my ($name, $version, $build) = $s =~ m{
            ^ (.*)             # name
            - (.*)             # version
            - ([0-9]+)         # build
            [.] [^.]+          # os
            [.] [^.]+ \z       # architecture
        }x;
        print "n $name v $version b $build\n";
    }

On Fri, Jul 27, 2018 at 9:14 AM Chas. Owens wrote:
> I don't think a regex is the simplest and most maintainable way to get
> this information. I think it is probably better to take advantage of the
> structure of the string to discard and find information:
>
> #!/usr/bin/perl
>
> use strict;
> use warnings;
>
> for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
> compat-libcap1-1.10-3.el7.x86_64 compat-libstdc++-33-3.2.3-71.el7.i686/) {
>     my @dots = split /\./, $s;
>     pop @dots; # get rid of architecture
>     pop @dots; # get rid of os
>     my $name_and_version = join ".", @dots;
>     my @dashes = split /-/, $name_and_version;
>     my $build = pop @dashes;
>     my $version = pop @dashes;
>     my $name = join "-", @dashes;
>     print "n $name v $version b $build\n";
> }
>
> On Fri, Jul 27, 2018 at 8:57 AM Asad wrote:
>
>> Hi All ,
>>
>> I want to get a regex to actually get the rpm name and version
>> for comparison :
>>
>> binutils-2.23.52.0.1-12.el7.x86_64",
>> compat-libcap1-1.10-3.el7.x86_64"
>> compat-libstdc++-33-3.2.3-71.el7.i686
>>
>> (^[a-zA-Z0-9\-]*)\-\d'
>>
>> First part of the regular expression is ^[a-zA-Z0-9\-]
>> which means search for anything that begins with a letter
>> (lower or upper) or a number up until you reach an
>> hyphen sign (‘-‘).
>> but it fails to match
>> compat-libstdc++-33-3.2.3-71.el7.i686
>> Please let me know what regex should i use to extract all 3
>> rpms.
>> Also let me know if there are web tools to build regex
>> Good websites for regex tutorials.
>>
>> Thanks,
>>
>> --
>> Asad Hasan
>> +91 9582111698
RE: regex to get the rpm name version
I would suggest you change your approach and use the query mode of RPM to get your information instead of building up a regexp: rpm -qa --queryformat "%{NAME}\n" Duncs From: Asad [mailto:asad.hasan2...@gmail.com] Sent: 27 July 2018 13:55 To: beginners@perl.org Subject: regex to get the rpm name version Hi All , I want to get a regex to actually get the rpm name and version for comparison : binutils-2.23.52.0.1-12.el7.x86_64", compat-libcap1-1.10-3.el7.x86_64" compat-libstdc++-33-3.2.3-71.el7.i686 (^[a-zA-Z0-9\-]*)\-\d' First part of the regular expression is ^[a-zA-Z0-9\-] which means search for anything that begins with a letter (lower or upper) or a number up until you reach an hyphen sign (‘-‘). but it fails to match compat-libstdc++-33-3.2.3-71.el7.i686 Please let me know what regex should i use to extract all 3 rpms. Also let me know if there are web tools to build regex Good websites for regex tutorials. Thanks, -- Asad Hasan +91 9582111698
Re: regex to get the rpm name version
I don't think a regex is the simplest and most maintainable way to get this information. I think it is probably better to take advantage of the structure of the string to discard and find information:

    #!/usr/bin/perl

    use strict;
    use warnings;

    for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
                  compat-libcap1-1.10-3.el7.x86_64
                  compat-libstdc++-33-3.2.3-71.el7.i686/) {
        my @dots = split /\./, $s;
        pop @dots;    # get rid of architecture
        pop @dots;    # get rid of os
        my $name_and_version = join ".", @dots;
        my @dashes = split /-/, $name_and_version;
        my $build = pop @dashes;
        my $version = pop @dashes;
        my $name = join "-", @dashes;
        print "n $name v $version b $build\n";
    }

On Fri, Jul 27, 2018 at 8:57 AM Asad wrote:
> Hi All ,
>
> I want to get a regex to actually get the rpm name and version
> for comparison :
>
> binutils-2.23.52.0.1-12.el7.x86_64",
> compat-libcap1-1.10-3.el7.x86_64"
> compat-libstdc++-33-3.2.3-71.el7.i686
>
> (^[a-zA-Z0-9\-]*)\-\d'
>
> First part of the regular expression is ^[a-zA-Z0-9\-]
> which means search for anything that begins with a letter
> (lower or upper) or a number up until you reach an
> hyphen sign (‘-‘).
> but it fails to match
> compat-libstdc++-33-3.2.3-71.el7.i686
> Please let me know what regex should i use to extract all 3
> rpms.
> Also let me know if there are web tools to build regex
> Good websites for regex tutorials.
>
> Thanks,
>
> --
> Asad Hasan
> +91 9582111698
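A variation that combines Chas.'s two ideas: anchor at the end of the string and let a greedy name absorb any dashes, but return named captures (`%+`, available since Perl 5.10) so callers get a hash. The helper `parse_rpm` and its field names (`name`, `version`, `build`, `os`, `arch`) simply mirror Chas.'s comments; they are illustrative, not rpm's own terminology.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an rpm file name into its parts.
# The greedy (?<name>.+) backtracks until the rest of the pattern fits,
# so names containing dashes (compat-libstdc++-33) stay intact.
sub parse_rpm {
    my ($s) = @_;
    return unless $s =~ m{
        ^  (?<name>.+)
        -  (?<version>[^-]+)
        -  (?<build>[0-9]+)
        \. (?<os>[^.]+)
        \. (?<arch>[^.]+) \z
    }x;
    return +{ %+ };    # copy the named captures into a new hash ref
}

for my $s (qw/binutils-2.23.52.0.1-12.el7.x86_64
              compat-libcap1-1.10-3.el7.x86_64
              compat-libstdc++-33-3.2.3-71.el7.i686/) {
    my $p = parse_rpm($s) or next;
    print "n $p->{name} v $p->{version} b $p->{build}\n";
}
```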
Re: regex matches Chinese characters
Hi Lauren, On Fri, 27 Jul 2018 11:28:42 +0800 "Lauren C." wrote: > greetings, > > I was doing the log statistics stuff using perl. > There are chinese characters in log items. > I tried with regex to match them, but got no luck. > > $ perl -mstrict -le 'my $char="汉语"; print "it is chinese" if $char =~ > /\p{Han}+/' > > $ perl -mstrict -mutf8 -le 'my $char="汉语"; print "it is chinese" if > $char =~ /\p{Han}+/' > > both output nothing. > > My terminal is UTF-8: > According to http://perldoc.perl.org/perlrun.html , you probably need -Mstrict and -Mutf8 instead of the lowercase -m, so "sub import" will get called: shlomif@telaviv1:~$ perl -Mstrict -Mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/' it is chinese shlomif@telaviv1:~$ HTH, Shlomi > $ locale > LANG=en_US.UTF-8 > LANGUAGE= > LC_CTYPE="en_US.UTF-8" > LC_NUMERIC="en_US.UTF-8" > LC_TIME="en_US.UTF-8" > LC_COLLATE="en_US.UTF-8" > LC_MONETARY="en_US.UTF-8" > LC_MESSAGES="en_US.UTF-8" > LC_PAPER="en_US.UTF-8" > LC_NAME="en_US.UTF-8" > LC_ADDRESS="en_US.UTF-8" > LC_TELEPHONE="en_US.UTF-8" > LC_MEASUREMENT="en_US.UTF-8" > LC_IDENTIFICATION="en_US.UTF-8" > LC_ALL= > > > Can you help? thanks in advance. > -- - Shlomi Fish http://www.shlomifish.org/ https://github.com/sindresorhus/awesome - curated list of lists Cats are smarter than dogs. You can’t get eight cats to pull a sled through snow.— Source unknown, via Nadav Har’El. Please reply to list if it's a mailing list post - http://shlom.in/reply .
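Why the case of -M matters: -mutf8 runs `use utf8 ();`, which skips import, so the source is still read as bytes and "汉语" is six Latin-1 characters rather than two Han characters. The sketch below avoids non-ASCII in the source and uses Encode's decode to show the same byte-versus-character difference explicitly; the byte values are the UTF-8 encoding of 汉 (U+6C49) and 语 (U+8BED).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# The six UTF-8 bytes of "汉语": what a literal holds when the source is
# NOT read under "use utf8" (which is what -mutf8 silently gives you).
my $bytes = "\xe6\xb1\x89\xe8\xaf\xad";

# Decoding yields two characters, U+6C49 and U+8BED, which is what
# "use utf8" (or -Mutf8) does for string literals at compile time.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d, Han? %s\n", length $bytes, ($bytes =~ /\p{Han}/ ? "yes" : "no");
printf "chars: length %d, Han? %s\n", length $chars, ($chars =~ /\p{Han}/ ? "yes" : "no");
```

For log files the same applies on input: decode what you read (for example with an :encoding(UTF-8) layer) before matching \p{Han}.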
Re: regex matches Chinese characters
oops that's perfect. thanks Shlomi. On 2018/7/27 Friday PM 1:26, Shlomi Fish wrote: Hi Lauren, On Fri, 27 Jul 2018 11:28:42 +0800 "Lauren C." wrote: greetings, I was doing the log statistics stuff using perl. There are chinese characters in log items. I tried with regex to match them, but got no luck. $ perl -mstrict -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/' $ perl -mstrict -mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/' both output nothing. My terminal is UTF-8: According to http://perldoc.perl.org/perlrun.html , you probably need -Mstrict and -Mutf8 instead of the lowercase -m, so "sub import" will get called: shlomif@telaviv1:~$ perl -Mstrict -Mutf8 -le 'my $char="汉语"; print "it is chinese" if $char =~ /\p{Han}+/' it is chinese shlomif@telaviv1:~$ HTH, Shlomi $ locale LANG=en_US.UTF-8 LANGUAGE= LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= Can you help? thanks in advance.
Re: Regex for date format
Worked perfectly, thanks Uri. The same technique also works in PostgreSQL's regexp_replace, for info. On 29 June 2018 at 16:18, Mike Martin wrote: > Thanks > > > On Fri, 29 Jun 2018, 15:48 Uri Guttman, wrote: > >> On 06/29/2018 10:41 AM, Mike Martin wrote: >> >> sorry >> -mm-dd hh:mm:ss.dd >> eg: >> 2018-01-01 12-45-10-456789 to >> 2018-01-01 12:45:10.456789 >> >> >> >> please reply to the list and not to me! >> >> then why did you want lookbehind? this is very easy if you just grab the >> time parts and reassemble them as you want. >> >> $stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./ ; >> >> it uses the space to mark where the time part starts. >> >> uri >> >> >>
Re: Regex for date format
Thanks On Fri, 29 Jun 2018, 15:48 Uri Guttman, wrote: > On 06/29/2018 10:41 AM, Mike Martin wrote: > > sorry > -mm-dd hh:mm:ss.dd > eg: > 2018-01-01 12-45-10-456789 to > 2018-01-01 12:45:10.456789 > > > > please reply to the list and not to me! > > then why did you want lookbehind? this is very easy if you just grab the > time parts and reassemble them as you want. > > $stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./ ; > > it uses the space to mark where the time part starts. > > uri > > >
Re: Regex for date format
On 06/29/2018 10:41 AM, Mike Martin wrote: sorry -mm-dd hh:mm:ss.dd eg: 2018-01-01 12-45-10-456789 to 2018-01-01 12:45:10.456789 please reply to the list and not to me! then why did you want lookbehind? this is very easy if you just grab the time parts and reassemble them as you want. $stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./ ; it uses the space to mark where the time part starts. uri
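Uri's substitution, wrapped in a minimal runnable script using Mike's example value:

```perl
use strict;
use warnings;

my $stamp = '2018-01-01 12-45-10-456789';

# The whitespace before the time marks where the time part starts, so the
# dashes inside the date are left alone.
$stamp =~ s/\s(\d\d)-(\d\d)-(\d\d)-/ $1:$2:$3./;

print "$stamp\n";   # prints 2018-01-01 12:45:10.456789
```

No lookbehind is needed: the captures carry the digits across and the replacement supplies the new separators.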
Re: Regex for date format
On 06/29/2018 09:32 AM, Mike Martin wrote: Hi I am trying to convert a string of the format 2018-01-01 16-45-21-654278 to a proper timestamp string so basically I want to replace all - after the date part i am not sure what you are trying to do. show the after text that you want. a proper timestamp string is not specific enough. if you want to really parse that string, then use Time::Piece and its strptime sub which can parse pretty much any time/date string. uri
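A sketch of the Time::Piece route Uri mentions. strptime has no directive for fractional seconds, so this version (my assumption about how to handle them) splits the microseconds off first and reattaches them after reformatting:

```perl
use strict;
use warnings;
use Time::Piece;

my $raw = '2018-01-01 16-45-21-654278';

# Peel off the trailing fraction; the greedy (.+) keeps everything before
# the last dash.
my ($base, $frac) = $raw =~ /^(.+)-(\d+)$/
    or die "unexpected format: $raw\n";

# Literal dashes in the format must match the dashes in the input.
my $t = Time::Piece->strptime($base, '%Y-%m-%d %H-%M-%S');

my $stamp = $t->ymd . ' ' . $t->hms . ".$frac";
print "$stamp\n";
```

Parsing this way also validates the fields (an hour of 25 would be rejected), which the plain substitution does not do.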
Re: regex with HEX ascii chars
Try: binmode(HANDLE) before reading the file. HANDLE is your filehandle. If that doesn't work you might want to supply the text file and a sample script. Mike On 4/12/2018 12:04 PM, beginners-digest-h...@perl.org wrote: I have a text file (created by pdftotext) that I've imported into my script. It contains ASCII characters 251 for crosses and 252 for ticks. If I load the file in gvim and do :as it reports the characters as 251, Hex 00fb, Octal 373 252, hex 00fc, Octal 374 However, when I try to seacch for it using if ($line=~/[\xfb|\xfc]/) { or even just if ($line=~/\xfb/) { it always fails. What am I doing wrong? Gary
Re: regex with HEX ascii chars
On Thu, 2018-04-12 at 17:26 +0100, Gary Stainburn wrote: > I have a text file (created by pdftotext) that I've imported into my > script. > > It contains ASCII characters 251 for crosses and 252 for ticks. ASCII defines 128 characters so those characters are not ASCII. John
Re: regex with HEX ascii chars
On Thursday 12 April 2018 19:53:16 Shlomi Fish wrote:
> Perhaps see http://perldoc.perl.org/perlunitut.html - you may need to read
> the file as binary or iso8859-1 or whatever. Also see

Thanks for this Shlomi. I have looked into that before briefly when doing http gets and reading office documents, but this time I didn't think I was going to need this.

> https://github.com/shlomif/how-to-share-code-online and read what Andy
> noted.

I thought the problem was with my concepts rather than the program itself. The following code shows that I was wrong.

#!/usr/bin/perl
use strict;
use warnings;

my $line="A û ü û";
my @arr=($line=~/(\xc3.)/g);
my $tick="\xc3\xbc";
my $cross="\xc3\xbb";
foreach my $c (split //,$line) {
  printf "%s = %X %d\n",$c,ord($c),ord($c);
}
if ($line=~/\xc3\xbb/) { print "true\n";}
foreach my $a (@arr) {
  print "start\n";
  if ($a eq $tick) { print "tick\n";}
  if ($a eq $cross) { print "cross\n";}
}

[root@lou inet]# ./t1
A = 41 65
  = 20 32
� = C3 195
� = BB 187
  = 20 32
� = C3 195
� = BC 188
  = 20 32
  = 20 32
� = C3 195
� = BB 187
true
start
cross
start
tick
start
cross
[root@lou inet]#

When I went back to gvim I noticed that it started showing two column values as I go past these fields, which should have given me a clue. My production code now includes the following working code:

my $tick="\xc3\xbc";
my $cross="\xc3\xbb";
my @ticks=($line=~/(\xc3.)/g);
if (scalar(@ticks) == 5) {
  if ($ticks[0] eq $tick) {$job{sj_mot}='true';}
  if ($ticks[1] eq $tick) {$wuw='true'; $job{sj_wait}=20;}
  if ($ticks[2] eq $tick) {$job{sj_c_car}='true';}
  # 3 = advisor which we don't use
  if ($ticks[4] eq $tick) {$job{sj_wait}=30;}
} else {
  debugprint(1,"incorrect tick/cross count returned");
}
Re: regex with HEX ascii chars
On Thu, 12 Apr 2018 17:26:57 +0100 Gary Stainburn wrote: > I have a text file (created by pdftotext) that I've imported into my script. > > It contains ASCII characters 251 for crosses and 252 for ticks. If I load > the file in gvim and do :as > > it reports the characters as > > 251, Hex 00fb, Octal 373 > 252, hex 00fc, Octal 374 > > However, when I try to seacch for it using > > if ($line=~/[\xfb|\xfc]/) { > > or even just > > if ($line=~/\xfb/) { > > it always fails. What am I doing wrong? > Perhaps see http://perldoc.perl.org/perlunitut.html - you may need to read the file as binary or iso8859-1 or whatever. Also see https://github.com/shlomif/how-to-share-code-online and read what Andy noted. > Gary > -- - Shlomi Fish http://www.shlomifish.org/ https://github.com/shlomif/what-you-should-know-about-automated-testing It’s easier to port a shell than a shell script. — http://en.wikiquote.org/wiki/Larry_Wall Please reply to list if it's a mailing list post - http://shlom.in/reply .
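Gary's own dump shows the file actually holds the UTF-8 byte pairs C3 BB and C3 BC, not single bytes FB and FC. Following Shlomi's perlunitut pointer, a sketch of the decoding approach: assuming the pdftotext output is UTF-8, reading it through an :encoding(UTF-8) layer turns each pair back into one character, and the original /\xfb/ style match works. The temporary file only stands in for Gary's real input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in for the pdftotext output: U+00FB (cross) and U+00FC (tick)
# stored as the UTF-8 byte pairs C3 BB and C3 BC.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "A \xc3\xbb \xc3\xbc \xc3\xbb\n";
close $fh or die "close: $!";

# Decode on input so each byte pair becomes a single character.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# Now a plain character class (no "|" inside it) finds the marks.
my @marks = $line =~ /([\xfb\xfc])/g;
my @names = map { $_ eq "\xfc" ? 'tick' : 'cross' } @marks;
print "@names\n";
```

Compared with matching raw \xc3 byte pairs, this keeps one character per mark, so `length`, `split //` and character classes behave as you would expect.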
Re: regex with HEX ascii chars
> However, when I try to seacch for it using if ($line=~/[\xfb|\xfc]/) { Note, you're mixing the character class " [ab] " with grouping alternative pipe " ( a | b ) " here > or even just if ($line=~/\xfb/) { Dunno, works here: $ perl -e '$line = "hi" . chr 251 . "ho" . chr 252 ; if ($line=~/[\xfb\xfc]/) { print "yep" } print "\n"' yep $ perl -e '$line = "hi" . chr 250 . "ho" . chr 253 ; if ($line=~/[\xfb\xfc]/) { print "yep" } print "\n"' [crickets] So, I'd guess your $line doesn't have a \xfb or \xfc in it at the time of the test. $ perl -e '$line = "hi" . chr 251 . "ho" . chr 253 ; if ($line=~/([\xfb\xfc])/) { print "yep: $1" } print "\n"' | od -c 000 y e p : 373 \n 007 On Thu, Apr 12, 2018 at 11:26 AM, Gary Stainburn < gary.stainb...@ringways.co.uk> wrote: > I have a text file (created by pdftotext) that I've imported into my > script. > > It contains ASCII characters 251 for crosses and 252 for ticks. If I load > the > file in gvim and do :as > > it reports the characters as > > 251, Hex 00fb, Octal 373 > 252, hex 00fc, Octal 374 > > However, when I try to seacch for it using > > if ($line=~/[\xfb|\xfc]/) { > > or even just > > if ($line=~/\xfb/) { > > it always fails. What am I doing wrong? > > Gary -- a Andy Bach, afb...@gmail.com 608 658-1890 cell 608 261-5738 wk
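Andy's point about the pipe can be checked directly. Inside [...] the | is just another member of the set, so Gary's class also matches a literal pipe; a sketch comparing it with the two forms that were probably intended:

```perl
use strict;
use warnings;

my $pipe = '|';

# Inside [...], | is an ordinary member: this matches \xfb, \xfc or "|".
my $class_with_pipe = $pipe =~ /[\xfb|\xfc]/ ? 1 : 0;

# Probably intended: a set without the pipe ...
my $plain_class = $pipe =~ /[\xfb\xfc]/ ? 1 : 0;

# ... or an alternation, where | really does mean "or".
my $alternation = $pipe =~ /(?:\xfb|\xfc)/ ? 1 : 0;

print "class with pipe: $class_with_pipe, class: $plain_class, alternation: $alternation\n";
```

For single characters the plain class is the usual choice; alternation is for multi-character alternatives.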
Re: Regex for matching files that don't have type extensions
On Sat, 05 Nov 2016 21:30:12 + Aaron Wells wrote: > True. It could get hairy. Unicode is a pretty vast landscape, and I > think if you only want ASCII word characters to count as things that > could be in a filename, your original [A-Za-z0-9_] is your best bet. > Thanks to the others for their comments. As Ken says: there are > probably more ways to code this. TIMTOWTDI https://en.wikipedia.org/wiki/There%27s_more_than_one_way_to_do_it ;)
Re: Regex for matching files that don't have type extensions
From: Aaron Wells True. It could get hairy. Unicode is a pretty vast landscape, and I think if you only want ASCII word characters to count as things that could be in a filename, your original [A-Za-z0-9_] is your best bet. Thanks to the others for their comments. As Ken says: there are probably more ways to code this. Another (shorter) way of writing that would be: /^\w+$/aa where /aa makes \w mean just [A-Za-z0-9_]. a = ASCII; a single /a already restricts \w this way, and doubling it to /aa adds protection under /i, so that ASCII and non-ASCII characters never match each other case-insensitively. --Octavian
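What the second "a" adds only shows up with /i. Per perlre, KELVIN SIGN (U+212A) case-folds to ASCII "k"; a sketch comparing /ai with /aai:

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";   # KELVIN SIGN, which case-folds to ASCII "k"

# Under a single /a, \w, \d, \s and the POSIX classes are ASCII-only, but
# case-insensitive matching may still pair ASCII with non-ASCII.
my $single_a = $kelvin =~ /k/ai  ? 1 : 0;   # 1

# /aa also forbids those ASCII/non-ASCII case-insensitive matches.
my $double_a = $kelvin =~ /k/aai ? 1 : 0;   # 0

print "/ai: $single_a, /aai: $double_a\n";
```

For the filename check itself (/^\w+$/ without /i), /a and /aa behave the same.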
Re: Regex for matching files that don't have type extensions
On Sat, Nov 5, 2016 at 10:55 AM, Jovan Trujillo wrote: > Hi Aaron, > In perlre I read that \w > " > > \w[3] Match a "word" character (alphanumeric plus "_", plus > other connector punctuation chars plus > Unicode > marks) > > " > > So since I didn't know what these 'other' connection punctuation chars are I > avoided it. Unicode makes things more complicated for me. Do you know? > To exclude Unicode and ensure only ASCII, use the /a modifier, eg, /\w+/a From perlre: /a is the same as "/u", except that "\d", "\s", "\w", and the Posix character classes are restricted to matching in the ASCII range only. That is, with this modifier, "\d" always means precisely the digits "0" to "9"; "\s" means the five characters "[ \f\n\r\t]"; "\w" means the 63 characters "[A-Za-z0-9_]"; and likewise, all the Posix classes such as "[[:print:]]" match only the appropriate ASCII-range characters.
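A quick way to see the difference the perlre passage describes: under /u, \w accepts any Unicode word character such as é (U+00E9), while /a limits it to the 63 ASCII characters. A minimal sketch:

```perl
use strict;
use warnings;

my $word = "caf\x{e9}";   # "café"; U+00E9 is a Unicode letter

# Under /u (Unicode rules) \w matches any Unicode word character ...
my $unicode_rules = $word =~ /^\w+$/u ? 1 : 0;   # 1

# ... while /a restricts \w to [A-Za-z0-9_].
my $ascii_only    = $word =~ /^\w+$/a ? 1 : 0;   # 0

print "/u: $unicode_rules, /a: $ascii_only\n";
```

So /^\w+$/a and /^[A-Za-z0-9_]+$/ accept exactly the same names.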
Re: Regex for matching files that don't have type extensions
True. It could get hairy. Unicode is a pretty vast landscape, and I think if you only want ASCII word characters to count as things that could be in a filename, your original [A-Za-z0-9_] is your best bet. Thanks to the others for their comments. As Ken says: there are probably more ways to code this. On Sat, Nov 5, 2016, 11:44 AM Kent Fredric wrote: > On 6 November 2016 at 06:14, Jovan Trujillo > wrote: > > > > 1207003PE_GM_09TNPLM2.csv > > > > I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both > > strings. > > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings > > captured. > > Alternatively, if your use case allows it, it might be more viable to > use negative matching. > > $file !~ /[.]/ and print "$file has no extension" > > There's probably a reason why you're not doing this already, but can't > tell from the context. > > NB: Clearly defining what an "extension" means is also pertinent: > > fooo.csv > fooo.jpg > fooo.jpeg > foo.tar.xz > foo.config > .config > .config.ini > > You probably are just meaning "has a dot" or "has a dot followed by at > most 3 characters", but its hard to tell from context ( and there are > a lot of obvious cases where there is an "extension" suffix that is > greater than 3 characters ) > > > > > > > > -- > Kent > > KENTNL - https://metacpan.org/author/KENTNL
Re: Regex for matching files that don't have type extensions
On 6 November 2016 at 06:14, Jovan Trujillo wrote: > > 1207003PE_GM_09TNPLM2.csv > > I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both > strings. > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings > captured. Alternatively, if your use case allows it, it might be more viable to use negative matching. $file !~ /[.]/ and print "$file has no extension" There's probably a reason why you're not doing this already, but I can't tell from the context. NB: Clearly defining what an "extension" means is also pertinent: fooo.csv fooo.jpg fooo.jpeg foo.tar.xz foo.config .config .config.ini You probably are just meaning "has a dot" or "has a dot followed by at most 3 characters", but it's hard to tell from context ( and there are a lot of obvious cases where there is an "extension" suffix that is greater than 3 characters ) -- Kent KENTNL - https://metacpan.org/author/KENTNL
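Kent's point about defining "extension" can be made concrete by running his example names through both of the definitions he suggests. The two regexes are my encoding of those definitions, not code from the thread:

```perl
use strict;
use warnings;

# Kent's example names, plus Jovan's file without an extension.
my @names = qw(1207003PE_GM_09TNPLM2 fooo.csv fooo.jpg fooo.jpeg
               foo.tar.xz foo.config .config .config.ini);

# Definition 1: "has an extension" means "contains a dot anywhere".
my @no_dot = grep { $_ !~ /[.]/ } @names;

# Definition 2: "has an extension" means "ends in a dot plus 1-3 characters".
my @no_short_ext = grep { $_ !~ /\.[^.]{1,3}\z/ } @names;

print "no dot:          @no_dot\n";
print "no short suffix: @no_short_ext\n";
```

The second definition treats fooo.jpeg, foo.config and .config as extension-less, which is exactly the ambiguity Kent warns about.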
Re: Regex for matching files that don't have type extensions
Hi Jovan, On Sat, Nov 5, 2016 at 1:14 PM, Jovan Trujillo wrote: > Hi All, > I thought I could use a simple regex to match files like this: > > 1207003PE_GM_09TNPLM2 > > and ignore files with extensions like this: > > 1207003PE_GM_09TNPLM2.csv > > I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both > strings. > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings > captured. > > What am I doing wrong? > > Thank you, > Jovan > The regular expression m/[A-Za-z0-9\_]+(?!\.)/ will match, as it will match one or more of the desired characters (1207003PE_GM_09TNPLM) that are followed by a character (2) that is not a period/dot. There are probably many ways to code this. The simplest may be to run two regular expressions - the first to determine if there is a period/dot (.) in the string. HTH, Ken
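Ken's explanation can be confirmed by capturing what the lookahead version actually matches in the .csv name:

```perl
use strict;
use warnings;

my $file = '1207003PE_GM_09TNPLM2.csv';

# The class cannot cross the dot, so the greedy run first stops after
# "...PLM2". The lookahead then sees "." and fails, so the engine gives back
# one character; after "...PLM" the next character is "2", not a dot, and
# the match succeeds.
my ($matched) = $file =~ /([A-Za-z0-9_]+)(?!\.)/;

print "matched: $matched\n";   # matched: 1207003PE_GM_09TNPLM
```

Backtracking, not the lookahead itself, is what lets the pattern succeed on names that do have an extension.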
Re: Regex for matching files that don't have type extensions
Hi Aaron, In perlre I read that \w " \w [3] Match a "word" character (alphanumeric plus "_", plus other connector punctuation chars plus Unicode marks) " So since I didn't know what these 'other' connector punctuation chars are, I avoided it. Unicode makes things more complicated for me. Do you know? Thanks, Jovan On Sat, Nov 5, 2016 at 10:27 AM, Aaron Wells wrote: > *predefined > > On Sat, Nov 5, 2016, 10:27 AM Aaron Wells wrote: > >> Hi Jovan. \w is a presidents character classes that is equivalent to >> [A-Za-z0-9_], so this works also: >> m/^\w+$/ >> >> On Sat, Nov 5, 2016, 10:24 AM Jovan Trujillo >> wrote: >> >> Ah, I figured it out. >> m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string >> follows the pattern. Thanks! >> >> On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo < >> jovan.trujil...@gmail.com> wrote: >> >> Hi All, >> I thought I could use a simple regex to match files like this: >> >> 1207003PE_GM_09TNPLM2 >> >> and ignore files with extensions like this: >> >> 1207003PE_GM_09TNPLM2.csv >> >> I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both >> strings. >> So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings >> captured. >> >> What am I doing wrong? >> >> Thank you, >> Jovan >> >> >>
Re: Regex for matching files that don't have type extensions
*predefined On Sat, Nov 5, 2016, 10:27 AM Aaron Wells wrote: > Hi Jovan. \w is a presidents character classes that is equivalent to > [A-Za-z0-9_], so this works also: > m/^\w+$/ > > On Sat, Nov 5, 2016, 10:24 AM Jovan Trujillo > wrote: > > Ah, I figured it out. > m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string > follows the pattern. Thanks! > > On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo > wrote: > > Hi All, > I thought I could use a simple regex to match files like this: > > 1207003PE_GM_09TNPLM2 > > and ignore files with extensions like this: > > 1207003PE_GM_09TNPLM2.csv > > I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both > strings. > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings > captured. > > What am I doing wrong? > > Thank you, > Jovan > > >
Re: Regex for matching files that don't have type extensions
Hi Jovan. \w is a presidents character classes that is equivalent to [A-Za-z0-9_], so this works also: m/^\w+$/ On Sat, Nov 5, 2016, 10:24 AM Jovan Trujillo wrote: > Ah, I figured it out. > m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string > follows the pattern. Thanks! > > On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo > wrote: > > Hi All, > I thought I could use a simple regex to match files like this: > > 1207003PE_GM_09TNPLM2 > > and ignore files with extensions like this: > > 1207003PE_GM_09TNPLM2.csv > > I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both > strings. > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings > captured. > > What am I doing wrong? > > Thank you, > Jovan > > >
Re: Regex for matching files that don't have type extensions
Ah, I figured it out. m/^[A-Za-z0-9_]+$/ works because it will only match if the entire string follows the pattern. Thanks! On Sat, Nov 5, 2016 at 10:14 AM, Jovan Trujillo wrote: > Hi All, > I thought I could use a simple regex to match files like this: > > 1207003PE_GM_09TNPLM2 > > and ignore files with extensions like this: > > 1207003PE_GM_09TNPLM2.csv > > I originally though m/[A-Za-z0-9\_]+/ would work, but it captures both > strings. > So then I tried m/[A-Za-z0-9\_]+(?!\.)/ but I still get both strings > captured. > > What am I doing wrong? > > Thank you, > Jovan >
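Jovan's anchored fix, applied to both of his sample names:

```perl
use strict;
use warnings;

my @files = qw(1207003PE_GM_09TNPLM2 1207003PE_GM_09TNPLM2.csv);

# ^ and $ force the whole name to consist of the allowed characters, so a
# name containing a dot can no longer match part-way. ($ also allows one
# trailing newline; use \z if that matters.)
my @without_ext = grep { /^[A-Za-z0-9_]+$/ } @files;

print "@without_ext\n";   # 1207003PE_GM_09TNPLM2
```

With the anchors in place, no lookahead is needed at all.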
Re: Regex to match "bad" characters in a parameter
"Chris Charley" writes: > You could do that in 1 line - See the following small program. > (The line using a 'grep' solution is commented out. It would work as well). > > > #!/usr/bin/perl > use strict; > use warnings; > > while (my $id = <DATA>) { >chomp $id; >#if (grep /itemid=.*?[^\w-]/, split /&/, $id) { >if ($id =~ /itemid/ && $id !~ /itemid=[\w-]+(?:&|$)/) { >print "Bad id: <$id>\n"; >} > } > > __DATA__ > itemid=AT18C&i_AT18C=1&t=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc > c=detail.htm&itemid=AT18C > itemid=AT18/C > t=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc This might be a string with a bad item id because there is none: Are you going to process the string, assuming that it is a good item id? How do you determine the beginning of the relevant sequence --- and thus whether the string contains a good item id or not --- when the string might not contain 'itemid' to designate the beginning? I think you might need to work with cleaner definitions, and/or attempt to find the good item ids instead of the bad ones. > itemid=?AT18C > > > When this is run, it prints out: > > Bad id: <itemid=AT18/C> > Bad id: <itemid=?AT18C> > > Chris
Re: Regex to match "bad" characters in a parameter
On Jan 26, 2016, at 11:22 AM, Chris Charley wrote: > > You could do that in 1 line - See the following small program. Thanks, Chris. That'll do the trick. And the grep alternative is interesting, too. I hadn't thought of that. Regards, Frank
Re: Regex to match "bad" characters in a parameter
"SSC_perl" wrote in message news:ef7499af-b4a5-4b07-8c69-3192ef782...@surfshopcart.com...
On Jan 25, 2016, at 4:59 PM, Shawn H Corey wrote:

Use the negative match operator !~

if( $QUERY_STRING !~ m{ itemid = [-0-9A-Za-z_]+? (?: \& | \z ) }msx ){ print "bad: $QUERY_STRING\n"; }

Thanks for that, Shawn. It works perfectly except for one criterion that I inadvertently forgot to include. It's possible that the string will _not_ contain the itemid parameter at all. When that's missing, the regex matches and it shouldn't. I guess that's why I was trying to stay with the positive match operator.

I tried inverting your regex:

if ( $QUERY_STRING =~ m/ itemid= .*? [^-0-9A-Za-z_]+? .*? (?: \& | \z ) /sx ) { say "bad: $QUERY_STRING"; }

but that doesn't work either. It catches even good item numbers.

In the meantime, I got it to work by grabbing the itemid and working with that separately:

my $item_id = $1 if ($QUERY_STRING =~ m/ itemid=([^&]*) /x);
if ( $item_id =~ m/ [^a-zA-Z0-9_-] /x ) { ...

however, I'd like to do that with a single line, if possible, so I don't have to create a new variable just for that.

Thanks,
Frank

Hello Frank,

You could do that in 1 line - See the following small program. (The line using a 'grep' solution is commented out. It would work as well).

#!/usr/bin/perl
use strict;
use warnings;

while (my $id = <DATA>) {
    chomp $id;
    #if (grep /itemid=.*?[^\w-]/, split /&/, $id) {
    if ($id =~ /itemid/ && $id !~ /itemid=[\w-]+(?:&|$)/) {
        print "Bad id: <$id>\n";
    }
}

__DATA__
itemid=AT18C&i_AT18C=1&t=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc
c=detail.htm&itemid=AT18C
itemid=AT18/C
t=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc
itemid=?AT18C

When this is run, it prints out:

Bad id: <itemid=AT18/C>
Bad id: <itemid=?AT18C>

Chris
Re: Regex to match "bad" characters in a parameter
On Jan 25, 2016, at 4:59 PM, Shawn H Corey wrote: > > Use the negative match operator !~ > > if( $QUERY_STRING !~ m{ itemid = [-0-9A-Za-z_]+? (?: \& | \z ) }msx ){ >print "bad: $QUERY_STRING\n"; > } Thanks for that, Shawn. It works perfectly except for one criterion that I inadvertently forgot to include. It's possible that the string will _not_ contain the itemid parameter at all. When that's missing, the regex matches and it shouldn't. I guess that's why I was trying to stay with the positive match operator. I tried inverting your regex: if ( $QUERY_STRING =~ m/ itemid= .*? [^-0-9A-Za-z_]+? .*? (?: \& | \z ) /sx ) { say "bad: $QUERY_STRING"; } but that doesn't work either. It catches even good item numbers. In the meantime, I got it to work by grabbing the itemid and working with that separately: my $item_id = $1 if ($QUERY_STRING =~ m/ itemid=([^&]*) /x); if ( $item_id =~ m/ [^a-zA-Z0-9_-] /x ) { ... however, I'd like to do that with a single line, if possible, so I don't have to create a new variable just for that. Thanks, Frank
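One way to keep a single positive match, sketched under my own assumptions: parameters are separated by "&", and an id is bad if its value holds anything other than ASCII letters, digits, "-" or "_". Excluding "&" from both classes keeps the search inside the itemid value. Frank's inverted attempt let .*? run past the "&" into later parameters (such as the "." in main.htm), which is why it flagged good ids. Strings with no itemid simply do not match.

```perl
use strict;
use warnings;

# Bad if, within the itemid value (up to the next "&"), some character is
# outside [-0-9A-Za-z_]. (?:^|&) keeps "xitemid=" style names from matching.
my $bad_itemid = qr/(?:^|&)itemid=[^&]*?[^-0-9A-Za-z_&]/;

my @bad;
for my $qs ('itemid=AT18C&i_AT18C=1&t=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc',
            'c=detail.htm&itemid=AT18C',
            'itemid=AT18/C',
            't=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc',
            'itemid=?AT18C') {
    push @bad, $qs if $qs =~ $bad_itemid;
}
print "bad: $_\n" for @bad;

# If the id itself is needed, a list assignment captures it in one
# statement and avoids "my $x = ... if ...", whose behaviour perlsyn
# documents as undefined.
my ($item_id) = 'c=detail.htm&itemid=AT18C' =~ /(?:^|&)itemid=([^&]*)/;
```

An empty value (itemid=) is not flagged by this sketch; add a separate check if that should count as bad.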
Re: Regex to match "bad" characters in a parameter
On Mon, 25 Jan 2016 16:16:40 -0800 SSC_perl wrote: > I'm trying to find a way to trap bad item numbers. I want to > parse the parameter "itemid=" and then everything up to either an "&" > or end-of-string. A good item number will contain only ASCII > letters, numbers, dashes, and underscores and may terminate with a > "&" or it may not (see samples below). The following string should > test negative in the regex below: > > my $QUERY_STRING = 'itemid=AT18C&i_AT18C=1'; > > but a string containing "itemid=AT18/C" should test positive, since > it has a slash. > > I can catch a single bad character and get it to work, e.g. > > if ( $QUERY_STRING =~ m| itemid= .*? [/]+? .*? &? |x ) { > > but I'd like to do something like this instead to catch others: > > if ( $QUERY_STRING =~ m| itemid= (?: .*? [^a-zA-Z0_-]+ .*? ) &? |x ) > { ... > > Unfortunately, I can't get it to work. I've read perlretut, > but can't see the answer. What am I doing wrong? > > Thanks, > Frank > > Here are a couple of test strings: > > 'itemid=AT18C&i_AT18C=1&t=main.htm&storeid=1&cols=1&c=detail.htm&ordering=asc' > > 'c=detail.htm&itemid=AT18C' > > > > Use the negative match operator !~ if( $QUERY_STRING !~ m{ itemid = [-0-9A-Za-z_]+? (?: \& | \z ) }msx ){ print "bad: $QUERY_STRING\n"; } -- Don't stop where the ink does. Shawn
Re: regex problem?
On Wed, 25 Nov 2015 17:22:04 + Andrew Solomon wrote: > The only problem I can see is that you want UPPERCASE-1234 and your > regex has lowercase. Try > > (\A[A-Z]+) # match and capture leading alphabetics Please put the anchor outside the capture. And you could use the POSIX conventions: m{ \A ([[:upper:]]+) }msx; This will work with non-English characters. :) -- Don't stop where the ink does. Shawn
Re: regex problem?
The only problem I can see is that you want UPPERCASE-1234 and your regex has lowercase. Try (\A[A-Z]+) # match and capture leading alphabetics Andrew p.s Why not add "use strict; use warnings", "my $var;" and wear a seat belt when you're driving?:) On Wed, Nov 25, 2015 at 5:09 PM, Rick T wrote: > The following code apparently is not doing what I wanted. My intention was > to confirm that the general format of $student_id was this: several > uppercase letters followed by a hyphen followed by several digits. If not, > it would trigger the die. Unfortunately it seems to always trigger the die. > For example, if I let student_id = triplett-1, the script dies. I’m a > beginner, so I often have trouble seeing the “obvious.” Any suggestions > will be appreciated! > > if ( $student_id =~ > / > (\A[a-z]+) # match and > capture leading alphabetics > - # hyphen > to separate surname from number > ([0-9]+\z) # match and > capture trailing digits > /xms# Perl Best > Practices > ) { > $student_surname = $1; > $student_number = $2; > } > else { > die "Bad general form for student_id: $student_id" > }; -- Andrew Solomon Mentor@Geekuni http://geekuni.com/ http://www.linkedin.com/in/asolomon
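A runnable sketch of Rick's check. It keeps the [a-z] class from his post, since his sample id triplett-1 is lowercase (switch to [A-Z], or [[:upper:]] as Shawn suggests, if ids are uppercase), and puts the anchors outside the captures. Each /x comment stays on one line: under /x a comment only runs to the end of its line, so any wrapped comment text would become part of the pattern.

```perl
use strict;
use warnings;

my $student_id = 'triplett-1';
my ($student_surname, $student_number);

if ( $student_id =~ m{
        \A
        ([a-z]+)    # capture leading alphabetics
        -           # hyphen separating surname from number
        ([0-9]+)    # capture trailing digits
        \z
     }xms ) {
    $student_surname = $1;
    $student_number  = $2;
}
else {
    die "Bad general form for student_id: $student_id";
}

print "$student_surname / $student_number\n";   # triplett / 1
```

If the real ids mix cases, [A-Za-z] covers both without changing anything else.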
Re: regex capture question
Hi Tiago, Please reply to list if it's a mailing list post - http://shlom.in/reply . On Thu, 18 Jun 2015 10:20:57 -0300 Tiago Hori wrote: > Folks, > > I have the following regex: $_ =~ /(Crosses)(.*)(misses=)(\d+)/s > > It does what I need to do in terms of matching, but I also want to use the > capture parenthesis. The data comes from tab-limited files and I use $4 to > grab the last digits of the match, however it is also matching the trailing > tab. I solved it by stripping of the tabs from the line, but I can figure > out why (\d+) is also matching the tab! > That sounds strange and perl should not do that. Can you post a self-contained and reproducing example that exhibits this behaviour? Sometimes reducing your code to the bare reproducing minimum helps in finding where the problem is. I could also use some information about your system (OS, distribution, perl versions, CPU architecture, etc.) Regards, Shlomi Fish > T. > -- - Shlomi Fish http://www.shlomifish.org/ http://www.shlomifish.org/humour/ways_to_do_it.html Chuck Norris can construct any logical expression using only AND gates. — http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/ Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: regex capture question
Hi, Tiago! I can't reproduce such behaviour use Modern::Perl '2014'; my $string = 'Crosses misses=50 '; my (@matches) = ($string =~ /(Crosses)(.*)(misses=)(\d+)/s); use Data::Dumper; print Dumper \@matches; result: $VAR1 = [ 'Crosses', ' ', 'misses=', '50' ]; As you see, there are no tabs in $matches[3]. Please post the string that you are trying to parse with this regexp. On Thu, 18 June 2015 at 16:24, Tiago Hori wrote: > Folks, > > I have the following regex: $_ =~ /(Crosses)(.*)(misses=)(\d+)/s > > It does what I need to do in terms of matching, but I also want to use the > capture parenthesis. The data comes from tab-limited files and I use $4 to > grab the last digits of the match, however it is also matching the trailing > tab. I solved it by stripping of the tabs from the line, but I can figure > out why (\d+) is also matching the tab! > > T. > > -- > "Education is not to be used to promote obscurantism." - Theodonius > Dobzhansky. > > "Thanks to life, which has given me so much > It has given me sound and the alphabet > With it, the words that I think and declare > Mother, friend, brother > And light illuminating the path of the soul of the one I love > > Thanks to life, which has given me so much > It has given me the march of my tired feet > With them I walked through cities and puddles > Beaches and deserts, mountains and plains > And your house, your street and your courtyard" > > Violeta Parra - Gracias a la Vida > > Tiago S. F. Hori. PhD. > Ocean Science Center-Memorial University of Newfoundland >
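A self-contained check in the spirit of Shlomi's request, using a tab-separated line. The exact layout of Tiago's files is not shown in the thread, so this line is an assumption:

```perl
use strict;
use warnings;

# Hypothetical tab-separated line; Tiago's real data is not shown.
my $line = "Crosses\tfoo\tmisses=50\tbar\n";

my @matches = $line =~ /(Crosses)(.*)(misses=)(\d+)/s;

# \d matches digits only, so the fourth capture stops before the tab.
printf "fourth capture: [%s]\n", $matches[3];   # fourth capture: [50]
```

If a tab still shows up in real data, it is more likely coming from how the captured value is printed or joined afterwards than from (\d+) itself.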
Re: regex and parse
OK! Thanks to all! :-) 2014-03-11 15:34 GMT-03:00 Andy Bach : > > On Tue, Mar 11, 2014 at 12:13 PM, Paolo Gianrossi < > paolino.gianro...@gmail.com> wrote: > >> A classic is Mastering Regular Expressions ny Jeffrey E.F. Friedl ( >> http://shop.oreilly.com/product/9781565922570.do) >> > > Just to +1 this - one of the best RE and programming books ever - it's not > just Perl REs but he covers and compares many other languages. It's also > funny, as in LOL! > > > -- > > a > > Andy Bach, > afb...@gmail.com > 608 658-1890 cell > 608 261-5738 wk > -- Ariel
Re: regex and parse
On Tue, Mar 11, 2014 at 12:13 PM, Paolo Gianrossi < paolino.gianro...@gmail.com> wrote: > A classic is Mastering Regular Expressions by Jeffrey E.F. Friedl ( > http://shop.oreilly.com/product/9781565922570.do) > Just to +1 this - one of the best RE and programming books ever - it's not just Perl REs but he covers and compares many other languages. It's also funny, as in LOL! -- a Andy Bach, afb...@gmail.com 608 658-1890 cell 608 261-5738 wk
Re: regex and parse
On Tue, Mar 11, 2014 at 10:01 AM, Ariel Hosid wrote: > Hello everyone! > Can anyone recommend me literature that treats regular expressions and how > to analyze files? Some Perl resources: perldoc perlrequick (Perl regular expressions quick start), perlretut (Perl regular expressions tutorial), perlre (Perl regular expressions, the rest of the story). -- Charles DeRykus
Re: regex and parse
A classic is Mastering Regular Expressions by Jeffrey E.F. Friedl ( http://shop.oreilly.com/product/9781565922570.do) A quick Google search also turns up, e.g. http://stackoverflow.com/questions/4736/learning-regular-expressions with many links to resources. HTH paolo -- Paolo Gianrossi Like my grandma used to say, don't sail an aluminium boat on a gallium lake. (My grandma was a little strange.) -- xkcd On 11 March 2014 18:01, Ariel Hosid wrote: > Hello everyone! > Can anyone recommend me literature that treats regular expressions and how > to analyze files? > Thank you! > > -- > Ariel >
Re: regex and parse
On 11/03/2014 17:01, Ariel Hosid wrote: Can anyone recommend me literature that treats regular expressions and how to analyze files? The best documentation on regular expressions is Perl's own here http://perldoc.perl.org/perlre.html Analysing files is an enormous subject that is difficult to generalise. Perhaps you should read what you can find on the internet and come back here if you have a specific problem? Rob
Re: regex to get version from file name
Great thank you! On Fri, Feb 21, 2014 at 6:02 PM, Jim Gibson wrote: > > On Feb 21, 2014, at 6:21 AM, Wernher Eksteen wrote: > > > Hi all, > > > > From the below file names I only need the version number 1.2.4 without > explicitly specifying it. > > > > check_mk-1.2.4.tar.gz > > check_mk-agent-1.2.4-1.noarch.rpm > > check_mk-agent-logwatch-1.2.4-1.noarch.rpm > > check_mk-agent-oracle-1.2.4-1.noarch.rpm > > mk-livestatus-1.2.4.tar.gz > > mkeventd-1.2.4.tar.gz > > > > What regex can I use to obtain only the string value 1.2.4 from the file > names (or whatever future versions based on the 3 numbers separated by 3 > dots, [0-9].[0-9].[0-9]? > > Here's one that will do any number of digits, provided they are preceded > by a hyphen and followed by a hyphen or period (like all of your samples): > > /-([\d.]+)[.-]/ >
Re: regex to get version from file name
Thanks, this also worked for me... foreach my $i (@fileList) { push @versions, $i =~ m/\b(\d+\.\d+\.\d+)\b/g; } my %seen; my @unique = grep { ! $seen{$_}++ } @versions; On Sun, Feb 23, 2014 at 4:27 PM, Jim Gibson wrote: > > On Feb 23, 2014, at 5:10 AM, Wernher Eksteen wrote: > > > Hi, > > > > Thanks, but how do I assign the value found by the regex to a variable > so that the "1.2.4" from 6 file names in the array @fileList are print only > once, and if there are other versions found say 1.2.5 and 1.2.6 to print > the unique values from all. > > > > > > From that I want to get the value 1.2.4 and assign it to a variable, if > there are more than one value such as 1.2.5 and 1.2.6 as well, it should > print them too, but only the unique values. > > > > My attempt shown below to print only the value 1.2.4 is as follow, but > it prints out "1.2.41.2.41.2.41.2.41.2.41.2.4" next to each other, if I > pass a newline to $i such as "$i\n" it then prints "11" ? > > > > foreach my $i (@fileList) { > > print $i =~ /\b(\d+\.\d+\.\d+)\b/; > > } > > The parentheses in the above regular expression cause the matched > substrings to be assigned to $1. If you wish to print those values, print > $1 or assign the value of $1 to another variable and print it: > > if( $i =~ /\b(\d+\.\d+\.\d+)\b/ ) { > print "$1\n"; > } > > If you wish to find all of the unique values of what is captured, use the > values as keys in a hash and print the keys after all the lines have been > processed (untested): > > my %unique; > foreach my $i (@fileList) { > if( $i =~ /\b(\d+\.\d+\.\d+)\b/ ) { > $unique{$1}++; > } > } > for my $number ( sort keys %unique ) { > print "Version $number had $unique{$number} files\n"; > }
Re: regex to get version from file name
On Feb 23, 2014, at 5:10 AM, Wernher Eksteen wrote: > Hi, > > Thanks, but how do I assign the value found by the regex to a variable so > that the "1.2.4" from 6 file names in the array @fileList are print only > once, and if there are other versions found say 1.2.5 and 1.2.6 to print the > unique values from all. > > > From that I want to get the value 1.2.4 and assign it to a variable, if there > are more than one value such as 1.2.5 and 1.2.6 as well, it should print them > too, but only the unique values. > > My attempt shown below to print only the value 1.2.4 is as follow, but it > prints out "1.2.41.2.41.2.41.2.41.2.41.2.4" next to each other, if I pass a > newline to $i such as "$i\n" it then prints "11" ? > > foreach my $i (@fileList) { > print $i =~ /\b(\d+\.\d+\.\d+)\b/; > } The parentheses in the above regular expression cause the matched substrings to be assigned to $1. If you wish to print those values, print $1 or assign the value of $1 to another variable and print it: if( $i =~ /\b(\d+\.\d+\.\d+)\b/ ) { print "$1\n"; } If you wish to find all of the unique values of what is captured, use the values as keys in a hash and print the keys after all the lines have been processed (untested): my %unique; foreach my $i (@fileList) { if( $i =~ /\b(\d+\.\d+\.\d+)\b/ ) { $unique{$1}++; } } for my $number ( sort keys %unique ) { print "Version $number had $unique{$number} files\n"; }
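For reference, here is a runnable version of the hash approach above, with the loop closed before the report so each version is printed once, after all names have been seen. It is a sketch that hard-codes the six file names from the thread in `@fileList` instead of fetching them.

```perl
use strict;
use warnings;

# The six file names from the original post, hard-coded for illustration.
my @fileList = qw(
    check_mk-1.2.4.tar.gz
    check_mk-agent-1.2.4-1.noarch.rpm
    check_mk-agent-logwatch-1.2.4-1.noarch.rpm
    check_mk-agent-oracle-1.2.4-1.noarch.rpm
    mk-livestatus-1.2.4.tar.gz
    mkeventd-1.2.4.tar.gz
);

my %unique;    # version string => number of files carrying it
foreach my $name (@fileList) {
    if ( $name =~ /\b(\d+\.\d+\.\d+)\b/ ) {
        $unique{$1}++;
    }
}

# Report only after the loop has finished, so each version appears once.
for my $version ( sort keys %unique ) {
    print "Version $version had $unique{$version} files\n";
}
```

Note that `sort keys` compares strings, so a future `1.2.10` would sort before `1.2.4`; a version-aware comparison would be needed if the order matters.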
Re: regex to get version from file name
On Feb 21, 2014, at 6:21 AM, Wernher Eksteen wrote: > Hi all, > > From the below file names I only need the version number 1.2.4 without > explicitly specifying it. > > check_mk-1.2.4.tar.gz > check_mk-agent-1.2.4-1.noarch.rpm > check_mk-agent-logwatch-1.2.4-1.noarch.rpm > check_mk-agent-oracle-1.2.4-1.noarch.rpm > mk-livestatus-1.2.4.tar.gz > mkeventd-1.2.4.tar.gz > > What regex can I use to obtain only the string value 1.2.4 from the file > names (or whatever future versions based on the 3 numbers separated by 3 > dots, [0-9].[0-9].[0-9]? Here’s one that will do any number of digits, provided they are preceded by a hyphen and followed by a hyphen or period (like all of your samples): /-([\d.]+)[.-]/
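A short check of the hyphen-anchored pattern against the six sample names; the loop and the `@found` array are illustrative only.

```perl
use strict;
use warnings;

my @names = qw(
    check_mk-1.2.4.tar.gz
    check_mk-agent-1.2.4-1.noarch.rpm
    check_mk-agent-logwatch-1.2.4-1.noarch.rpm
    check_mk-agent-oracle-1.2.4-1.noarch.rpm
    mk-livestatus-1.2.4.tar.gz
    mkeventd-1.2.4.tar.gz
);

my @found;
for my $name (@names) {
    # A hyphen, then a run of digits and dots, then a dot or hyphen.
    # [\d.]+ is greedy: for check_mk-1.2.4.tar.gz it first takes "1.2.4."
    # and then gives back the last dot so that [.-] can match it.
    push @found, $1 if $name =~ /-([\d.]+)[.-]/;
}
print "$_\n" for @found;
```

Unlike `\b(\d+\.\d+\.\d+)\b`, this pattern accepts any run of digits and dots, so a name such as `foo-2.tar.gz` would yield `2`.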
Re: regex to get version from file name
Thanks, I've changed it to use LWP. I'm not sure how to download the actual file with LWP, so I've tried File::Fetch which works, but it doesn't show download progress/status etc, just hanging blank until the download completes. Any pointers on getting download status/progress details? foreach my $i (@fileList2) { my $file = "$url/$i" if $i =~ m/$getMenuItem/g; chomp($file); my $ff = File::Fetch->new(uri => "$file"); my $where = $ff->fetch() or die $ff->error; } Thanks, Wernher On Sun, Feb 23, 2014 at 4:35 PM, shawn wilson wrote: > Use LWP to get web data - not lynx and the like unless you can't help it. > I prefer using Web::Scraper to parse html but either way it's probably best > not to use a regex (see SO and similar for discussions on the like). > > On Feb 23, 2014 8:13 AM, "Wernher Eksteen" wrote: > > > > Hi, > > > > Thanks, but how do I assign the value found by the regex to a variable > so that the "1.2.4" from 6 file names in the array @fileList are print only > once, and if there are other versions found say 1.2.5 and 1.2.6 to print > the unique values from all. > > > > This is my script thus far. The aim of this script is to connect to the > site, remove all html tags and obtain only the file names I need. 
> > > > #!/usr/bin/perl > > > > use strict; > > use warnings; > > > > # initiating package names to be used later > > my @getList; > > my @fileList; > > > > # get files using lynx and parse through it > > my $url = "http://mathias-kettner.com/download";; > > open my $in, "lynx -dump $url |" or die $!; > > > > # get the bits we need and push it to an array to further filter what we > need > > while(<$in>){ > > chomp; > > if( /\[(\d+)\](.+)/ ){ > >next if $1 == 1; > > push @getList, "$2\n"; > > } > > } > > > > # filter only the files we need into final array > > foreach my $i (@getList) { > > my @list = split /\s+/, $i; > > push @fileList, "$list[0]\n", if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/; > > } > > > > # print the list > > print "\nList of files to be retrieved from $url:\n\n @fileList\n"; > > > > The output is then: > > > > List of files to be retrieved from http://mathias-kettner.com/download: > > > > > > check_mk-1.2.4.tar.gz > > check_mk-agent-1.2.4-1.noarch.rpm > > check_mk-agent-logwatch-1.2.4-1.noarch.rpm > > check_mk-agent-oracle-1.2.4-1.noarch.rpm > > mk-livestatus-1.2.4.tar.gz > > mkeventd-1.2.4.tar.gz > > > > From that I want to get the value 1.2.4 and assign it to a variable, if > there are more than one value such as 1.2.5 and 1.2.6 as well, it should > print them too, but only the unique values. > > > > My attempt shown below to print only the value 1.2.4 is as follow, but > it prints out "1.2.41.2.41.2.41.2.41.2.41.2.4" next to each other, if I > pass a newline to $i such as "$i\n" it then prints "11" ? > > > > foreach my $i (@fileList) { > > print $i =~ /\b(\d+\.\d+\.\d+)\b/; > > } > > > > The 1s are all of the returns of true (or one match). You want to print > "$i\n" if (foo) >
Re: regex to get version from file name
Use LWP to get web data - not lynx and the like unless you can't help it. I prefer using Web::Scraper to parse html but either way it's probably best not to use a regex (see SO and similar for discussions on the like). On Feb 23, 2014 8:13 AM, "Wernher Eksteen" wrote: > > Hi, > > Thanks, but how do I assign the value found by the regex to a variable so that the "1.2.4" from 6 file names in the array @fileList are print only once, and if there are other versions found say 1.2.5 and 1.2.6 to print the unique values from all. > > This is my script thus far. The aim of this script is to connect to the site, remove all html tags and obtain only the file names I need. > > #!/usr/bin/perl > > use strict; > use warnings; > > # initiating package names to be used later > my @getList; > my @fileList; > > # get files using lynx and parse through it > my $url = "http://mathias-kettner.com/download";; > open my $in, "lynx -dump $url |" or die $!; > > # get the bits we need and push it to an array to further filter what we need > while(<$in>){ > chomp; > if( /\[(\d+)\](.+)/ ){ >next if $1 == 1; > push @getList, "$2\n"; > } > } > > # filter only the files we need into final array > foreach my $i (@getList) { > my @list = split /\s+/, $i; > push @fileList, "$list[0]\n", if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/; > } > > # print the list > print "\nList of files to be retrieved from $url:\n\n @fileList\n"; > > The output is then: > > List of files to be retrieved from http://mathias-kettner.com/download: > > > check_mk-1.2.4.tar.gz > check_mk-agent-1.2.4-1.noarch.rpm > check_mk-agent-logwatch-1.2.4-1.noarch.rpm > check_mk-agent-oracle-1.2.4-1.noarch.rpm > mk-livestatus-1.2.4.tar.gz > mkeventd-1.2.4.tar.gz > > From that I want to get the value 1.2.4 and assign it to a variable, if there are more than one value such as 1.2.5 and 1.2.6 as well, it should print them too, but only the unique values. 
> > My attempt shown below to print only the value 1.2.4 is as follow, but it prints out "1.2.41.2.41.2.41.2.41.2.41.2.4" next to each other, if I pass a newline to $i such as "$i\n" it then prints "11" ? > > foreach my $i (@fileList) { > print $i =~ /\b(\d+\.\d+\.\d+)\b/; > } > The 1s are all of the returns of true (or one match). You want to print "$i\n" if (foo)
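The 1s that shawn mentions come from context: a match with capturing groups returns the captures in list context (for example as an argument to `print`), but only true or false in scalar context, and string concatenation imposes scalar context. A minimal sketch using one of the thread's file names:

```perl
use strict;
use warnings;

my $name = 'check_mk-1.2.4.tar.gz';
my $re   = qr/\b(\d+\.\d+\.\d+)\b/;

# List context: the match returns the captured group(s).
my ($version) = $name =~ $re;

# Scalar context: the match returns 1 on success.
my $matched = $name =~ $re;

# Concatenation also imposes scalar context, so this is "1\n",
# not "1.2.4\n".
my $joined = ( $name =~ $re ) . "\n";

print "list: $version\n";
print "scalar: $matched\n";
print "concatenated: $joined";
```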
Re: regex to get version from file name
Hi, Thanks, but how do I assign the value found by the regex to a variable so that the "1.2.4" from 6 file names in the array @fileList are printed only once, and if there are other versions found say 1.2.5 and 1.2.6 to print the unique values from all. This is my script thus far. The aim of this script is to connect to the site, remove all html tags and obtain only the file names I need. #!/usr/bin/perl use strict; use warnings; # initiating package names to be used later my @getList; my @fileList; # get files using lynx and parse through it my $url = "http://mathias-kettner.com/download";; open my $in, "lynx -dump $url |" or die $!; # get the bits we need and push it to an array to further filter what we need while(<$in>){ chomp; if( /\[(\d+)\](.+)/ ){ next if $1 == 1; push @getList, "$2\n"; } } # filter only the files we need into final array foreach my $i (@getList) { my @list = split /\s+/, $i; push @fileList, "$list[0]\n", if $i =~ /rpm|tar/ && $i !~ /[0-9][a-z]/; } # print the list print "\nList of files to be retrieved from $url:\n\n @fileList\n"; The output is then: List of files to be retrieved from http://mathias-kettner.com/download: check_mk-1.2.4.tar.gz check_mk-agent-1.2.4-1.noarch.rpm check_mk-agent-logwatch-1.2.4-1.noarch.rpm check_mk-agent-oracle-1.2.4-1.noarch.rpm mk-livestatus-1.2.4.tar.gz mkeventd-1.2.4.tar.gz From that I want to get the value 1.2.4 and assign it to a variable, if there are more than one value such as 1.2.5 and 1.2.6 as well, it should print them too, but only the unique values. My attempt shown below to print only the value 1.2.4 is as follows, but it prints out "1.2.41.2.41.2.41.2.41.2.41.2.4" next to each other, if I pass a newline to $i such as "$i\n" it then prints "11" ?
foreach my $i (@fileList) { print $i =~ /\b(\d+\.\d+\.\d+)\b/; } Thank you, Wernher On Fri, Feb 21, 2014 at 4:27 PM, Shawn H Corey wrote: > On Fri, 21 Feb 2014 16:21:57 +0200 > Wernher Eksteen wrote: > > > Hi all, > > > > From the below file names I only need the version number 1.2.4 without > > explicitly specifying it. > > > > check_mk-1.2.4.tar.gz > > check_mk-agent-1.2.4-1.noarch.rpm > > check_mk-agent-logwatch-1.2.4-1.noarch.rpm > > check_mk-agent-oracle-1.2.4-1.noarch.rpm > > mk-livestatus-1.2.4.tar.gz > > mkeventd-1.2.4.tar.gz > > > > What regex can I use to obtain only the string value 1.2.4 from the > > file names (or whatever future versions based on the 3 numbers > > separated by 3 dots, [0-9].[0-9].[0-9]? > > > > Thanks! > > Wernher > > /\b(\d+\.\d+\.\d+)\b/ > > > -- > Don't stop where the ink does. > Shawn >
Re: regex to get version from file name
On Fri, 21 Feb 2014 16:21:57 +0200 Wernher Eksteen wrote: > Hi all, > > From the below file names I only need the version number 1.2.4 without > explicitly specifying it. > > check_mk-1.2.4.tar.gz > check_mk-agent-1.2.4-1.noarch.rpm > check_mk-agent-logwatch-1.2.4-1.noarch.rpm > check_mk-agent-oracle-1.2.4-1.noarch.rpm > mk-livestatus-1.2.4.tar.gz > mkeventd-1.2.4.tar.gz > > What regex can I use to obtain only the string value 1.2.4 from the > file names (or whatever future versions based on the 3 numbers > separated by 3 dots, [0-9].[0-9].[0-9]? > > Thanks! > Wernher /\b(\d+\.\d+\.\d+)\b/ -- Don't stop where the ink does. Shawn
Re: regex headache
On 2014-02-03 21:30, Paul Fontenot wrote: Hi, I am attempting to write a regex but it is giving me a headache. I have two log entries 1. Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [org.apache.commons.logging.impl.Log4JLogger] 2. Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR] I am using the following "^\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+\w+\s+\[\d{1,2}:\s+\d{1,2}:\d{1,2},\d{3}\]\s+\w+\s+\[[a-zA-Z0-9.]\]" My problem is this greedy little '.' - I need to just be a period. How do I match #1 and not match #2? I think you should replace "\[[a-zA-Z0-9.]\]" by "\[[^]]+\]". Don't worry about matching; see this as parsing, and skip a line based on how it matches, not on how it doesn't match. Hint: start using "named captures". If you are into massively scanning log files, try MCE::Grep. -- Ruud
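A sketch of Ruud's suggestion, using named captures (Perl 5.10 or later) and `[^]]+` for the bracketed fields. The capture names are made up for illustration. Note also that the character class `[a-zA-Z0-9.]` in the original pattern has no quantifier, so as written it can only match a one-character name between the brackets; the period inside a character class is already literal.

```perl
use strict;
use warnings;

# The two sample log lines from the original post.
my @lines = (
    'Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [org.apache.commons.logging.impl.Log4JLogger]',
    'Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR]',
);

my @parsed;
for my $line (@lines) {
    next unless $line =~ m{
        ^ (?<month>\w+) \s+ (?<day>\d{1,2}) \s+
        (?<time>\d{1,2}:\d{1,2}:\d{1,2}) \s+
        (?<host>\S+) \s+
        \[ (?<stamp>[^]]+) \] \s+     # [^]]+ : anything up to the next ]
        (?<level>\w+) \s+
        \[ (?<source>[^]]+) \]
    }x;

    # Decide on what the line is, not on what it fails to match.
    next if $+{source} eq 'STDERR';

    # Copy %+ so the record survives the next match.
    push @parsed, +{%+};
}
print "$_->{host} $_->{level} $_->{source}\n" for @parsed;
```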
Re: regex headache
On Feb 3, 2014, at 12:30 PM, Paul Fontenot wrote: > Hi, I am attempting to write a regex but it is giving me a headache. > > I have two log entries > > 1. Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR > [org.apache.commons.logging.impl.Log4JLogger] > 2. Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR] > > I am using the following > "^\w+\s+\d{1,2}\s+\d{1,2}:\d{1,2}:\d{1,2}\s+\w+\s+\[\d{1,2}:\s+\d{1,2}:\d{1,2},\d{3}\]\s+\w+\s+\[[a-zA-Z0-9.]\]" > > My problem is this greedy little '.' - I need to just be a period. How do I > match #1 and not match #2? You appear to be making the job too difficult. The only difference between lines 1. and 2. is the last column. To differentiate those two, you can do this (assuming the string is in $_): if( /\[STDERR\]/ ) { # process line 2 }else{ # process line 1 } Do you really need to match each field in the entire line? If so, I would try splitting the lines on whitespace and extracting the columns you need that way. Whether or not that works depends upon: 1) how much variation there can be in your log entries, and 2) what exactly you need to extract from each entry. Fixing that regex may not be the most productive approach in the long term. As for your specific question, a period in a character class (e.g., [.]) will match a period. A period in the regex pattern will match any character (except possibly a newline). To match a period character, escape the period: /\./
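A minimal sketch of the split-on-whitespace approach Jim describes, using the two sample lines from the thread. Because the bracketed timestamp `[12: 54:27,532]` contains a space, it splits into two fields, so the sketch counts from the end of the field list instead of from the start.

```perl
use strict;
use warnings;

my @lines = (
    'Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [org.apache.commons.logging.impl.Log4JLogger]',
    'Feb 3 12:54:28 cdrtva01a1005 [12: 54:27,532] ERROR [STDERR]',
);

my @kept;
for my $line (@lines) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my @fields = split ' ', $line;

    # The last field is the bracketed source; a plain eq needs no escaping.
    next if $fields[-1] eq '[STDERR]';
    push @kept, $fields[-1];
}
print "$_\n" for @kept;
```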
Re: Regex not working correctly
That answers my question. Thanks Robert
Re: Regex not working correctly
On Wed, Dec 11, 2013 at 10:35 AM, punit jain wrote: > > Thanks Shlomi, thats a good idea. However at the same time I was trying to > understand if something is wrong in my regex. Why would $2 capture the > number as I have used :- > > (?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+))) > > This would in my understanding match either number with regex > 91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11} > or with call followed by digits. > > In my case 4 ( price for free consultation call92504060) why would $1 > store an empty string and $2 actually stores the number part ? > There are two sets of capturing parenthesis: * (91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11}) = $1 * (\d+) = $2 The first set stores its match in $1 and the second set in $2. The pipe (or) does not reset the capture counter back to 1. The counter strictly goes from left to right. -- Robert Wohlfarth
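Perl 5.10 added a construct aimed at exactly this numbering issue: a branch reset group `(?|...)`, in which every alternative starts numbering its captures from the same slot. A minimal sketch follows. It spells the keyword `call`, as the sample strings do, because `cal` immediately followed by `(\d+)` cannot match `call92504060`; it also leaves out sample 2, whose nine-digit `092108493` matches none of the listed alternatives.

```perl
use strict;
use warnings;

my @inputs = (
    'COMP TEL NO 919369721113 for computer science',
    'Your booking Confirmed, 9210833321',
    'price for free consultation call92504060',
    'price for free consultation call92504060number',
);

my @numbers;
for my $line (@inputs) {
    # Inside (?| ... ) both alternatives number their group as 1,
    # so the phone number is always in $1, whichever branch matched.
    if ( $line =~ / (?| ( 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )
                      | (?:ph|call) (\d+) ) /x ) {
        push @numbers, $1;
    }
}
print "$_\n" for @numbers;
```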
RE: Regex not working correctly
Hi, You can try the below pattern. if($line=~/([0-9]{3,})/gs) { print $1; } Thanks, Vijaya -- From: punit jain Sent: 12/11/2013 9:07 PM To: beginners@perl.org Subject: Regex not working correctly Hi, I have a requirement where I need to capture phone number from different strings. The strings could be :- 1. COMP TEL NO 919369721113 for computer science 2. For Best Discount reach 092108493, from 6-9 3. Your booking Confirmed, 9210833321 4. price for free consultation call92504060 5. price for free consultation call92504060number I created a regex as below :- #!/usr/bin/perl my $line= shift @ARGV; if($line =~ /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/) { print "one = $1"; } It works fine for 1, 2,3 and prints number however for 4 and 5 one I get number in $2 rather than $1 tough I have pipe operator to check it. Any clue how to fix this ?
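A quick check of this simpler pattern against all five sample strings. It is a sketch; the `/g` and `/s` flags are dropped because a single match in scalar context needs neither. Keep in mind that it takes the first run of three or more digits, so it would also pick up, say, a price.

```perl
use strict;
use warnings;

my @inputs = (
    'COMP TEL NO 919369721113 for computer science',
    'For Best Discount reach 092108493, from 6-9',
    'Your booking Confirmed, 9210833321',
    'price for free consultation call92504060',
    'price for free consultation call92504060number',
);

my @numbers;
for my $line (@inputs) {
    # First run of three or more digits anywhere in the line.
    push @numbers, $1 if $line =~ /([0-9]{3,})/;
}
print "$_\n" for @numbers;
```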
Re: Regex not working correctly
Thanks Shlomi, that's a good idea. However, at the same time I was trying to understand if something is wrong in my regex. Why would $2 capture the number, given that I have used :- (?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+))) This would in my understanding match either a number with regex 91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11} or "call" followed by digits. In my case 4 (price for free consultation call92504060) why does $1 store an empty string while $2 actually stores the number part? Regards, Punit
Re: Regex not working correctly
On Dec 11, 2013, at 7:34 AM, punit jain wrote: > Hi, > > I have a requirement where I need to capture phone number from different > strings. > > The strings could be :- > > > 1. COMP TEL NO 919369721113 for computer science > > 2. For Best Discount reach 092108493, from 6-9 > > 3. Your booking Confirmed, 9210833321 > > 4. price for free consultation call92504060 > > 5. price for free consultation call92504060number > > I created a regex as below :- > > #!/usr/bin/perl > > my $line= shift @ARGV; > > if($line =~ > /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/) > { > print "one = $1"; > > > > } > > It works fine for 1, 2,3 and prints number however for 4 and 5 one I get > number in $2 rather than $1 tough I have pipe operator to check it. > > Any clue how to fix this ? Your first step is to rewrite the regular expression using the extended syntax x modifier and add some whitespace: if($line =~ m{ (?: (?: \D+ | \s+ ) (?: ( 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} ) | (?: (?: ph | cal ) (\d+) ) ) ) | (?: (?: ( 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11}) | (?: (?: ph | cal ) (\d+) ) ) (?: \D+ | \s+ ) ) }x ) { Then maybe you will have some hope of figuring out why it doesn’t work (I certainly can’t). I suggest you break it up into a series of if-then-else statements: if( $line =~ /( 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )/x ) { $number = $1; }elsif( $line =~ /(?:ph|cal)(\d+)/ ) { $number = $1; }elsif( … ) { }else{ print "No match for $line"; } You don’t need to do it all in one regex. Debugging each of those smaller regexes will be easier than debugging the whole thing.
Re: Regex not working correctly
Hi punit, On Wed, 11 Dec 2013 21:04:39 +0530 punit jain wrote: > Hi, > > I have a requirement where I need to capture phone number from different > strings. > > The strings could be :- > > > 1. COMP TEL NO 919369721113 for computer science > > 2. For Best Discount reach 092108493, from 6-9 > > 3. Your booking Confirmed, 9210833321 > > 4. price for free consultation call92504060 > > 5. price for free consultation call92504060number > > I created a regex as below :- > > #!/usr/bin/perl > > my $line= shift @ARGV; > > if($line =~ > /(?:(?:\D+|\s+)(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+|(?:(?:(91\d{10}|0\d{10}|[7-9]\d{9}|0\d{11})|(?:(?:ph|cal)(\d+)))(?:\D+|\s+))/) > { > > print "one = $1"; > > > } > It works fine for 1, 2,3 and prints number however for 4 and 5 one I get > number in $2 rather than $1 tough I have pipe operator to check it. > > Any clue how to fix this ? I suggest you use named captures (a feature of perl-5.10.x-and-above) and then you can do something like: my $my_capture = ($+{'capture1'} // $+{'capture2'}); I think this is the best way to do it. (You can also do $1 // $2, but please don't). Regards, Shlomi Fish -- - Shlomi Fish http://www.shlomifish.org/ The Case for File Swapping - http://shlom.in/file-swap Why can’t we ever attempt to solve a problem in this country without having a “War” on it? -- Rich Thomson, talk.politics.misc Please reply to list if it's a mailing list post - http://shlom.in/reply .
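A sketch of Shlomi's suggestion, with two named captures and defined-or (`//`, also Perl 5.10+). The names `direct` and `called` are hypothetical, and the keyword is spelled `call` as in the sample strings.

```perl
use strict;
use warnings;

my @inputs = (
    'COMP TEL NO 919369721113 for computer science',
    'Your booking Confirmed, 9210833321',
    'price for free consultation call92504060',
);

my @numbers;
for my $line (@inputs) {
    if ( $line =~ / (?<direct> 91\d{10} | 0\d{10} | [7-9]\d{9} | 0\d{11} )
                  | (?:ph|call) (?<called>\d+) /x ) {
        # Only the branch that matched sets its name, so defined-or
        # picks whichever capture is present.
        push @numbers, $+{direct} // $+{called};
    }
}
print "$_\n" for @numbers;
```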
Re: Regex help needed
On 2013-01-08 13:28, punit jain wrote: { test = ("test123"); test = ("test123","abc"); test = ("test123","abc","xyz"); } { test1 = ("passfile"); test1 = ("passfile","pasfile1"); test1 = ("passfile","pasfile1","user"); } and so on The requirement is to have the file parsing so that final output is :- test = ("test123","abc","xyz"); test1 = ("passfile","pasfile1","user"); So basically only pick the lines with maximum number of options for each type. Or just print the last long line: echo '{ test = ("test123"); test = ("test123","abc"); test = ("test123","abc","xyz"); } { test1 = ("passfile"); test1 = ("passfile","pasfile1"); test1 = ("passfile","pasfile1","user"); } ' |perl -wne'$o=$n||0;$p=$_,next if($n=length)>$o;$n=3;print$p' test = ("test123","abc","xyz"); test1 = ("passfile","pasfile1","user"); Which preserves order too. :) -- Ruud
Re: Regex help needed
Punit Jain, This is not the optimized code but you can refactor it. This works for the given scenario, no matter the order of input data. Hope it helps to some extent. [code] my $var = ''; my @args = (); my %hash; while (<DATA>) { chomp; my ($var,$arg) = split /=/,$_,2; if($var eq '{') { @args = (); #Reset if we encounter '{' } my @arg1 = defined $arg ? split(/,/, $arg) : (); if(scalar @arg1 > scalar @args) { $hash{$var} = $arg unless($var eq '{' || $var eq '}'); @args = @arg1; } } foreach my $k (sort keys %hash) { print "$k = $hash{$k}\n"; } __DATA__ { test = ("test123"); test = ("test123","abc","xyz"); test = ("test123","abc"); } { test1 = ("passfile","pasfile1","user"); test1 = ("passfile"); test1 = ("passfile","pasfile1"); } { test2 = ("temp"); test2 = ("temp","temp1"); test2 = ("temp","temp1","username"); } { test3 = ("betty","betty1","jack"); test3 = ("betty","betty1"); test3 = ("betty"); } [/code] [output] test = ("test123","abc","xyz"); test1 = ("passfile","pasfile1","user"); test2 = ("temp","temp1","username"); test3 = ("betty","betty1","jack"); [/output] best, Shaji --- Your talent is God's gift to you. What you do with it is your gift back to God. --- From: punit jain To: "beginners@perl.org" Sent: Tuesday, 8 January 2013 5:58 PM Subject: Regex help needed Hi , I have a file as below : - { test = ("test123"); test = ("test123","abc"); test = ("test123","abc","xyz"); } { test1 = ("passfile"); test1 = ("passfile","pasfile1"); test1 = ("passfile","pasfile1","user"); } and so on The requirement is to have the file parsing so that final output is :- test = ("test123","abc","xyz"); test1 = ("passfile","pasfile1","user"); So basically only pick the lines with maximum number of options for each type. Regards.
Re: Regex help needed
Hi punit jain, Please check my comments below. On Tue, Jan 8, 2013 at 1:28 PM, punit jain wrote: > Hi , > > I have a file as below : - > > { > test = ("test123"); > test = ("test123","abc"); > test = ("test123","abc","xyz"); > } > { > test1 = ("passfile"); > test1 = ("passfile","pasfile1"); > test1 = ("passfile","pasfile1","user"); > } > > and so on > > The requirement is to have the file parsing so that final output is :- > > test = ("test123","abc","xyz"); > test1 = ("passfile","pasfile1","user"); > > So basically only pick the lines with maximum number of options for each > type. > > Regards. > I basically agree with Jim on this: Jim >> to learn programming will be to attempt writing a program to accomplish your task, Jim >> then post your program if you have trouble getting it to do what you want. However, if I may, I suggest using a hash, provided the line with the maximum number of options for each type *is the last one in each case*, since *a hash keeps only one value per key*. So, splitting each line on "=", one can take the key and value for the hash. Based on the data presented, one can write it like so: use warnings; use strict; my %collection_hash; while (<DATA>) { chomp; if (/=/) { my ( $key, $value ) = split /=/, $_, 2; $collection_hash{$key} = $value; } } print $_, ' = ', $collection_hash{$_}, $/ for sort keys %collection_hash; __DATA__ { test = ("test123"); test = ("test123","abc"); test = ("test123","abc","xyz"); } { test1 = ("passfile"); test1 = ("passfile","pasfile1"); test1 = ("passfile","pasfile1","user"); } *OUTPUT:* test = ("test123","abc","xyz"); test1 = ("passfile","pasfile1","user"); Please *NOTE* that this will only work as you want if the last line in each case has the maximum options, which is what the data you showed here presents. -- Tim
Re: Regex help needed
On Jan 8, 2013, at 4:28 AM, punit jain wrote: > Hi , > > I have a file as below : - > > { > test = ("test123"); > test = ("test123","abc"); > test = ("test123","abc","xyz"); > } > { > test1 = ("passfile"); > test1 = ("passfile","pasfile1"); > test1 = ("passfile","pasfile1","user"); > } > > and so on > > The requirement is to have the file parsing so that final output is :- > > test = ("test123","abc","xyz"); > test1 = ("passfile","pasfile1","user"); > > So basically only pick the lines with maximum number of options for each > type. The easiest solution I can think of would be to extract the first token on each line, use that token as a hash key, count the number of commas in each line, and save the line in the hash with the largest number of commas for each key. This will not work if your strings have commas. In that case, you might want to consider using a parsing module, such as Text::CSV, that will correctly handle your input data. You can use Text::CSV to split your input lines into fields and count the number of fields. However, you will first have to extract the quoted strings from the surrounding parentheses. You can use the Text::Balanced module to do that. Both Text::CSV and Text::Balanced are available at CPAN (http://search.cpan.org). The best way for you to learn programming will be to attempt writing a program to accomplish your task, then post your program if you have trouble getting it to do what you want. Good luck.
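A minimal sketch of the comma-counting approach Jim describes, reading the sample data from a here-document rather than a file. As Jim notes, it assumes the quoted strings themselves contain no commas; `%best` and `%commas` are illustrative names.

```perl
use strict;
use warnings;

# Sample input from the original post, held in a string for illustration.
my $input = <<'EOT';
{
test = ("test123");
test = ("test123","abc");
test = ("test123","abc","xyz");
}
{
test1 = ("passfile");
test1 = ("passfile","pasfile1");
test1 = ("passfile","pasfile1","user");
}
EOT

my ( %best, %commas );    # key => longest line so far, key => its comma count
for my $line ( split /\n/, $input ) {
    # The first token before "=" is the key; brace lines are skipped.
    my ($key) = $line =~ /^\s*(\w+)\s*=/ or next;

    # tr/,// with an empty replacement counts commas without changing $line.
    my $count = ( $line =~ tr/,// );
    if ( !exists $commas{$key} || $count > $commas{$key} ) {
        $commas{$key} = $count;
        $best{$key}   = $line;
    }
}
print "$best{$_}\n" for sort keys %best;
```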
Re: Regex issue
Yes, this code is perfect, Punit. This works fine for me too. Regards, Midhun On Thu, Jan 3, 2013 at 4:46 PM, Paul Johnson wrote: > On Thu, Jan 03, 2013 at 03:53:20PM +0530, punit jain wrote: > > Hi, > > > > I am facing issues in parsing using Regex. The problem definition is as > > below : - > > > I want to parse it in such a way that all data with BEGIN and END goes > in > > one file and BEGINDL and ENDDL goes in other with kind of processing I > want > > to so. > > > > I am using below code but doesnot work : - > > What doesn't work? It seems fine to me. > > > #!/usr/bin/perl > > my $file=shift; > > open( FH , "$file" ) or die("open failed: $!\n"); > > open ($fh1, ">/tmp/a"); > > open ($fh2, ">/tmp/b"); > > my $check=0; > > You probably want $check = 2 here. > > > while (<FH>) { > > #next unless /BEGIN/ .. /END/ || /BEGINDL/ .. /ENDDL/ || eof; > > if($_ =~ /BEGIN$/ || ($check == 0) ) { > > print $fh1 $_; > > $check = 0; > > if($_ =~ /END$/) { > > $check = 2; > > } > > }elsif($_ =~ /BEGINDL/ || ($check == 1)) { > > print $fh2 $_; > > $check = 1; > > if($_ =~ /ENDDL/) { > > $check = 2; > > } > > } > > next unless($check == 2); > > } > > > > Any better suggestion ? > > Depends on how you define better, but perhaps > > $ perl -ne 'print if /BEGIN$/ .. /END$/' < file > /tmp/a > $ perl -ne 'print if /BEGINDL$/ .. /ENDDL$/' < file > /tmp/b > > -- > Paul Johnson - p...@pjcj.net > http://www.pjcj.net >
Re: Regex issue
On Thu, Jan 03, 2013 at 03:53:20PM +0530, punit jain wrote: > Hi, > > I am facing issues in parsing using Regex. The problem definition is as > below : - > I want to parse it in such a way that all data with BEGIN and END goes in > one file and BEGINDL and ENDDL goes in other with kind of processing I want > to so. > > I am using below code but doesnot work : - What doesn't work? It seems fine to me. > #!/usr/bin/perl > my $file=shift; > open( FH , "$file" ) or die("open failed: $!\n"); > open ($fh1, ">/tmp/a"); > open ($fh2, ">/tmp/b"); > my $check=0; You probably want $check = 2 here. > while (<FH>) { > #next unless /BEGIN/ .. /END/ || /BEGINDL/ .. /ENDDL/ || eof; > if($_ =~ /BEGIN$/ || ($check == 0) ) { > print $fh1 $_; > $check = 0; > if($_ =~ /END$/) { > $check = 2; > } > }elsif($_ =~ /BEGINDL/ || ($check == 1)) { > print $fh2 $_; > $check = 1; > if($_ =~ /ENDDL/) { > $check = 2; > } > } > next unless($check == 2); > } > > Any better suggestion ? Depends on how you define better, but perhaps $ perl -ne 'print if /BEGIN$/ .. /END$/' < file > /tmp/a $ perl -ne 'print if /BEGINDL$/ .. /ENDDL$/' < file > /tmp/b -- Paul Johnson - p...@pjcj.net http://www.pjcj.net -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
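The two one-liners above each make a separate pass over the file. Here is a sketch of the same range-operator idea done in a single pass that writes to two handles. The sub name `split_blocks` is hypothetical, and in-memory filehandles stand in for /tmp/a and /tmp/b. It also assumes the marker lines end in a plain `\n` with no trailing `\r`.

```perl
use strict;
use warnings;

# Each .. below is a separate flip-flop with its own on/off state, which
# persists even across calls to the sub. The anchors matter: /^BEGIN$/ does
# not match "BEGINDL", and /^END$/ does not match "ENDDL".
sub split_blocks {
    my ($in, $out_plain, $out_dl) = @_;
    while (my $line = <$in>) {
        if ($line =~ /^BEGIN$/ .. $line =~ /^END$/) {
            print {$out_plain} $line;
        }
        elsif ($line =~ /^BEGINDL$/ .. $line =~ /^ENDDL$/) {
            print {$out_dl} $line;
        }
    }
    return;
}

my $data = <<'TXT';
BEGIN
country Japan
END

BEGINDL
country US
ENDDL
TXT

my ($plain, $dl) = ('', '');
open my $in,  '<', \$data  or die $!;
open my $fh1, '>', \$plain or die $!;
open my $fh2, '>', \$dl    or die $!;
split_blocks($in, $fh1, $fh2);
close $fh1;
close $fh2;
print "plain:\n$plain", "dl:\n$dl";
```

Because the `elsif` range is only evaluated when the first range is false, this relies on the two kinds of block never overlapping, which holds for the sample data.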
Re: Regex issue
Hi Punit, some comments on your code: On Thu, 3 Jan 2013 15:53:20 +0530 punit jain wrote: > Hi, > > I am facing issues in parsing using Regex. The problem definition is as > below : - > > A file with data :- > > BEGIN > country Japan > passcode 1123 > listname sales > contact ch...@example.com > contact m...@example.com > END > > BEGIN > country Namibia > passcode 9801 > listname dept > contact l...@example.com > END > > BEGINDL > country US > passcode 4123 > listname Investment > member a...@example.com > member b...@example.com > ENDDL > > BEGIN > country US > passcode 4432 > listname testing > contact lore...@test.com > contact a...@test.com > END > .. > . > ... > .. > . > > I want to parse it in such a way that all data with BEGIN and END goes in > one file and BEGINDL and ENDDL goes in other with kind of processing I want > to so. > > I am using below code but doesnot work : - > > #!/usr/bin/perl use strict; use warnings; > my $file=shift; Don't call variables "file". In your case it should be "filename". > open( FH , "$file" ) or die("open failed: $!\n"); Don't use bareword filehandles or two-arg open. > open ($fh1, ">/tmp/a"); > open ($fh2, ">/tmp/b"); Use autodie and three-arg open. > my $check=0; > while (<FH>) { Chomp, and use a lexical variable to iterate over the lines (say $line or $l) instead of $_, which can be clobbered very easily. > #next unless /BEGIN/ .. /END/ || /BEGINDL/ .. /ENDDL/ || eof; > if($_ =~ /BEGIN$/ || ($check == 0) ) { You probably want « $_ eq 'BEGIN' » instead (after chomp). > print $fh1 $_; > $check = 0; > if($_ =~ /END$/) { > $check = 2; > } > }elsif($_ =~ /BEGINDL/ || ($check == 1)) { > print $fh2 $_; > $check = 1; > if($_ =~ /ENDDL/) { > $check = 2; > } > } > next unless($check == 2); Always label your nexts (and in this case I think it is redundant). 
See: http://perl-begin.org/tutorials/bad-elements/ Regards, Shlomi Fish -- - Shlomi Fish http://www.shlomifish.org/ Stop Using MSIE - http://www.shlomifish.org/no-ie/ Bigamy: Having one wife too many. Monogamy: The same thing! — Unknown source. Please reply to list if it's a mailing list post - http://shlom.in/reply . -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
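One way the original state-machine loop might look after applying the advice above: strict, warnings, autodie, lexical filehandles, three-arg open, a named loop variable, chomp, `eq` and a labelled `next`. This is a sketch, not the poster's code. The sub name `route_blocks` is hypothetical, it assumes marker lines contain only the marker (no trailing spaces or `\r`), and in-memory handles replace /tmp/a and /tmp/b.

```perl
use strict;
use warnings;
use autodie;    # open and close now die with a useful message on failure

sub route_blocks {
    my ($in, $plain_fh, $dl_fh) = @_;
    my $target;    # handle for the block we are currently inside, or undef
    LINE: while (my $line = <$in>) {
        chomp $line;
        if    ($line eq 'BEGIN')   { $target = $plain_fh }
        elsif ($line eq 'BEGINDL') { $target = $dl_fh }
        next LINE unless defined $target;    # between blocks: skip the line
        print {$target} "$line\n";
        undef $target if $line eq 'END' || $line eq 'ENDDL';
    }
    return;
}

my $data = "BEGIN\ncountry Namibia\nEND\n\nBEGINDL\ncountry US\nENDDL\n";
my ($plain, $dl) = ('', '');
open my $in,       '<', \$data;
open my $plain_fh, '>', \$plain;    # in-memory stand-in for /tmp/a
open my $dl_fh,    '>', \$dl;       # in-memory stand-in for /tmp/b
route_blocks($in, $plain_fh, $dl_fh);
close $_ for $in, $plain_fh, $dl_fh;
print "plain:\n$plain", "dl:\n$dl";
```

Holding the current output handle in `$target` replaces the numeric `$check` codes, so there is no separate "outside any block" state to get wrong at startup.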
Re: Regex help
On 22/12/2012 11:15, punit jain wrote: Hi, I have a file like below : - BEGIN:VCARD VERSION:2.1 EMAIL:te...@test.com FN:test1 REV:20101116T030833Z UID:644938456.1419. END:VCARD From <>(S___-0003) Tue Nov 16 03:10:15 2010 content-class: urn:content-classes:person Date: Tue, 16 Nov 2010 11:10:15 +0800 Subject: test Message-ID: <644938507.1420> MIME-Version: 1.0 Content-Type: text/x-vcard; charset="utf-8" BEGIN:VCARD VERSION:2.1 EMAIL:te...@test.com FN:test2 REV:20101116T031015Z UID:644938507.1420 END:VCARD My requirement is to get all text between BEGIN:VCARD and END:VCARD and all the instances. So o/p should be :- BEGIN:VCARD VERSION:2.1 EMAIL:te...@test.com FN:test1 REV:20101116T030833Z UID:644938456.1419. END:VCARD BEGIN:VCARD VERSION:2.1 EMAIL:te...@test.com FN:test2 REV:20101116T031015Z UID:644938507.1420 END:VCARD I am using below regex :- my $fh = IO::File->new("$file", "r"); my $script = do { local $/; <$fh> }; close $fh; if ( $script =~ m/ (^BEGIN:VCARD\s*(.*) ^END:VCARD\s+)/sgmix ){ print OUTFILE $1."\n"; } However it just prints 1st instance and not all. Any suggestions ? This is very simply done with Perl's range operator. See the program below. Rob use strict; use warnings; open my $fh, '<', 'vcard.txt' or die $!; while (<$fh>) { print if /^BEGIN:VCARD/ .. /^END:VCARD/; } **output** BEGIN:VCARD VERSION:2.1 EMAIL:te...@test.com FN:test1 REV:20101116T030833Z UID:644938456.1419. END:VCARD BEGIN:VCARD VERSION:2.1 EMAIL:te...@test.com FN:test2 REV:20101116T031015Z UID:644938507.1420 END:VCARD -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
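Rob's loop prints the cards back to back. In scalar context, the range operator returns a sequence number while it is true, and the number for the last line of a range has "E0" appended (see "Range Operators" in perlop). The sketch below relies on that to put a blank line after each complete card. The sub name `extract_vcards` and the trimmed sample data are illustrative only.

```perl
use strict;
use warnings;

sub extract_vcards {
    my ($fh) = @_;
    my $out = '';
    while (my $line = <$fh>) {
        # "" when outside a card, 1, 2, 3 ... inside it, and e.g. "4E0"
        # on the END:VCARD line that closes the range.
        my $seq = ($line =~ /^BEGIN:VCARD/ .. $line =~ /^END:VCARD/);
        next unless $seq;
        $out .= $line;
        $out .= "\n" if $seq =~ /E0$/;    # last line of this card
    }
    return $out;
}

my $text = <<'EOT';
BEGIN:VCARD
VERSION:2.1
FN:test1
END:VCARD
From <>(S___-0003) Tue Nov 16 03:10:15 2010
Subject: test

BEGIN:VCARD
VERSION:2.1
FN:test2
END:VCARD
EOT

open my $in, '<', \$text or die "Cannot open in-memory file: $!";
print extract_vcards($in);
```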
Re: Regex help
On Sat, 22 Dec 2012 16:45:21 +0530 punit jain wrote: > Hi, > > I have a file like below : - [snipped example - vcards with mail headers etc in between] > My requirement is to get all text between BEGIN:VCARD and END:VCARD > and all the instances. So o/p should be :- [...] > I am using below regex :- [...] > Any suggestions ? You've already had a reply indicating how to solve the problem you were having with regexes, so I won't touch on that. What I will advise is that for any task you're trying to accomplish, there's a pretty good chance someone has already solved that and made code available on CPAN that will help you - so always check CPAN first, to avoid unnecessarily reinventing the wheel each time (unless you're doing so solely for a learning experience, of course). In this case, parsing vcards is likely a common task - a quick look on CPAN turns up Text::vCard::Addressbook: https://metacpan.org/module/Text::vCard::Addressbook From the synopsis: use Text::vCard::Addressbook; my $address_book = Text::vCard::Addressbook->new( { 'source_file' => '/path/to/address.vcf', } ); foreach my $vcard ( $address_book->vcards() ) { print "Got card for " . $vcard->fullname() . "\n"; } It will ignore the non-vcard content in the example you provided, and just provide you easy access to the data from each vcard. That's a much nicer approach than extracting it yourself with regexes. Cheers Dave P -- David Precious ("bigpresh") http://www.preshweb.co.uk/ www.preshweb.co.uk/twitter www.preshweb.co.uk/linkedin www.preshweb.co.uk/facebook www.preshweb.co.uk/cpan www.preshweb.co.uk/github -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex help
On Sat, Dec 22, 2012 at 04:45:21PM +0530, punit jain wrote: > Hi, > > I have a file like below : - > > BEGIN:VCARD > VERSION:2.1 > EMAIL:te...@test.com > FN:test1 > REV:20101116T030833Z > UID:644938456.1419. > END:VCARD > > >From <>(S___-0003) Tue Nov 16 03:10:15 2010 > content-class: urn:content-classes:person > Date: Tue, 16 Nov 2010 11:10:15 +0800 > Subject: test > Message-ID: <644938507.1420> > MIME-Version: 1.0 > Content-Type: text/x-vcard; charset="utf-8" > > BEGIN:VCARD > VERSION:2.1 > EMAIL:te...@test.com > FN:test2 > REV:20101116T031015Z > UID:644938507.1420 > END:VCARD > > > > My requirement is to get all text between BEGIN:VCARD and END:VCARD and all > the instances. So o/p should be :- > > BEGIN:VCARD > VERSION:2.1 > EMAIL:te...@test.com > FN:test1 > REV:20101116T030833Z > UID:644938456.1419. > END:VCARD > > BEGIN:VCARD > VERSION:2.1 > EMAIL:te...@test.com > FN:test2 > REV:20101116T031015Z > UID:644938507.1420 > END:VCARD > > I am using below regex :- > > my $fh = IO::File->new("$file", "r"); > my $script = do { local $/; <$fh> }; > close $fh; > if ( >$script =~ m/ > (^BEGIN:VCARD\s*(.*) > ^END:VCARD\s+)/sgmix > ){ > print OUTFILE $1."\n"; > } > > However it just prints 1st instance and not all. It also prints the text between the two instances, right? > Any suggestions ? You need a non-greedy match .*? instead of the greedy match .* that you are using. Then you'll need to use while instead of if. Or perhaps you'd prefer: $ perl -ne 'print if /BEGIN:VCARD/ .. /END:VCARD/' < in > out or $ perl -n00e 'print if /^BEGIN:VCARD/' < in > out See perldoc perlrun for the switches and "Range Operators" in perldoc perlop for .. -- Paul Johnson - p...@pjcj.net http://www.pjcj.net -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
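The `-00` one-liner switches Perl into paragraph mode, where each read returns one blank-line-separated chunk. Here is a sketch of the same thing as a script, assuming every card is separated from the surrounding mail headers by blank lines, as in the sample. The sub name `vcard_paragraphs` is hypothetical.

```perl
use strict;
use warnings;

# Setting $/ to the empty string is what -00 does: <$fh> then returns one
# paragraph at a time. Without /m, ^ anchors at the start of each paragraph,
# so only paragraphs that begin with BEGIN:VCARD are kept.
sub vcard_paragraphs {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode, restored when the sub returns
    return grep { /^BEGIN:VCARD/ } <$fh>;
}

my $text = <<'EOT';
BEGIN:VCARD
FN:test1
END:VCARD

From <>(S___-0003) Tue Nov 16 03:10:15 2010
Subject: test
Content-Type: text/x-vcard; charset="utf-8"

BEGIN:VCARD
FN:test2
END:VCARD
EOT

open my $in, '<', \$text or die "Cannot open in-memory file: $!";
print for vcard_paragraphs($in);
```

If a card is not followed by a blank line before other text, that text ends up in the same paragraph. In that case, the range-operator form is the safer choice.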
Re: Regex one-liner to find several multi-line blocks of text in a single file
On Nov 1, 2012, at 12:44 AM, Thomas Smith wrote: > Hi, > > I'm trying to search a file for several matching blocks of text. A sample > of what I'm searching through is below. > > What I want to do is match "# START block #" through to the next > "# END block #" and repeat that throughout the file without > matching any of the text that falls between each matched block (that is, > the "ok: some text" lines should not be matched). Here is the one-liner I'm > using: > > perl -p -e '/^# START block #.*# END block #$/s' file.txt > > I've tried a few variations of this but with the same result--a match is > being made from the first "# START block #" to the last "# END > block #", and everything in between... I believe that the ".*", > combined with the "s" modifier, in the regex is causing this match to be > made. The '*' is what's called a "greedy" quantifier. That means it will match as many characters in the string as possible. What the regular expression engine does when it encounters the pattern '.*' is to immediately match it with as many characters as possible. Since your regular expression includes the 's' modifier, this will include newlines as well. When the RE engine sees that there are characters in the pattern after the '.*', it will give back characters from the end of the substring matched by the '.*', one at a time, until the rest of the pattern also matches. Only if that never happens does it keep backing off until '.*' matches nothing, at which point the match attempt fails. The result of all this is that for your pattern, the last '# END block #' substring is the one that will be matched, and the '.*' pattern will match everything between the first '# START block #' and the last '# END block #'. The way to fix this is to make the '*' quantifier "non-greedy" by putting a '?' after it ('.*?'). With that pattern, the RE engine will match as few characters as possible, and the first START block will pair up with the first subsequent END block.
Adding a 'g' modifier (and using the match in a while loop or in list context) tells the RE engine to resume searching after the end of each match for the next match in the string. -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
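This sketch contrasts the greedy and non-greedy patterns on a small made-up sample. It assumes the whole text is in one string: `perl -p` reads one line at a time, so a pattern that spans several lines never gets the chance to match there.

```perl
use strict;
use warnings;

my $text = <<'EOT';
# START block #
ok: some text
# END block #
not part of any block
# START block #
ok: more text
# END block #
EOT

# Greedy .* runs to the LAST "# END block #", giving one big match.
my @greedy = $text =~ /(^# START block #.*# END block #)/gms;

# Non-greedy .*? stops at the FIRST "# END block #" after each START, and
# /g in list context resumes the search after the end of each match.
my @lazy = $text =~ /(^# START block #.*?# END block #)/gms;

printf "greedy: %d match(es), non-greedy: %d match(es)\n",
    scalar @greedy, scalar @lazy;
print "$_\n---\n" for @lazy;
```

From the command line, the `-0777` switch reads the whole file as one record, so the same pattern can be used like this: `perl -0777 -ne 'print "$1\n" while /(^# START block #.*?# END block #)/gms' file.txt`.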
Re: Regex one-liner to find several multi-line blocks of text in a single file
On Thu, Nov 01, 2012 at 12:44:08AM -0700, Thomas Smith wrote: > Hi, > > I'm trying to search a file for several matching blocks of text. A sample > of what I'm searching through is below. > > What I want to do is match "# START block #" through to the next > "# END block #" and repeat that throughout the file without > matching any of the text that falls between each matched block (that is, > the "ok: some text" lines should not be matched). Here is the one-liner I'm > using: > > perl -p -e '/^# START block #.*# END block #$/s' file.txt > > I've tried a few variations of this but with the same result--a match is > being made from the first "# START block #" to the last "# END > block #", and everything in between... I believe that the ".*", > combined with the "s" modifier, in the regex is causing this match to be > made. > > What I'm not sure how to do is tell Perl to search from START to the next > END and then start the search pattern over again with the next START-END > match. > > How might I go about achieving this? perl -ne 'print if /# START block #/ .. /# END block #/' file.txt -- Paul Johnson - p...@pjcj.net http://www.pjcj.net -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex sending me mad
On 2012-07-27 17:43, Andy Bach wrote: On Fri, Jul 27, 2012 at 10:22 AM, Dr.Ruud wrote: On 2012-07-27 16:58, Andy Bach wrote: if ($model=~/(\S+)\s+(.*)\s*$/) { The \s* in the end does nothing. Well, I was thinking if it's a multi-word second match: v6 Austin Martin Then that would match the rest of the phrase and trim trailing blanks. The '.*' already picks up any trailing blanks. So they will be in $2. But making it non-greedy works: perl -wle ' /(\S+)\s+(.*?)\s*$/ and print "<$1><$2>" for "v6 Aston Martin "; ' (which surprised me, and I would never use it like that, because for me it is not explicit enough) -- Ruud -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
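This sketch compares the three patterns from this thread on a value with trailing blanks. The greedy `(.*)` keeps the blanks in `$2`. The non-greedy `(.*?)` followed by `\s*$` drops them. Ruud's `(.*\S)` ends the capture on a non-space character. The sample string is made up.

```perl
use strict;
use warnings;

my $s = "v6 Aston Martin   ";

my @greedy = $s =~ /(\S+)\s+(.*)\s*$/;     # ("v6", "Aston Martin   ")
my @lazy   = $s =~ /(\S+)\s+(.*?)\s*$/;    # ("v6", "Aston Martin")
my @anchor = $s =~ /(\S+)\s+(.*\S)/;       # ("v6", "Aston Martin")

printf "<%s><%s>\n", @$_ for \@greedy, \@lazy, \@anchor;
```

The `(.*\S)` form does not depend on how the engine trades characters between `.*?` and `\s*`, which is probably why it reads as more explicit.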
Re: Regex sending me mad
On Fri, Jul 27, 2012 at 10:22 AM, Dr.Ruud wrote: > On 2012-07-27 16:58, Andy Bach wrote: > >> if ($model=~/(\S+)\s+(.*)\s*$/) { > > > The \s* in the end does nothing. Well, I was thinking if it's a multi-word second match: v6 Austin Martin Then that would match the rest of the phrase and trim trailing blanks. > Closer: > /(\S+)\s+(.*\S)/ Yeah, that's better - using the non-whitespace as anchors, so to speak! -- a Andy Bach, afb...@gmail.com 608 658-1890 cell 608 261-5738 wk -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
SOLVED Re: Regex sending me mad
On Friday 27 July 2012 15:58:07 Andy Bach wrote: > Your RE is a bit odd - all that 'non-greedy *' -ness implies troubles. > The first "space star ?" can be greedy, right? You want all the > spaces/white space in a row, or rather don't want - as you're anchored > on the end, this doesn't do anything for the actual RE work. The next > "word char *" means zero or more - you want at least one, right? Word > char or non-white space? The only requirement your RE looks for is > the single blank between capture 1 and 2 - so > Kia\tVenga > > won't work. Actually anything w/o a blank will fail ... don't really > know enough about your data but try maybe: > if ($model=~/(\S+)\s+(.*)\s*$/) { Thanks Andy, Shawn and Jim. The regex I'd supplied was built up over many attempts to get it working, hence the over the top spec. The problem eventually turned out to be that the "space" between the make and model wasn't actually a space, i.e. wasn't ASCII 32. I have now got the people generating the data to generate it correctly and all is now fine, with a much simpler regex. Gary -- Gary Stainburn Group I.T. Manager Ringways Garages http://www.ringways.co.uk -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
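When the "space" between make and model is not ASCII 32, a literal space in the pattern silently fails. The sketch below assumes the stray character was U+00A0 (NO-BREAK SPACE), a common culprit, but the thread does not say which character it actually was. It shows that `\s` with the `/u` modifier (Perl 5.14+) matches U+00A0, and it lists any non-printable-ASCII characters so the culprit can be spotted.

```perl
use strict;
use warnings;

# Hypothetical input: make and model separated by U+00A0, not a plain space.
my $model = "Kia\x{A0}Venga";

# A literal space in the pattern does not match it...
my $literal = $model =~ / (.*)/ ? 1 : 0;

# ...but \s does when Unicode rules are in force via /u.
my ($make, $rest) = $model =~ /(\S+)\s+(.*)/u;

# Report every character outside printable ASCII as a code point.
my @odd = map { sprintf 'U+%04X', ord($_) } grep { /[^\x20-\x7E]/ } split //, $model;

print "literal space matched: $literal\n";
print "make='$make' model='$rest'\n";
print "non-ASCII: @odd\n";
```

Fixing the data at the source, as was done here, is still the cleaner solution. The `/u` form only helps when the input cannot be controlled.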
Re: Regex sending me mad
On 2012-07-27 16:58, Andy Bach wrote: if ($model=~/(\S+)\s+(.*)\s*$/) { The \s* in the end does nothing. Closer: /(\S+)\s+(.*\S)/ Then play with this: perl -Mstrict -we' my $data= $ARGV[0] ? q{Ford} : qq{ \t Fiat Ulysse 2.1 TD}; printf qq{<%s> <%s>\n}, split( q{ }, $data, 2 ), q{oops}; printf qq{<%s> <%s>\n}, $data =~ / (\S+) \s* ( (?: .* \S )? )/x; ' 1 <Ford> <oops> <Ford> <> -- Ruud -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex sending me mad
On Fri, Jul 27, 2012 at 9:04 AM, Gary Stainburn wrote: > print STDERR "About to split '$model'\n"; > if ($model=~/ *?(\w*) (.*?) *$/) { > $enqmake=lc($1); > $model=$2; > print STDERR "model split into '$enqmake' '$model'\n"; > } > } # extract make > > This generates: > > enqmake='' model='Kia Venga' > About to split 'Kia Venga' Your RE is a bit odd - all that 'non-greedy *' -ness implies troubles. The first "space star ?" can be greedy, right? You want all the spaces/white space in a row, or rather don't want - as you're anchored on the end, this doesn't do anything for the actual RE work. The next "word char *" means zero or more - you want at least one, right? Word char or non-white space? The only requirement your RE looks for is the single blank between capture 1 and 2 - so Kia\tVenga won't work. Actually anything w/o a blank will fail ... don't really know enough about your data but try maybe: if ($model=~/(\S+)\s+(.*)\s*$/) { -- a Andy Bach, afb...@gmail.com 608 658-1890 cell 608 261-5738 wk -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex sending me mad
On Fri, 27 Jul 2012 07:29:13 -0700 Jim Gibson wrote: > Why aren't you using the split function? > > ($model,$engmake) = split(' ',$model); That would be: ($model,$engmake) = split(' ',$model, 2); See `perldoc -f split` for details. -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. _Perl links_ official site : http://www.perl.org/ beginners' help : http://learn.perl.org/faq/beginners.html advance help: http://perlmonks.org/ documentation : http://perldoc.perl.org/ news: http://perlsphere.net/ repository : http://www.cpan.org/ blog: http://blogs.perl.org/ regional groups : http://www.pm.org/ -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
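This sketch shows the split approach with the limit Shawn adds. It also assumes the first field is the make, so it goes into `$enqmake` (the name used in Gary's code). Assigning to `($model,$engmake)` in that order would put "Kia" into `$model`. The special `' '` pattern skips leading whitespace and treats any run of whitespace, tabs included, as one separator. The sub name `split_make_model` is hypothetical.

```perl
use strict;
use warnings;

# Split off the first word as the make; LIMIT 2 keeps a multi-word model
# ("Focus ST") in one piece.
sub split_make_model {
    my ($model) = @_;
    my ($enqmake, $rest) = split ' ', $model, 2;
    return unless defined $enqmake;    # empty or all-whitespace input
    return (lc($enqmake), $rest);
}

for my $input ('Kia Venga', "  Ford\tFocus ST") {
    my ($make, $model) = split_make_model($input);
    print "make='$make' model='$model'\n";
}
```

Trailing whitespace is not removed from the second field. Trim it separately if the data may contain it.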
Re: Regex sending me mad
On Jul 27, 2012, at 7:04 AM, Gary Stainburn wrote: > Hi folks. > > I'm struggling to see what I'm doing wrong. I have the following code in one > of my programs but it isn't working as it should. > > > print STDERR "enqmake='$enqmake' model='$model'\n"; > if (!$enqmake && $model) { # extract make > print STDERR "About to split '$model'\n"; > if ($model=~/ *?(\w*) (.*?) *$/) { >$enqmake=lc($1); >$model=$2; >print STDERR "model split into '$enqmake' '$model'\n"; > } > } # extract make > > This generates: > > enqmake='' model='Kia Venga' > About to split 'Kia Venga' > > I have a test script which works fine. Can anyone see what I'm doing wrong? No. Your script works fine for me if I precede it with the following two lines: my $model = 'Kia Venga'; my $engmake; > > #!/usr/bin/perl -w > > use warnings; > use strict; > > my $t='Kia Venga'; > > if ($t=~/ *?(\w*) (.*?) *$/) { > print "1='$1' 2='$2'\n"; > } > > [root@ollie exim]# ~/t > 1='Kia' 2='Venga' Your test script also works fine. Therefore, it must be something else in your larger program. I suggest you use the escape sequence \s for whitespace instead of just using the space character. You should also use the x modifier so that spaces in your pattern will be ignored. That will allow you to determine by inspection what your pattern is really doing. I also use m{ } to delineate the pattern and \z to anchor the end of match instead of $: if( $model =~ m{ \s*? (\w*) \s (.*?) \s* \z }x ) { ... Why aren't you using the split function? ($model,$engmake) = split(' ',$model); -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex character classes: n OR m
On Fri, 6 Jul 2012 18:59:00 +0100 "Adam J. Gamble" wrote: > Dear All, > > I'm taking a (highly belated) first look at Perl today. From a > background in Python, I'm coming to Perl, primarily out of curiosity > with what it can do with regular expressions. > > To get to the point— is it possible to match a character class with a > repeater that requires an exactly *n* OR *m* matches, rather than the > traditional *{n, m}*. I've taken a look at > http://perldoc.perl.org/perlrequick.html#Using-character-classes, > which implies this wouldn't be possible? But, putting faith Perl's > reputation for inherent quirkiness... if possible, I'd love to know > what a solution would look like? Try: m{ (?: .{n} | .{m} ) }msx Of course, replace the period with the character set you're looking for. -- Just my 0.0002 million dollars worth, Shawn Programming is as much about organization and communication as it is about coding. _Perl links_ official site : http://www.perl.org/ beginners' help : http://learn.perl.org/faq/beginners.html advance help: http://perlmonks.org/ documentation : http://perldoc.perl.org/ news: http://perlsphere.net/ repository : http://www.cpan.org/ blog: http://blogs.perl.org/ regional groups : http://www.pm.org/ -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/
Re: Regex character classes: n OR m
On Fri, Jul 06, 2012 at 06:59:00PM +0100, Adam J. Gamble wrote: > Dear All, > > I'm taking a (highly belated) first look at Perl today. From a background > in Python, I'm coming to Perl, primarily out of curiosity with what it can > do with regular expressions. Welcome! > To get to the point— is it possible to match a character class with a > repeater that requires an exactly *n* OR *m* matches, rather than the > traditional *{n, m}*. I've taken a look at > http://perldoc.perl.org/perlrequick.html#Using-character-classes, which > implies this wouldn't be possible? But, putting faith Perl's reputation for > inherent quirkiness... if possible, I'd love to know what a solution would > look like? You're correct that there is no way to do this directly, but if you look at the section just below (Matching this or that) you can see the basis for a solution. So, to match either three or five "a"s for example, you could do this: /^(?:a{3}|a{5})$/ -- Paul Johnson - p...@pjcj.net http://www.pjcj.net -- To unsubscribe, e-mail: beginners-unsubscr...@perl.org For additional commands, e-mail: beginners-h...@perl.org http://learn.perl.org/