Re: Regular expression: option match after a greedy/non-greedy match
When I run this script I get the following error:

    bash-4.2$ ./regex.pl
    feature version v5.16.0 required--this is only version v1.160.0 at ./regex.pl line 4.
    BEGIN failed--compilation aborted at ./regex.pl line 4.

But I am using the Perl version shown below:

    bash-4.2$ perl -v
    This is perl 5, version 16, subversion 3 (v5.16.3) built for i686-linux
    Copyright 1987-2012, Larry Wall
    Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit.
    Complete documentation for Perl, including FAQ lists, should be found on this system using man perl or perldoc perl. If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Re: Regular expression: option match after a greedy/non-greedy match
On 17 Sep 2014, at 17:08, Uday Vernekar vernekaru...@gmail.com wrote:

> When I run this script I get the following error:
>
>     bash-4.2$ ./regex.pl
>     feature version v5.16.0 required--this is only version v1.160.0 at ./regex.pl line 4.
>     BEGIN failed--compilation aborted at ./regex.pl line 4.
>
> But I am using Perl v5.16.3 (built for i686-linux), as reported by perl -v.

Strange... I only used 5.16.0 for the feature 'say'. You can of course omit that part, change 'say' to 'print', and append a \n at the end instead.

Cheers,
Jing
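The change Jing describes can be sketched as follows. This is only an illustration: line_of() is a hypothetical helper introduced here, and the strings are sample values taken from the thread. It relies on the documented behaviour that say LIST acts like print LIST with a newline appended.

```perl
use strict;
use warnings;

# line_of() is a hypothetical helper (not from the thread): it builds the
# text that say would emit for the same arguments, i.e. the joined list
# followed by a newline.
sub line_of {
    return join('', @_) . "\n";
}

# Plain print with an explicit newline; no "use 5.16.0" or feature needed.
print 'Video', '1280x720', "\n";

# The same output through the helper.
print line_of('Video', '1280x720');

# say LIST behaves like print LIST with $\ set to a newline for that call.
{
    local $\ = "\n";
    print 'Video', '1280x720';
}
```

All three statements print the same line, so dropping the version requirement does not change what the script outputs.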
Re: Regular expression: option match after a greedy/non-greedy match
When I change use 5.16.0; to use feature ':5.10'; it works, and I get the following output:

    bash-4.2$ ./regex.pl
    Use of uninitialized value $3 in say at ./regex.pl line 7, <DATA> line 1.
    Use of uninitialized value $4 in say at ./regex.pl line 7, <DATA> line 1.
    Video1280x720
    Use of uninitialized value $2 in say at ./regex.pl line 7, <DATA> line 2.
    Use of uninitialized value $3 in say at ./regex.pl line 7, <DATA> line 2.
    Use of uninitialized value $4 in say at ./regex.pl line 7, <DATA> line 2.
    Audio
    Use of uninitialized value $2 in say at ./regex.pl line 7, <DATA> line 3.
    Use of uninitialized value $3 in say at ./regex.pl line 7, <DATA> line 3.
    Use of uninitialized value $4 in say at ./regex.pl line 7, <DATA> line 3.
    Subtitle

How do these two use statements differ? With use 5.16.0;, running perl regex.pl works. Why doesn't ./regex.pl work? It gives the following error:

    feature version v5.16.0 required--this is only version v1.160.0 at ./regex.pl line 4.
    BEGIN failed--compilation aborted at ./regex.pl line 4.
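The "Use of uninitialized value" warnings in the output above appear because the script prints $1 through $4 while its pattern has only two capture groups, and the second group is undefined whenever the line carries no resolution. A minimal sketch of one way to print without the warnings follows. It assumes a simplified pattern in the spirit of the script being run, and fields() is a hypothetical helper introduced here.

```perl
use strict;
use warnings;

# Sample lines copied from the thread.
my @lines = (
    'Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)',
    'Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)',
    'Stream #0:2(eng): Subtitle: ass (default)',
);

# fields() is a hypothetical helper: it returns the stream type and the
# resolution, with an empty string when no resolution was captured.
sub fields {
    my ($line) = @_;
    return unless $line =~ /(Video|Audio|Subtitle)(?:.+?(\d+x\d+)|.+)/;
    # Only two capture groups exist, so $3 and $4 are never set.
    # $2 is undef when the second branch of the alternation matched.
    return ($1, $2 // '');
}

for my $line (@lines) {
    my ($type, $resolution) = fields($line);
    print $type, $resolution, "\n";
}
# Video1280x720
# Audio
# Subtitle
```

The defined-or operator (//) needs Perl 5.10 or later; on older versions, defined $2 ? $2 : '' does the same job.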
RE: Re: Regular expression: option match after a greedy/non-greedy match
Dear Jing,

I was confused when I started out with regular expressions. Many thanks for the kind and detailed explanation. After reading more on Perl regexes, I think I have a better grasp of the greedy/non-greedy concept now. Your code also worked well for my task.

Regards,
Viet-Duc

---Original Message---
From: Jing Yu logus...@googlemail.com
To: Viet-Duc Le leviet...@kaist.ac.kr
Sent date: 2014-09-17 12:20:29 GMT +0900 (Asia/Seoul)
Subject: Re: Regular expression: option match after a greedy/non-greedy match
Regular expression: option match after a greedy/non-greedy match
Greetings from S. Korea!

I am parsing the output of ffmpeg with Perl. In particular, I want to print only these lines from the output and capture the resolution, i.e. 1280x720:

    Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
    Stream #0:2(eng): Subtitle: ass (default)

My code is as follows:

    # INFO is a pipe to ffmpeg
    # Here, the print "$1 $2 $3 $4\n" is for debugging.
    while ( <INFO> ) {
        if ( regular expression ) {
            print "$1 $2 $3 $4\n";
        }
    }

Desired output:

    Video 1280 720
    Audio
    Subtitle

Regarding the regular expression:

1. /Stream #\d:\d.*(Video|Audio|Subtitle).*(\d+)x(\d+)/ (greedy)

   -> Video 0 720

   Q: Why does $2 give 0? I remember that .* matches backward starting from the end of the string. Then the output should be Video 1280 720.

2. /Stream #\d:\d.*(Video|Audio|Subtitle).*?(\d+)x(\d+)/ (non-greedy)

   -> Video 1280 720

   Q: I can understand this, but again I think (1) should work too.

3. /Stream #\d:\d.*(Video|Audio|Subtitle).*?(?:(\d+)x(\d+))?/ (non-capturing optional group)

   -> Video
      Audio
      Subtitle

   Q: It seems that the resolution part is ignored because it is optional. Otherwise, the output would contain Video only, as in (1) and (2). How can I circumvent this?

4. /Stream #\d:\d.*(Video|Audio|Subtitle).+?(?:(\d+)x(\d+))?.*?$/

   -> Video
      Audio
      Subtitle

   Q: I tried to match the text after the resolution, hoping that the resolution would then be captured.

5. /Stream #\d:\d.*(Video|Audio|Subtitle).+?(?:(\d+)x(\d+))?(.*?)$/ (let's capture the last part)

   -> Videoh264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
      Audioac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
      Subtitleass (default)

   Q: Now $2 and $3 are undef, and the rest of the string went to $4. Again, I am quite puzzled by the output.

Please pardon my long email. I hope someone can point out the flaws in my logic. For now, I can match and print Video/Audio/Subtitle separately. But I wish for one expression to match them all, one expression to print them.

Best regards,
Viet-Duc
Re: Regular expression: option match after a greedy/non-greedy match
Hi Viet-Duc Le,

On 17 Sep 2014, at 10:23, Viet-Duc Le leviet...@kaist.ac.kr wrote:

> Greetings from S. Korea!
>
> I am parsing the output of ffmpeg with Perl. In particular, I want to print only these lines from the output and capture the resolution, i.e. 1280x720:
>
>     Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
>     Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
>     Stream #0:2(eng): Subtitle: ass (default)
>
> Desired output:
>
>     Video 1280 720
>     Audio
>     Subtitle
>
> 1. /Stream #\d:\d.*(Video|Audio|Subtitle).*(\d+)x(\d+)/ (greedy)
>    -> Video 0 720
>    Q: Why does $2 give 0? I remember that .* matches backward starting from the end of the string. Then the output should be Video 1280 720.

That '0' is the last digit of 1280, since the second '.*' has consumed '128'. Under the hood, .* first runs to the end of the target string and then backtracks, one character at a time, until the rest of the regex can match. As soon as the whole regex is satisfied it stops backtracking, even though backtracking further would also satisfy the regex.

> 2. /Stream #\d:\d.*(Video|Audio|Subtitle).*?(\d+)x(\d+)/ (non-greedy)
>    -> Video 1280 720
>    Q: I can understand this, but again I think (1) should work too.
>
> 3. /Stream #\d:\d.*(Video|Audio|Subtitle).*?(?:(\d+)x(\d+))?/ (non-capturing optional group)
>    -> Video
>       Audio
>       Subtitle
>    Q: It seems that the resolution part is ignored because it is optional. Otherwise, the output would contain Video only, as in (1) and (2). How can I circumvent this?

The ?: only makes the outer group non-capturing; the inner (\d+) groups still capture into $2 and $3 whenever they take part in the match. The trailing ? already makes the group optional; it is equivalent to {0,1}. The big problem is the middle .*?. Since the last part is optional, .*? just matches the smallest number of characters possible, which is none, and the optional group then matches zero times.

> 4. /Stream #\d:\d.*(Video|Audio|Subtitle).+?(?:(\d+)x(\d+))?.*?$/
>    -> Video
>       Audio
>       Subtitle
>    Q: I tried to match the text after the resolution, hoping that the resolution would then be captured.

Again the optional group matches zero times, so nothing is captured. .+? in the middle is better, but it still matches as little as possible: now it matches just the ':'.

> 5. /Stream #\d:\d.*(Video|Audio|Subtitle).+?(?:(\d+)x(\d+))?(.*?)$/ (let's capture the last part)
>    -> Videoh264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
>       Audioac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
>       Subtitleass (default)
>    Q: Now $2 and $3 are undef, and the rest of the string went to $4. Again, I am quite puzzled by the output.

The optional group is tried right after the single ':' matched by .+?. No resolution starts at that position, so the group matches zero times, and everything else goes to the (.*?)$, which has to stretch to the end of the line because of the $.

> Please pardon my long email. I hope someone can point out the flaws in my logic. For now, I can match and print Video/Audio/Subtitle separately. But I wish for one expression to match them all, one expression to print them.

In general, it is better practice to add the 'x' modifier to your regex to make it more readable. My regex might not be the best, but it works as expected.

    use strict;
    use warnings;
    use 5.16.0;

    while (<DATA>) {
        /
            (Video|Audio|Subtitle)      # $1: stream type
            (?:
                (?:.)+? (\d+x\d+)       # $2: first resolution after the type
                |
                (?:.)+                  # no resolution: match the rest
            )
        /x
        and say $1, $2, $3, $4;
    }
    __DATA__
    Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)
    Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)
    Stream #0:2(eng): Subtitle: ass (default)

The '|' alternation first tries the branch before it. It only looks at the other branch if the first one fails. This makes matching the resolution a priority, but not a necessity.

Hope this helps.
Jing
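The behaviour of patterns 1 to 3 can be checked with a short script. The sketch below is only an illustration: captures() is a hypothetical helper, and $grouped is one possible variation that is not from the thread. It moves the lazy .*? inside the optional group, so the group either finds a resolution or matches nothing.

```perl
use strict;
use warnings;

# Sample lines copied from the thread.
my @lines = (
    'Stream #0:0: Video: h264 (High), yuv420p, 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 47.95 tbc (default)',
    'Stream #0:1(jpn): Audio: ac3, 48000 Hz, stereo, fltp, 192 kb/s (default)',
    'Stream #0:2(eng): Subtitle: ass (default)',
);

# Patterns 1 to 3 from the thread.
my $greedy   = qr/Stream #\d:\d.*(Video|Audio|Subtitle).*(\d+)x(\d+)/;
my $lazy     = qr/Stream #\d:\d.*(Video|Audio|Subtitle).*?(\d+)x(\d+)/;
my $optional = qr/Stream #\d:\d.*(Video|Audio|Subtitle).*?(?:(\d+)x(\d+))?/;

# Hypothetical variation (an assumption, not from the thread): the lazy
# .*? is part of the optional group instead of sitting in front of it.
my $grouped  = qr/Stream #\d:\d.*(Video|Audio|Subtitle)(?:.*?(\d+)x(\d+))?/;

# captures() is a hypothetical helper: it returns the capture groups of a
# match as one string, with 'undef' standing in for groups that did not
# take part in the match.
sub captures {
    my ($re, $text) = @_;
    my @groups = $text =~ $re;
    return 'no match' unless @groups;
    return join ' ', map { defined $_ ? $_ : 'undef' } @groups;
}

print captures($greedy,   $lines[0]), "\n";   # Video 0 720
print captures($lazy,     $lines[0]), "\n";   # Video 1280 720
print captures($optional, $lines[0]), "\n";   # Video undef undef
print captures($greedy,   $lines[1]), "\n";   # no match

print captures($grouped, $_), "\n" for @lines;
# Video 1280 720
# Audio undef undef
# Subtitle undef undef
```

With $grouped, the optional group is greedy about being present: it first tries to find a resolution anywhere after the stream type, and only falls back to matching nothing when that fails. That gives a single expression for all three stream types.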