Re: splitting strings
On Aug 30, 2006, at 3:42 AM, Dr.Ruud wrote:
> Aaargh, I was suddenly mixing up split /()/ and /()/g. I really
> shouldn't post anymore without testing.

Thank you all for the clarifications regarding split(). I should pay more attention when I read the documentation (or get more sleep).

-Hien.

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response
splitting strings
Hello,

Given the string 'abcdefghijklmnopq', I wish to add a line break every 5 characters:

    abcde
    fghij
    klmno
    pq

Method 1 below works, but my split() in method 2 captures 'something unexpected' at each match. Could someone please tell me what split is capturing that I am not seeing?

Thanks in advance for your answers,
-Hien

My script:
==========

    #!/usr/bin/perl -w

    use strict;

    my $foo = 'abcdefghijklmnopq';

    # Method 1
    print( "\nMethod 1\n" );
    my $foo_length = length( $foo );
    for( my $i = 0; $i < $foo_length; $i += 5 )
    {
        my $bar1 = substr( $foo, $i, 5 );
        print( $bar1, "\n" );
    }

    # Method 2
    print( "\nMethod 2\n" );
    my @bar2 = split( /([a-z]{5})/, $foo );   # Captures white-spaces ?!?
    my $bar2_nb = @bar2;
    print( join( "\n", @bar2 ) );
    print( "\nElements in array = ", $bar2_nb, "\n" );   # 7 elements in the array.

    __END__

My script's output:
===================

    [EMAIL PROTECTED] $ perl weird_string_manipulation.pl

    Method 1
    abcde
    fghij
    klmno
    pq

    Method 2

    abcde

    fghij

    klmno
    pq
    Elements in array = 7
Re: splitting strings
Hien Le wrote:
> my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces ?!?

Because of the capturing (), split also returns the separators. See perldoc -f split.

Suggestion:

    my @bar2 = split( /./, $foo );

--
Affijn, Ruud
Gewoon is een tijger.
Re: splitting strings
Hien Le wrote:
> Hello,

Hello,

> Given the string 'abcdefghijklmnopq', I wish to add a line break every 5
> characters:
>
> abcde
> fghij
> klmno
> pq

    $ perl -e'
    my $foo = q[abcdefghijklmnopq];
    print "$foo\n";
    $foo =~ s/(.{0,5})/$1\n/g;
    print $foo;
    '
    abcdefghijklmnopq
    abcde
    fghij
    klmno
    pq

> Method 1 below works, but my split() in method 2 captures 'something
> unexpected' at each match. Could someone please tell me what is split
> capturing that I am not seeing?

split( /X/, 'aXb' ) splits the string using the pattern and returns the list ( 'a', 'b' ). split( /(X)/, 'aXb' ) splits the string using the pattern and returns the list ( 'a', 'X', 'b' ). Everything that the pattern does not match is returned in the list; if you use capturing parentheses, everything they capture is returned as well.

> My script:
> ==========
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $foo = 'abcdefghijklmnopq';
>
> # Method 1
> print( "\nMethod 1\n" );
> my $foo_length = length( $foo );
> for( my $i = 0; $i < $foo_length; $i += 5 )
> {
>     my $bar1 = substr( $foo, $i, 5 );
>     print( $bar1, "\n" );
> }
>
> # Method 2
> print( "\nMethod 2\n" );
> my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces ?!?
> my $bar2_nb = @bar2;
> print( join( "\n", @bar2 ) );
> print( "\nElements in array = ", $bar2_nb, "\n" ); # 7 elements in the array.
>
> __END__

    $ perl -e'
    my $foo = q[abcdefghijklmnopq];
    print "$foo\n";
    my @bar = unpack q[(a5)*], $foo;
    print map "$_\n", @bar;
    '
    abcdefghijklmnopq
    abcde
    fghij
    klmno
    pq

    $ perl -e'
    my $foo = q[abcdefghijklmnopq];
    print "$foo\n";
    my @bar = $foo =~ /.{0,5}/g;
    print map "$_\n", @bar;
    '
    abcdefghijklmnopq
    abcde
    fghij
    klmno
    pq

John
--
use Perl;
program
fulfillment
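The behaviour described above can be checked with a short script. This is only a sketch for illustration; the variable names @plain, @captured and @chunks are not from the thread.

```perl
use strict;
use warnings;

my $foo = 'abcdefghijklmnopq';

# Without capturing parentheses, split returns only the text found
# between the separators: ( '', '', '', 'pq' ).
my @plain = split /[a-z]{5}/, $foo;

# With capturing parentheses, each matched separator is returned too,
# interleaved with the (here empty) fields between them.
my @captured = split /([a-z]{5})/, $foo;

# Dropping the zero-length fields leaves just the chunks.
my @chunks = grep { length } @captured;

print scalar(@plain),    " fields without capture\n";
print scalar(@captured), " fields with capture\n";
print join( "\n", @chunks ), "\n";
```

The seven fields in the second case are the seven elements counted in the original script.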
Re: splitting strings
On 08/29/2006 06:52 AM, Hien Le wrote:
> [...]
> # Method 2
> print( "\nMethod 2\n" );
> my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces ?!?
> [...]

The comments made by Dr. Ruud and John W. Krahn are correct. Split is returning the empty strings between delimiter segments in the original string. To zap these out, do this:

    my @bar2 = grep length, split (/([a-z]{5})/, $foo);

Any substrings with a length of zero will be removed by "grep length".
Re: splitting strings
Mumia W. wrote:
> Hien Le:
> > [...]
> > # Method 2
> > print( "\nMethod 2\n" );
> > my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces ?!?
> > [...]
>
> The comments made by Dr. Ruud and John W. Krahn are correct. Split is
> returning the empty strings between delimiter segments in the original
> string. To zap these out, do this:
>
> my @bar2 = grep length, split (/([a-z]{5})/, $foo);
>
> Any substrings with a length of zero will be removed by "grep length".

Huh? Why not just remove the capturing ()?

Again: perldoc -f split

--
Affijn, Ruud
Gewoon is een tijger.
Re: splitting strings
On 08/29/2006 05:02 PM, Dr.Ruud wrote:
> Mumia W. wrote:
> > Hien Le:
> > > [...]
> > > # Method 2
> > > print( "\nMethod 2\n" );
> > > my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces ?!?
> > > [...]
> >
> > The comments made by Dr. Ruud and John W. Krahn are correct. Split is
> > returning the empty strings between delimiter segments in the original
> > string. To zap these out, do this:
> >
> > my @bar2 = grep length, split (/([a-z]{5})/, $foo);
> >
> > Any substrings with a length of zero will be removed by "grep length".
>
> Huh? Why not just remove the capturing ()?
>
> Again: perldoc -f split

Without the capturing parentheses, split will remove every sequence of five alphabetic characters from the output. Only 'pq' will remain:

    use Data::Dumper;
    my $foo = 'abcdefghijklmnopq';
    my @foo = split /[a-z]{5}/, $foo;
    print Dumper(\@foo);

    __END__

That program prints this:

    $VAR1 = [
              '',
              '',
              '',
              'pq'
            ];
Re: splitting strings
Mumia W. wrote:
> Dr.Ruud:
> > Mumia W.:
> > > my @bar2 = grep length, split (/([a-z]{5})/, $foo);
> > > Any substrings with a length of zero will be removed by
> > > "grep length".
> >
> > Huh? Why not just remove the capturing ()?
>
> Without the capturing parentheses, split will remove every sequence of
> five alphabetic characters from the output. Only 'pq' will remain:
>
> use Data::Dumper;
> my $foo = 'abcdefghijklmnopq';
> my @foo = split /[a-z]{5}/, $foo;
> print Dumper(\@foo);
>
> __END__
>
> That program prints this:
>
> $VAR1 = [
>           '',
>           '',
>           '',
>           'pq'
>         ];

Aaargh, I was suddenly mixing up split /()/ and /()/g. I really shouldn't post anymore without testing.

    #!/usr/bin/perl
    use warnings ;
    use strict ;
    use Data::Dumper ;

    my $foo = 'abcdefghijklmnopq' ;
    my @foo = ($foo =~ /([a-z]{1,5})/g) ;
    print Dumper(\@foo);

    __END__

    $VAR1 = [
              'abcde',
              'fghij',
              'klmno',
              'pq'
            ];

--
Affijn, Ruud
Gewoon is een tijger.
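For comparison, the two split-free approaches from this thread (a global match in list context, and unpack with a repeated group) can be wrapped up as follows. This is a sketch only; chunk_string is a hypothetical helper name, and the /s flag is an addition so that newlines inside the string are chunked too.

```perl
use strict;
use warnings;

# Hypothetical helper: cut $str into pieces of at most $size characters.
# In list context, a /g match returns every capture from every match.
sub chunk_string {
    my ( $str, $size ) = @_;
    return $str =~ /(.{1,$size})/gs;
}

my $foo = 'abcdefghijklmnopq';

my @matched  = chunk_string( $foo, 5 );

# unpack with a repeated group: "(a5)*" means "five characters, as many
# times as the data allows"; the last piece may be shorter.
my @unpacked = unpack '(a5)*', $foo;

print join( "\n", @matched ),  "\n";
print join( "\n", @unpacked ), "\n";
```

Both lists contain the same four pieces for this input. The unpack form counts bytes unless the string is character data, so the regex form may be the safer default for text of unknown origin.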
Re: splitting strings with quoted white space
On Wednesday 06 June 2001 22:59, Jeff 'japhy' Pinyan wrote:
> On Jun 6, Accountant Bob said:
> > How about this: (the same but unrolled)
> >
> > my @elements;
> > push @elements, $1 while
> >     /\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or

I think that

    /\G\s*"((?:(?:\\.)|[^\\"])*?)"/gc

is shorter and also matches all \X sequences (the trick is that \\. is longer than [^\\"]).

--
Ondrej Par
Internet Securities
Software Engineer
e-mail: [EMAIL PROTECTED]
Phone: +420 2 222 543 45 ext. 112
Re: splitting strings with quoted white space
On Jun 7, Ondrej Par said:
> On Wednesday 06 June 2001 22:59, Jeff 'japhy' Pinyan wrote:
> > On Jun 6, Accountant Bob said:
> > > How about this: (the same but unrolled)
> > >
> > > my @elements;
> > > push @elements, $1 while
> > >     /\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or
>
> I think that
>     /\G\s*"((?:(?:\\.)|[^\\"])*?)"/gc
> is shorter and also matches all \X sequences (the trick is that \\. is
> longer than [^\\"])

The formula for unrolling the loop is

    NORMAL* (SPECIAL NORMAL*)*

Here, NORMAL is /[^\\"]/, and SPECIAL is /\\./ -- at least, I'm using \\., since I want any backslash to pass through ok. Thus, our regex is:

    push @elements, $1 while
        /\G\s*"([^\\"]*(?:\\.[^\\"]*)*)"/gc or
        /\G\s*'([^\\']*(?:\\.[^\\']*)*)'/gc or
        /\G\s*(\S+)/gc;

Of course, that last regex can be changed to your whims...

--
Jeff "japhy" Pinyan [EMAIL PROTECTED] http://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk? http://www.perlmonks.com/ http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc. http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter. Brother #734
** Manning Publications, Co, is publishing my Perl Regex book **
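To see the unrolled patterns at work, here is a minimal sketch that wraps them in a function. The name tokenize and the sample input line are made up for illustration; the three patterns are the ones given above, assuming double-quoted and single-quoted variants.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the three unrolled patterns.
sub tokenize {
    my ($line) = @_;
    my @elements;
    for ($line) {    # alias $_ to $line so pos() carries over between matches
        push @elements, $1 while
            /\G\s*"([^\\"]*(?:\\.[^\\"]*)*)"/gc or    # NORMAL* (SPECIAL NORMAL*)*
            /\G\s*'([^\\']*(?:\\.[^\\']*)*)'/gc or
            /\G\s*(\S+)/gc;
    }
    return @elements;
}

# Sample input (made up): a bare word, an equals sign, a double-quoted
# string containing an escaped quote, a single-quoted string, a bare word.
my @tokens = tokenize( q{name = "quoted \" string" 'single one' bare} );
print "[$_]\n" for @tokens;
```

The /c modifier matters here: without it, the first failing alternative would reset pos() and the next alternative would start again from the beginning of the string. Note that the escape sequences are passed through untouched, so the backslash is still present in the captured element.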
RE: splitting strings with quoted white space
On Jun 7, Accountant Bob said:
> can any one explain to me why this doesn't seem to work:
>
> push @elements, $2 while
>     /\G\s*(["'])([^\\\1]*(?:\\.[^\\\1]*)*)\1/gc or
>     /\G(\s*)(\S+)/gc; # k i know that's kinda kloogy, but I'm experimenting.

Let's find out why:

    friday:~ $ explain \G\s*(["'])([^\\\1]*(?:\\.[^\\\1]*)*)\1
    [snip]
    ----------------------------------------------------------------------
      [^\\\1]*        any character except: '\\', '\1' (0 or more
                      times (matching the most amount possible))
    ----------------------------------------------------------------------
    [snip]

As you see, putting \1 in a character class matches the character "\1", not whatever the first group captured. That's not what we wanted; but character classes must be known at the regex's compile-time. You could do:

    push @matches, $+ while
        /\G\s*(["'])((??{"[^$1]*"})(?:\\.(??{"[^$1]*"}))*)/gc or
        /\G\s*(\S+)/gc;

But that is ugly, and requires Perl 5.6.0+.

--
Jeff "japhy" Pinyan
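One way to get the effect of "any character except the opening quote" without putting \1 in a character class is a negative lookahead in front of each character. This is a swapped-in technique, not something proposed in the thread; the sample line and variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = q{say "it's here" 'a "b" c' end};
my @elements;

for ($line) {
    # $+ is the highest-numbered group that took part in the last
    # successful match: group 2 for a quoted element, group 1 for a bare one.
    push @elements, $+ while
        # (?!\1) refuses to step over the quote character captured in
        # group 1; \\. lets any backslash escape pass through.
        /\G\s*(["'])((?:\\.|(?!\1)[^\\])*)\1/gc or
        /\G\s*(\S+)/gc;
}

print "[$_]\n" for @elements;
```

Because the lookahead is evaluated at match time, a backreference is allowed there, so a single pattern covers both quote styles. The trade-off is that this is the non-unrolled form, which the thread suggests may be slower on long strings.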
Re: splitting strings with quoted white space
On 05 Jun 2001 17:49:53 -0700, Peter Cornelius wrote:
<snip>
> local $_ = 'name = quoted string with space';
<snip>

If your pattern always looks like this then try:

    #!/usr/bin/perl

    use strict;         # make me behave

    my $name;           # holds the key part of config
    my $value;          # holds the value part of config
    my $delim = "=";    # the delimiter between key and value

    my $file = shift;   # first command-line argument
    open(FILE, $file) or die "Could not open $file: $!";   # open it or die trying

    while (<FILE>) {    # while there are lines left, assign next line to $_
        chomp;          # remove record separator from line (i.e. \n)
        unless (/ ($delim) /) { die "Bad file data" }   # match " = " or die trying
        $name  = $`;    # put everything before the match into $name
        $value = $';    # put everything after the match into $value
        # print uses [] to make white space easier to see
        print "name = [$name] :: value = [$value]\n";
    }

    close FILE;

Its output looks like this:

    [cowens@cowens cowens]$ cat data
    this = this is another line
    that = this is a line that countains =, oh no!
    bugger = me
    [cowens@cowens cowens]$ ./test.pl data
    name = [this] :: value = [this is another line]
    name = [that] :: value = [this is a line that countains =, oh no!]
    name = [bugger] :: value = [me]

--
Today is Boomtime, the 11st day of Confusion in the YOLD 3167
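The same key/value separation can be sketched with split and a field limit instead of $` and $'. This is an alternative, not the method posted above; the %config hash and the inline sample lines (modelled on the data file above) are assumptions for illustration.

```perl
use strict;
use warnings;

# Sample lines standing in for the contents of a config file.
my @lines = (
    'this = this is another line',
    'that = this is a line that contains =, oh no!',
    'bugger = me',
);

my %config;
for my $line (@lines) {
    # A limit of 2 stops split after the first delimiter, so any later
    # "=" stays inside the value.
    my ( $name, $value ) = split /\s*=\s*/, $line, 2;
    die "Bad data: $line\n" unless defined $value;
    $config{$name} = $value;
    print "name = [$name] :: value = [$value]\n";
}
```

One reason to consider this form is that, on older perls, using $` and $' anywhere in a program slows down every regex match in it.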
Re: splitting strings with quoted white space
>>>>> "Ondrej" == Ondrej Par [EMAIL PROTECTED] writes:

Ondrej> my $line = 'whatever this \'line is\'';
Ondrej> $line =~ s/\s*$//;
Ondrej> my @parts;
Ondrej> while ($line ne '') {
Ondrej>     if ($line =~ m/^\s*(["'])((?:(?:\\.)|[^\\])*?)\1(.*)/) {
Ondrej>         push @parts, $2;
Ondrej>         $line = $3;
Ondrej>     } elsif ($line =~ m/^\s*(\S+)(.*)/) {
Ondrej>         push @parts, $1;
Ondrej>         $line = $2;
Ondrej>     }
Ondrej> }

That's a good approach, but maybe this one is more straightforward:

    $_ = q{whatever this 'line is'};

    my @elements;
    push @elements, $1 while
        /\G\s*"(.*?)"/gc or
        /\G\s*'(.*?)'/gc or
        /\G\s*(\S+)/gc;

    print map $_, @elements;

The use of scalar /\G./gc to inchworm along a string is a powerful technique.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
Re: splitting strings with quoted white space
On Jun 6, Randal L. Schwartz said:
> my @elements;
> push @elements, $1 while
>     /\G\s*"(.*?)"/gc or
>     /\G\s*'(.*?)'/gc or
>     /\G\s*(\S+)/gc;

Randal, would you mind if I used this as an example of \G and /gc in my regex book? Due credit would be given, of course.

--
Jeff "japhy" Pinyan
Re: splitting strings with quoted white space
>>>>> "Jeff" == Jeff 'japhy' Pinyan [EMAIL PROTECTED] writes:

Jeff> On Jun 6, Randal L. Schwartz said:
>> my @elements;
>> push @elements, $1 while
>>     /\G\s*"(.*?)"/gc or
>>     /\G\s*'(.*?)'/gc or
>>     /\G\s*(\S+)/gc;

Jeff> Randal, would you mind if I used this as an example of \G and /gc in my
Jeff> regex book? Due credit would be given, of course.

Yeah, of course you can use it. Did you mean that, and this, to go to the beginners list? :)

--
Randal L. Schwartz
Re: splitting strings with quoted white space
On Wednesday 06 June 2001 18:19, Randal L. Schwartz wrote:
> That's a good approach, but maybe this one is more straightforward:
>
> $_ = q{whatever this 'line is'};
>
> my @elements;
> push @elements, $1 while
>     /\G\s*"(.*?)"/gc or
>     /\G\s*'(.*?)'/gc or
>     /\G\s*(\S+)/gc;
>
> print map $_, @elements;
>
> The use of scalar /\G./gc to inchworm along a string is a powerful
> technique.

Yes, this is better. With one exception: you're not handling \' and \" (but this can be copied from the previous example).

--
Ondrej Par
Re: splitting strings with quoted white space
On Jun 6, Randal L. Schwartz said:
> >>>>> "Jeff" == Jeff 'japhy' Pinyan [EMAIL PROTECTED] writes:
>
> Jeff> On Jun 6, Randal L. Schwartz said:
> >> my @elements;
> >> push @elements, $1 while
> >>     /\G\s*"(.*?)"/gc or
> >>     /\G\s*'(.*?)'/gc or
> >>     /\G\s*(\S+)/gc;
>
> Jeff> Randal, would you mind if I used this as an example of \G and /gc in my
> Jeff> regex book? Due credit would be given, of course.
>
> Yeah, of course you can use it. Did you mean that, and this, to go to
> the beginners list? :)

Yes. I'd like to make any newcomers aware of the book, and I'd like any input anyone else has on the regex. Ondrej, I believe, just mentioned the lack of backslash-support.

--
Jeff "japhy" Pinyan
Re: splitting strings with quoted white space
>>>>> "Randal" == Randal L Schwartz [EMAIL PROTECTED] writes:

Randal> my @elements;
Randal> push @elements, $1 while
Randal>     /\G\s*"((?:[^\\"]|\\"|)*)"/gc or
Randal>     /\G\s*'((?:[^\\']|\\'|)*)'/gc or
Randal>     /\G\s*([^\s"']\S*)/gc;

Randal> Leaving undefined something like \X as malformed. :)

Which can be tested with

    die unless /\G\z/g;

:-)

--
Randal L. Schwartz
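The end-of-input check can be combined with the tokenizing loop so that malformed input is reported rather than silently dropped. The following is a sketch under stated assumptions: parse_or_die is a hypothetical name, the patterns accept any \X escape (using \\. rather than only an escaped quote), trailing white space is tolerated, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: tokenize $line, or die if part of it cannot be
# consumed (for example, because a quote is never closed).
sub parse_or_die {
    my ($line) = @_;
    my @elements;
    for ($line) {
        push @elements, $1 while
            /\G\s*"((?:[^\\"]|\\.)*)"/gc or
            /\G\s*'((?:[^\\']|\\.)*)'/gc or
            /\G\s*([^\s"']\S*)/gc;

        # With /c, pos() still marks where the last successful match
        # ended, so \G can test whether only white space remains.
        /\G\s*\z/gc
            or die "Malformed input at offset " . ( pos() // 0 ) . "\n";
    }
    return @elements;
}

my @ok = parse_or_die( q{a 'b c' "d"} );
print "[$_]\n" for @ok;

# An unterminated quote leaves unconsumed text, so the check fails.
my @bad = eval { parse_or_die( q{ok "unterminated} ) };
print "error: $@" if $@;
```

The offset in the message is where parsing stopped, which is usually just before the offending quote.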
RE: splitting strings with quoted white space
How about this: (the same but unrolled)

    my @elements;
    push @elements, $1 while
        /\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or
        /\G\s*'([^\\']*(?:\\['\\][^\\']*)*)'/gc or
        /\G\s*([^\s"']\S*)/gc;

Is there actually an advantage to doing this?

-----Original Message-----
From: Randal L. Schwartz [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 06, 2001 10:43 AM
To: Ondrej Par
Cc: Peter Cornelius; [EMAIL PROTECTED]
Subject: Re: splitting strings with quoted white space

>>>>> "Ondrej" == Ondrej Par [EMAIL PROTECTED] writes:

Ondrej> On Wednesday 06 June 2001 18:19, Randal L. Schwartz wrote:
>> That's a good approach, but maybe this one is more straightforward:
>>
>> $_ = q{whatever this 'line is'};
>>
>> my @elements;
>> push @elements, $1 while
>>     /\G\s*"(.*?)"/gc or
>>     /\G\s*'(.*?)'/gc or
>>     /\G\s*(\S+)/gc;
>>
>> print map $_, @elements;
>>
>> The use of scalar /\G./gc to inchworm along a string is a powerful
>> technique.

Ondrej> Yes, this is better. With one exception - you're not handling \' and \" (but
Ondrej> this can be copied from previous example).

    my @elements;
    push @elements, $1 while
        /\G\s*"((?:[^\\"]|\\"|)*)"/gc or
        /\G\s*'((?:[^\\']|\\'|)*)'/gc or
        /\G\s*([^\s"']\S*)/gc;

Leaving undefined something like \X as malformed. :)

--
Randal L. Schwartz
RE: splitting strings with quoted white space
On Jun 6, Accountant Bob said:
> How about this: (the same but unrolled)
>
> my @elements;
> push @elements, $1 while
>     /\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or
>     /\G\s*'([^\\']*(?:\\['\\][^\\']*)*)'/gc or
>     /\G\s*([^\s"']\S*)/gc;
>
> is there actually an advantage to doing this?

Yes, as is discussed (at length) in J. Friedl's "Mastering Regular Expressions". In fact, matching quoted strings in unrolled form is a very big part of his chapter on crafting regexes. Unrolling the loop can be a timesaver, since .*? can be slowish.

--
Jeff "japhy" Pinyan