Re: splitting strings

2006-08-30 Thread Hien Le

On Aug 30, 2006, at 3:42 AM, Dr.Ruud wrote:


Aaargh, I was suddenly mixing up split /()/ and /()/g. I really
shouldn't post anymore without testing.


Thank you all for the clarifications regarding split(). I should pay  
more attention when I read the documentation (or get more sleep).

-Hien.

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
http://learn.perl.org/ http://learn.perl.org/first-response




splitting strings

2006-08-29 Thread Hien Le

Hello,

Given the string 'abcdefghijklmnopq', I wish to add a line break  
every 5 characters:


abcde
fghij
klmno
pq

Method 1 below works, but my split() in method 2 captures 'something  
unexpected' at each match. Could someone please tell me what is split  
capturing that I am not seeing?


Thanks in advance for your answers,
-Hien


My script:
==

#!/usr/bin/perl -w

use strict;

my $foo = 'abcdefghijklmnopq';

# Method 1
print( "\nMethod 1\n" );
my $foo_length = length( $foo );
for( my $i = 0; $i < $foo_length; $i += 5 )
{
    my $bar1 = substr( $foo, $i, 5 );
    print( $bar1, "\n" );
}

# Method 2
print( "\nMethod 2\n" );
my @bar2 = split( /([a-z]{5})/, $foo );   # Captures white-spaces ?!?
my $bar2_nb = @bar2;
print( join( "\n", @bar2) );
print( "\nElements in array = ", $bar2_nb, "\n" );   # 7 elements in the array.


__END__

My script's output:
===

[EMAIL PROTECTED] $ perl weird_string_manipulation.pl

Method 1
abcde
fghij
klmno
pq

Method 2

abcde

fghij

klmno
pq
Elements in array = 7






Re: splitting strings

2006-08-29 Thread Dr.Ruud
Hien Le schreef:

> my @bar2 = split( /([a-z]{5})/, $foo );   # Captures white-spaces ?!?

Because of the capturing (), split also returns the separators.

See perldoc -f split.


Suggestion:

  my @bar2 = split( /./, $foo );

-- 
Affijn, Ruud

Gewoon is een tijger.







Re: splitting strings

2006-08-29 Thread John W. Krahn
Hien Le wrote:
 Hello,

Hello,

 Given the string 'abcdefghijklmnopq', I wish to add a line break  every
 5 characters:
 
 abcde
 fghij
 klmno
 pq

$ perl -e'
my $foo = q[abcdefghijklmnopq];
print "$foo\n";
$foo =~ s/(.{0,5})/$1\n/g;
print $foo;
'
abcdefghijklmnopq
abcde
fghij
klmno
pq


 Method 1 below works, but my split() in method 2 captures 'something 
 unexpected' at each match. Could someone please tell me what is split 
 capturing that I am not seeing?

split( /X/, 'aXb' ) splits the string using the pattern and returns the list (
'a', 'b' ).  split( /(X)/, 'aXb' ) splits the string using the pattern and
returns the list ( 'a', 'X', 'b' ).  Everything not in the pattern is returned
in the list unless you use capturing parentheses and then everything in the
capturing parentheses is returned as well.
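
The difference described above can be checked directly. The following is a minimal sketch, not part of the original reply; the variable names are chosen only for illustration.

```perl
use strict;
use warnings;

# Without capturing parentheses, only the fields between separators come back.
my @plain = split /X/, 'aXb';        # ('a', 'b')

# With capturing parentheses, each captured separator is returned as well.
my @captured = split /(X)/, 'aXb';   # ('a', 'X', 'b')

# In the original script the pattern matches the five-letter chunks themselves,
# so the chunks are the separators and the fields between them are empty.
my @chunks = split /([a-z]{5})/, 'abcdefghijklmnopq';
# ('', 'abcde', '', 'fghij', '', 'klmno', 'pq')

print scalar(@chunks), " elements\n";
print join('|', @chunks), "\n";
```

The empty strings in the last list are the "something unexpected" from the question: they are the zero-length fields that sit between adjacent separators.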


> My script:
> ==
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $foo = 'abcdefghijklmnopq';
>
> # Method 1
> print( "\nMethod 1\n" );
> my $foo_length = length( $foo );
> for( my $i = 0; $i < $foo_length; $i += 5 )
> {
>     my $bar1 = substr( $foo, $i, 5 );
>     print( $bar1, "\n" );
> }
>
> # Method 2
> print( "\nMethod 2\n" );
> my @bar2 = split( /([a-z]{5})/, $foo );   # Captures white-spaces ?!?
> my $bar2_nb = @bar2;
> print( join( "\n", @bar2) );
> print( "\nElements in array = ", $bar2_nb, "\n" );   # 7 elements in the array.
>
> __END__

$ perl -e'
my $foo = q[abcdefghijklmnopq];
print "$foo\n";
my @bar = unpack q[(a5)*], $foo;
print map "$_\n", @bar;
'
abcdefghijklmnopq
abcde
fghij
klmno
pq

$ perl -e'
my $foo = q[abcdefghijklmnopq];
print "$foo\n";
my @bar = $foo =~ /.{0,5}/g;
print map "$_\n", @bar;
'
abcdefghijklmnopq
abcde
fghij
klmno
pq



John
-- 
use Perl;
program
fulfillment





Re: splitting strings

2006-08-29 Thread Mumia W.

On 08/29/2006 06:52 AM, Hien Le wrote:

[...]
# Method 2
print( \nMethod 2\n );
my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces ?!?
[...]


The comments made by Dr. Ruud and John W. Krahn are correct. 
Split is returning the empty strings between delimiter 
segments in the original string. To zap these out, do this:


my @bar2 = grep length, split (/([a-z]{5})/, $foo);

Any substrings with a length of zero will be removed by grep 
length.
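
As a quick check of the suggestion above, here is a minimal sketch using the same input string as the original script. It relies on every character of the input being a lowercase letter; any other character would end up in one of the non-empty fields between separators.

```perl
use strict;
use warnings;

my $foo = 'abcdefghijklmnopq';

# split returns the empty fields and the captured separators;
# grep length keeps only the elements whose length is non-zero.
my @bar2 = grep { length } split /([a-z]{5})/, $foo;

print "$_\n" for @bar2;
```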








Re: splitting strings

2006-08-29 Thread Dr.Ruud
Mumia W. schreef:
 Hien Le:

 [...]
 # Method 2
 print( \nMethod 2\n );
 my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces
 ?!? [...]

 The comments made by Dr. Ruud and John W. Krahn are correct.
 Split is returning the empty strings between delimiter
 segments in the original string. To zap these out, do this:

 my @bar2 = grep length, split (/([a-z]{5})/, $foo);

 Any substrings with a length of zero will be removed by grep
 length.

Huh? Why not just remove the capturing ()?

Again: perldoc -f split

-- 
Affijn, Ruud

Gewoon is een tijger.







Re: splitting strings

2006-08-29 Thread Mumia W.

On 08/29/2006 05:02 PM, Dr.Ruud wrote:

Mumia W. schreef:

Hien Le:



[...]
# Method 2
print( \nMethod 2\n );
my @bar2 = split( /([a-z]{5})/, $foo );# Captures white-spaces
?!? [...]

The comments made by Dr. Ruud and John W. Krahn are correct.
Split is returning the empty strings between delimiter
segments in the original string. To zap these out, do this:

my @bar2 = grep length, split (/([a-z]{5})/, $foo);

Any substrings with a length of zero will be removed by grep
length.


Huh? Why not just remove the capturing ()?

Again: perldoc -f split



Without the capturing parentheses, split will remove every 
sequence of five alphabetic characters from the output. Only 
'pq' will remain:


use Data::Dumper;
my $foo = 'abcdefghijklmnopq';
my @foo = split /[a-z]{5}/, $foo;
print Dumper(\@foo);

__END__

That program prints this:

$VAR1 = [
  '',
  '',
  '',
  'pq'
];







Re: splitting strings

2006-08-29 Thread Dr.Ruud
Mumia W. schreef:
 Dr.Ruud:
 Mumia W.:

 my @bar2 = grep length, split (/([a-z]{5})/, $foo);

 Any substrings with a length of zero will be removed by grep
 length.

 Huh? Why not just remove the capturing ()?

 Without the capturing parentheses, split will remove every
 sequence of five alphabetic characters from the output. Only
 'pq' will remain:

 use Data::Dumper;
 my $foo = 'abcdefghijklmnopq';
 my @foo = split /[a-z]{5}/, $foo;
 print Dumper(\@foo);

 __END__

 That program prints this:

 $VAR1 = [
'',
'',
'',
'pq'
  ];

Aaargh, I was suddenly mixing up split /()/ and /()/g. I really
shouldn't post anymore without testing.

#!/usr/bin/perl
  use warnings ;
  use strict ;

  use Data::Dumper ;
  my $foo = 'abcdefghijklmnopq' ;
  my @foo = ($foo =~ /([a-z]{1,5})/g) ;
  print Dumper(\@foo);

__END__

$VAR1 = [
  'abcde',
  'fghij',
  'klmno',
  'pq'
];

-- 
Affijn, Ruud

Gewoon is een tijger.







Re: splitting strings with quoted white space

2001-06-07 Thread Ondrej Par

On Wednesday 06 June 2001 22:59, Jeff 'japhy' Pinyan wrote:
 On Jun 6, Accountant Bob said:
 How about this: (the same but unrolled)
 
 my @elements;
 push @elements, $1 while
/\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or

I think that
   /\G\s*"((?:(?:\\.)|[^\\])*?)"/gc

is shorter and also matches all \X sequences (the trick is that \\. is longer
than [^\\]).

-- 
Ondrej Par
Internet Securities
Software Engineer
e-mail: [EMAIL PROTECTED]
Phone: +420 2 222 543 45 ext. 112




Re: splitting strings with quoted white space

2001-06-07 Thread Jeff 'japhy' Pinyan

On Jun 7, Ondrej Par said:

On Wednesday 06 June 2001 22:59, Jeff 'japhy' Pinyan wrote:
 On Jun 6, Accountant Bob said:
 How about this: (the same but unrolled)
 
 my @elements;
 push @elements, $1 while
/\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or

I think that
   /\G\s*"((?:(?:\\.)|[^\\])*?)"/gc

is shorter and also matches all \X sequences (the trick is that \\. is longer
than [^\\]).

The formula for unrolling the loop is

  NORMAL* (SPECIAL NORMAL*)*

Here, NORMAL is /[^\\"]/, and SPECIAL is /\\./ -- at least, I'm using \\.,
since I want any backslash to pass through ok.

Thus, our regex is:

  push @elements, $1 while
    /\G\s*"([^\\"]*(?:\\.[^\\"]*)*)"/gc or
    /\G\s*'([^\\']*(?:\\.[^\\']*)*)'/gc or
    /\G\s*(\S+)/gc;

Of course, that last regex can be changed to your whims...
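
To make the NORMAL* (SPECIAL NORMAL*)* formula concrete, here is a minimal sketch that runs the alternation form and the unrolled form against the same double-quoted string. The sample text and variable names are assumptions made for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample input: a double-quoted string containing escaped quotes.
my $text = q{"say \"hi\" now" rest};

# Alternation form: at every character, try an escape pair, then a normal char.
my ($alt) = $text =~ /\A"((?:\\.|[^\\"])*)"/;

# Unrolled form: NORMAL* (SPECIAL NORMAL*)*
#   NORMAL  = [^\\"]  (anything except a backslash or the closing quote)
#   SPECIAL = \\.     (a backslash followed by any one character)
my ($unrolled) = $text =~ /\A"([^\\"]*(?:\\.[^\\"]*)*)"/;

print "alternation: $alt\n";
print "unrolled:    $unrolled\n";
```

Both patterns capture the same text. The unrolled one consumes runs of normal characters in a single step instead of re-entering the alternation for each character.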

-- 
Jeff japhy Pinyan  [EMAIL PROTECTED]  http://www.pobox.com/~japhy/
I am Marillion, the wielder of Ringril, known as Hesinaur, the Winter-Sun.
Are you a Monk?  http://www.perlmonks.com/ http://forums.perlguru.com/
Perl Programmer at RiskMetrics Group, Inc. http://www.riskmetrics.com/
Acacia Fraternity, Rensselaer Chapter. Brother #734
**  Manning Publications, Co, is publishing my Perl Regex book  **




RE: splitting strings with quoted white space

2001-06-07 Thread Jeff 'japhy' Pinyan

On Jun 7, Accountant Bob said:

can any one explain to me why this doesn't seem to work:
  push @elements, $2 while
    /\G\s*(["'])([^\\\1]*(?:\\.[^\\\1]*)*)\1/gc or
    /\G(\s*)(\S+)/gc;   # k, I know that's kinda kludgy, but I'm experimenting.

Let's find out why:

friday:~ $ explain
\G\s*(["'])([^\\\1]*(?:\\.[^\\\1]*)*)\1

[snip]

--
[^\\\1]* any character except: '\\', '\1' (0 or
 more times (matching the most amount
 possible))
--

[snip]

As you see, putting \1 in a character class matches the literal character
\1 (the octal escape for chr(1)), not the text captured by group 1.  That's
not what we wanted, but character classes must be known at the regex's
compile-time.

You could do:

  push @matches, $+ while
    /\G\s*(["'])((??{"[^$1]*"})(?:\\.(??{"[^$1]*"}))*)/gc or
    /\G\s*(\S+)/gc;

But that is ugly, and requires Perl 5.6.0+.
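
One alternative that avoids both the character class and (??{ }) is a negative lookahead on the backreference. The sketch below is a swapped-in technique, not the method proposed in this message, and the sample input is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input mixing both quote styles.
$_ = q{"it's here" 'say "hi"' plain};

my @elements;
push @elements, $+ while
    # (?!\1) refuses to consume the opening quote character, so the body
    # stops at the matching close; \\. lets any escaped character through.
    /\G\s*(["'])((?:\\.|(?!\1).)*)\1/gc or
    /\G\s*(\S+)/gc;

print "[$_]\n" for @elements;
```

Outside a character class \1 is an ordinary backreference, so the lookahead compares each candidate character against whichever quote opened the string.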





Re: splitting strings with quoted white space

2001-06-06 Thread Chas Owens

On 05 Jun 2001 17:49:53 -0700, Peter Cornelius wrote:
snip
 
  local $_ = 'name = quoted string with space';
 
snip

If your pattern always looks like this then try:

#!/usr/bin/perl

use strict;    #make me behave

my $name;          #holds the key part of config
my $value;         #the value part of config
my $delim = "=";   #the delimiter between key and value

my $file = shift;  #first command-line argument
open(FILE, $file) or die "Could not open $file: $!";   #open it or die trying

while (<FILE>) {   #while there are lines left assign next line to $_
    chomp;         #remove record separator from line (ie \n)
    unless (/ ($delim) /) { die "Bad file data" }   #match " = " or die trying
    $name  = $`;   #put everything before the match into $name
    $value = $';   #put everything after the match into $value
    #print uses [] to make white space easier to see
    print "name = [$name] :: value = [$value]\n";
}
close FILE;

Its output looks like this:

[cowens@cowens cowens]$ cat data
this = this is another line
that = this is a line that countains =, oh no!
bugger = me
[cowens@cowens cowens]$ ./test.pl data
name = [this] :: value = [this is another line]
name = [that] :: value = [this is a line that countains =, oh no!]
name = [bugger] :: value = [me]
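
Since this thread is about split(), the same name/value separation can also be sketched with split and a field limit. This is an alternative under the assumption that the delimiter is always " = " as in the data above; it is not the approach used in the script.

```perl
use strict;
use warnings;

# Sample lines taken from the data file shown above.
my @lines = (
    'this = this is another line',
    'that = this is a line that countains =, oh no!',
);

for my $line (@lines) {
    # The limit of 2 stops split after the first delimiter, so any later
    # "=" stays inside the value.
    my ($name, $value) = split / = /, $line, 2;
    die "Bad file data\n" unless defined $value;
    print "name = [$name] :: value = [$value]\n";
}
```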



-- 
Today is Boomtime, the 11st day of Confusion in the YOLD 3167






Re: splitting strings with quoted white space

2001-06-06 Thread Randal L. Schwartz

 Ondrej == Ondrej Par [EMAIL PROTECTED] writes:

Ondrej my $line = 'whatever this \'line is\'';

Ondrej $line =~ s/\s*$//;

Ondrej my @parts;
Ondrej while ($line ne '') {
Ondrej if ($line =~ m/^\s*(["'])((?:(?:\\.)|[^\\])*?)\1(.*)/) {
Ondrej push @parts, $2;
Ondrej $line = $3;
Ondrej } elsif ($line =~ m/^\s*(\S+)(.*)/) {
Ondrej push @parts, $1;
Ondrej $line = $2;
Ondrej }
Ondrej }

That's a good approach, but maybe this one is more straightforward:

$_ = q{whatever this 'line is'};

my @elements;
push @elements, $1 while
  /\G\s*"(.*?)"/gc or
  /\G\s*'(.*?)'/gc or
  /\G\s*(\S+)/gc;

print map $_, @elements;

The use of scalar /\G./gc to inchworm along a string is a powerful
technique.
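
For readers new to the idiom, here is a minimal sketch of the same \G and /gc inchworm applied to a small tokenizer. The input string and the token labels are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input for illustration.
my $src = 'total = 42 + x1';
my @tokens;

for ($src) {                      # alias $_ to $src so pos() tracks $src
    while (1) {
        if    (/\G\s+/gc)            { next }                     # skip white space
        elsif (/\G(\d+)/gc)          { push @tokens, "NUM:$1" }   # integer
        elsif (/\G([A-Za-z_]\w*)/gc) { push @tokens, "ID:$1" }    # identifier
        elsif (/\G([=+])/gc)         { push @tokens, "OP:$1" }    # operator
        else                         { last }                     # nothing matched
    }
}

print join(' ', @tokens), "\n";
```

Each match starts where the previous one ended (\G), and /c keeps that position when a pattern fails, so the next alternative is tried at the same spot.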

-- 
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
[EMAIL PROTECTED] URL:http://www.stonehenge.com/merlyn/
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!



Re: splitting strings with quoted white space

2001-06-06 Thread Jeff 'japhy' Pinyan

On Jun 6, Randal L. Schwartz said:

my @elements;
push @elements, $1 while
  /\G\s*"(.*?)"/gc or
  /\G\s*'(.*?)'/gc or
  /\G\s*(\S+)/gc;

Randal, would you mind if I used this as an example of \G and /gc in my
regex book?  Due credit would be given, of course.





Re: splitting strings with quoted white space

2001-06-06 Thread Randal L. Schwartz

 Jeff == Jeff 'japhy' Pinyan [EMAIL PROTECTED] writes:

Jeff On Jun 6, Randal L. Schwartz said:
 my @elements;
 push @elements, $1 while
 /\G\s*"(.*?)"/gc or
 /\G\s*'(.*?)'/gc or
 /\G\s*(\S+)/gc;

Jeff Randal, would you mind if I used this as an example of \G and /gc in my
Jeff regex book?  Due credit would be given, of course.

Yeah, of course you can use it.  Did you mean that, and this, to go to
the beginners list? :)




Re: splitting strings with quoted white space

2001-06-06 Thread Ondrej Par

On Wednesday 06 June 2001 18:19, Randal L. Schwartz wrote:
 That's a good approach, but maybe this one is more straightforward:

 $_ = q{whatever this 'line is'};

 my @elements;
 push @elements, $1 while
   /\G\s*"(.*?)"/gc or
   /\G\s*'(.*?)'/gc or
   /\G\s*(\S+)/gc;

 print map $_, @elements;

 The use of scalar /\G./gc to inchworm along a string is a powerful
 technique.

Yes, this is better. With one exception: you're not handling \' and \" (but
this can be copied from the previous example).





Re: splitting strings with quoted white space

2001-06-06 Thread Jeff 'japhy' Pinyan

On Jun 6, Randal L. Schwartz said:

 Jeff == Jeff 'japhy' Pinyan [EMAIL PROTECTED] writes:

Jeff On Jun 6, Randal L. Schwartz said:
 my @elements;
 push @elements, $1 while
 /\G\s*"(.*?)"/gc or
 /\G\s*'(.*?)'/gc or
 /\G\s*(\S+)/gc;

Jeff Randal, would you mind if I used this as an example of \G and /gc in my
Jeff regex book?  Due credit would be given, of course.

Yeah, of course you can use it.  Did you mean that, and this, to go to
the beginners list? :)

Yes.  I'd like to make any newcomers aware of the book, and I'd like any
input anyone else has on the regex.  Ondrej, I believe, just mentioned
the lack of backslash-support.






Re: splitting strings with quoted white space

2001-06-06 Thread Randal L. Schwartz

 Randal == Randal L Schwartz [EMAIL PROTECTED] writes:

Randal my @elements;
Randal push @elements, $1 while
Randal   /\G\s*"((?:[^\\"]|\\"|)*)"/gc or
Randal   /\G\s*'((?:[^\\']|\\'|)*)'/gc or
Randal   /\G\s*([^\s"']\S*)/gc;

Randal Leaving undefined something like \X as malformed. :)

Which can be tested with

die unless /\G\z/g;

:-)
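
The end-of-input check can be sketched as follows. parse_words is a hypothetical helper name, only single-quoted and bare words are handled to keep the example short, and trailing white space is tolerated by using \s*\z rather than \z alone.

```perl
use strict;
use warnings;

# Returns the list of words, or dies if part of the input was not consumed.
# parse_words is a hypothetical helper name used only for this sketch.
sub parse_words {
    local $_ = shift;
    my @elements;
    push @elements, $1 while
        /\G\s*'([^']*)'/gc or
        /\G\s*([^\s']+)/gc;
    # /c kept pos() where the last attempt failed; \G\s*\z succeeds only
    # if nothing but trailing white space remains.
    die "malformed input at offset " . (pos() // 0) . "\n" unless /\G\s*\z/gc;
    return @elements;
}

my @ok = parse_words(q{one 'two three' four});
print "[$_]\n" for @ok;

# An unterminated quote leaves unconsumed text, so the check dies.
my $err = eval { parse_words(q{one 'unterminated}); 1 } ? '' : $@;
print "error: $err";
```

The defined-or operator (//) needs Perl 5.10 or later; on older versions a plain defined() test would do the same job.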




RE: splitting strings with quoted white space

2001-06-06 Thread Accountant Bob

How about this: (the same but unrolled)

my @elements;
push @elements, $1 while
   /\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or
   /\G\s*'([^\\']*(?:\\['\\][^\\']*)*)'/gc or
   /\G\s*([^\s"']\S*)/gc;

is there actually an advantage to doing this?

-Original Message-
From: Randal L. Schwartz [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 06, 2001 10:43 AM
To: Ondrej Par
Cc: Peter Cornelius; [EMAIL PROTECTED]
Subject: Re: splitting strings with quoted white space


 Ondrej == Ondrej Par [EMAIL PROTECTED] writes:

Ondrej On Wednesday 06 June 2001 18:19, Randal L. Schwartz wrote:
 That's a good approach, but maybe this one is more straightforward:

 $_ = q{whatever this 'line is'};

 my @elements;
 push @elements, $1 while
 /\G\s*"(.*?)"/gc or
 /\G\s*'(.*?)'/gc or
 /\G\s*(\S+)/gc;

 print map $_, @elements;

 The use of scalar /\G./gc to inchworm along a string is a powerful
 technique.

Ondrej Yes, this is better. With one exception - you're not handling \' and \"
Ondrej (but this can be copied from previous example).

my @elements;
push @elements, $1 while
  /\G\s*"((?:[^\\"]|\\"|)*)"/gc or
  /\G\s*'((?:[^\\']|\\'|)*)'/gc or
  /\G\s*([^\s"']\S*)/gc;

Leaving undefined something like \X as malformed. :)





RE: splitting strings with quoted white space

2001-06-06 Thread Jeff 'japhy' Pinyan

On Jun 6, Accountant Bob said:

How about this: (the same but unrolled)

my @elements;
push @elements, $1 while
   /\G\s*"([^\\"]*(?:\\["\\][^\\"]*)*)"/gc or
   /\G\s*'([^\\']*(?:\\['\\][^\\']*)*)'/gc or
   /\G\s*([^\s"']\S*)/gc;

is there actually an advantage to doing this?

Yes, as is discussed (at length) in J. Friedl's Mastering Regular
Expressions.  In fact, matching quoted strings in unrolled form is a very
big part of his chapter on crafting regexes.

Unrolling the loop can be a timesaver, since .*? can be slowish.
