Re: Subpatterns in Grep?

2012-10-08 Thread jmichel
This sounds fine for my type of use. 
Thanks again.

On Saturday, 6 October 2012 at 17:04:43 UTC+2, Patrick Woolsey wrote:

 At 23:50 -0700 10/05/2012, jmichel wrote: 
 Thanks for these explanations. They confirm what I suspected. 
 Assuming that the number of lines in one group can never exceed, say, 15 
 or so, could one circumvent the difficulty by explicitly repeating the 
 search pattern a sufficient number of times? 

 Yes, and please see below. 


 Then the problem would be to ensure a match also in the case when the 
 number of lines is smaller. Any idea on how that could be achieved? Could 
 conditional matching help (I am not familiar with those advanced 
 features)? 

 To do this, just modify your existing pattern to find successive pairs of 
 matching lines and combine their contents: 

 Find:    (\d{6})(.+)(?:\r\1(.+))

 Replace: \1\2\3

 and then repeatedly apply Replace All until all line pairs which start with
 the same numeric prefix have been consolidated to single lines.

 (E.g. for groups of 16 lines or fewer, this will take at most 4 passes of 
 Replace All; for groups of 64 lines or fewer, 6 passes; etc.) 
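
The repeated Replace All described above can be simulated outside BBEdit. The following is only a sketch in Python, not part of the original advice: the function name merge_groups and the sample data are made up for illustration, and \n stands in for the \r that BBEdit uses as its line break in Grep patterns.

```python
import re

# Same pair-matching pattern as in the message, with \n instead of BBEdit's \r.
PAIR = re.compile(r"(\d{6})(.+)(?:\n\1(.+))")

def merge_groups(text):
    """Apply the pairwise replacement until nothing changes.

    Returns the merged text and the number of passes that changed something.
    """
    passes = 0
    while True:
        text, count = PAIR.subn(r"\1\2\3", text)
        if count == 0:
            return text, passes
        passes += 1

# Hypothetical sample: a group of five lines, a single line, and a pair.
sample = "\n".join([
    "111111 a", "111111 b", "111111 c", "111111 d", "111111 e",
    "222222 x",
    "333333 p", "333333 q",
])
merged, passes = merge_groups(sample)
print(merged)
print(passes)
```

Each pass roughly halves the number of lines left in a group, which is where the "4 passes for 16 lines, 6 passes for 64 lines" estimate comes from.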


 PS: John Delacour's text filter is a much nicer general solution; the only 
 advantage of the above is it doesn't require knowledge of Perl. 


 Regards, 

  Patrick Woolsey 
 == 
 Bare Bones Software, Inc. http://www.barebones.com/ 



-- 
-- 
You received this message because you are subscribed to the 
BBEdit Talk discussion group on Google Groups.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to
bbedit+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/bbedit?hl=en
If you have a feature request or would like to report a problem, 
please email supp...@barebones.com rather than posting to the group.
Follow @bbedit on Twitter: http://www.twitter.com/bbedit





Re: Subpatterns in Grep?

2012-10-06 Thread jmichel
This sounds amazingly powerful and flexible. Thanks a lot. I will try it 
asap.
The only problem is that I will need to learn Perl if I want to be able to 
write such scripts…







Re: Subpatterns in Grep?

2012-10-06 Thread jmichel
Thanks for these explanations. They confirm what I suspected.
Assuming that the number of lines in one group can never exceed, say, 15 or 
so, could one circumvent the difficulty by explicitly repeating the search 
pattern a sufficient number of times? 
Then the problem would be to ensure a match also in the case when the 
number of lines is smaller. Any idea on how that could be achieved? Could 
conditional matching help (I am not familiar with those advanced 
features)?







Re: Subpatterns in Grep?

2012-10-06 Thread jmichel
Thanks again. This is reasonably simple for short groups of lines.








Subpatterns in Grep?

2012-10-05 Thread jmichel
I have a file consisting of groups of lines (unknown number of lines in 
each group). 
Each line begins with a 6-digit number, followed by an unknown sequence of 
words and numbers. 
Consecutive lines starting with the same number form a group.
My problem is to combine the lines of each group into a single line, keeping 
only the first occurrence of the distinctive number.
I have been able to find groups using the pattern
(\d{6})(.+)(?:\r\1(.+))+
However, this does not appear to store the expressions matching the inner 
parentheses in separate variables. 
Is there a way to achieve the desired replacement using grep?

Thanks in advance






Re: Subpatterns in Grep?

2012-10-05 Thread Patrick Woolsey
At 08:18 -0700 10/05/2012, jmichel wrote:
I have a file consisting of groups of lines (unknown number of lines in
each group).
Each line begins by a 6 digit number, followed by an unknown sequence of
words and numbers.
Consecutive lines starting with the same number form a group.
My problem is to combine lines from each group into a single line, keeping
only the first occurrence of the distinctive number.
I have been able to find groups using the pattern
(\d{6})(.+)(?:\r\1(.+))+

However, this does not appear to store the expressions matching the inner
parentheses into separate variables.

Is there a way to achieve the desired replacement using grep?


Provided I understand the task correctly, though your pattern should match
all such groups of lines, I don't see any way to restructure the matched
text in a single step.

(A relatively easy brute force solution would be to concatenate all
matching line pairs, then rinse & repeat. :)

As to your question about storage:

Though the contents of that inner subpattern (.+) are being captured N
times (where N is the number of lines within the match), only the last
instance matched will be stored and available by reference to that
subpattern.

   [ As an aside for anyone else who may be wondering, this part of
     the pattern (?: ) consists of `non-capturing parentheses` which
     do not themselves store matched text. ]

For example, if you apply the following search & replace patterns:

Find:  (?:(\d{6})\r)+
Replace:   \1

to this text:

111222
333444
555666

the result will be:

555666
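
The same behaviour can be reproduced in other regex engines. As a rough sketch (Python's re module rather than BBEdit's Grep, with \n standing in for \r, and variable names chosen here for illustration), a capture group inside a repeated group keeps only its final repetition, while matching one line at a time keeps every value:

```python
import re

text = "111222\n333444\n555666\n"

# One match spanning all three lines; group 1 is overwritten on each repetition.
repeated = re.compile(r"(?:(\d{6})\n)+")
last_only = repeated.match(text).group(1)

# Replacing the whole match with \1 therefore leaves only the last number.
replaced = repeated.sub(r"\1", text)

# Matching line by line instead yields every captured value.
every_line = re.findall(r"(\d{6})\n", text)

print(last_only)
print(replaced)
print(every_line)
```

This is why a single Find/Replace cannot address the second, third, etc. lines of a group separately: there is only one slot per pair of parentheses, however many times it repeats.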


Regards,

 Patrick Woolsey
==
Bare Bones Software, Inc. http://www.barebones.com/






Re: Subpatterns in Grep?

2012-10-05 Thread John Delacour

On 5 Oct 2012, at 16:18, jmichel jmi.mig...@gmail.com wrote:

 I have a file consisting of groups of lines (unknown number of lines in each 
 group). 
 Each line begins by a 6 digit number, followed by an unknown sequence of 
 words and numbers. 
 Consecutive lines starting with the same number form a group.
 My problem is to combine lines from each group into a single line, keeping 
 only the first occurrence of the distinctive number.
 I have been able to find groups using the pattern
 (\d{6})(.+)(?:\r\1(.+))+
 However, this does not appear to store the expressions matching the inner 
 parentheses into separate variables. 
 Is there a way to achieve the desired replacement using grep?

Using regular expressions yes, but you need a routine.  If you put a file
containing this Perl script in ~/Library/Application Support/BBEdit/Text
Filters, it will do what you want.  Open the Text Filters palette from the
Window menu and you will see the filter.  Double-click it or click on Run or,
if it's a frequent task, assign a shortcut to the script.

Save this as ???.pl

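#!/usr/bin/perl
use strict;
use warnings;

my %hash;
my $six_digits = "[0-9]{6}";
my $remaining_text = ".*";
my $delimiter = ""; # or ", " for example
# Read the text line by line from standard input
while (<>) {
    if ( /^($six_digits)($remaining_text)/ ) {
        $hash{$1} .= $2; # append the text after the 6 digits
    }
}
# Print one combined line per number, in numeric order
for (sort { $a <=> $b } keys %hash) {
    print "$_$delimiter$hash{$_}\n";
}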

#JD
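
For readers who do not know Perl, here is a comparable sketch in Python. It is not a translation of the filter above: it merges only consecutive lines with the same number and keeps the original order (as the question describes), whereas the Perl filter collects every line with the same number and sorts the output. The function name combine and the sample lines are hypothetical.

```python
import re
from itertools import groupby

# Six digits at the start of the line, then the rest of the line.
LINE = re.compile(r"^(\d{6})(.*)$")

def combine(lines, delimiter=""):
    """Merge consecutive lines that share a six-digit prefix.

    Lines that do not start with six digits are skipped, as in the Perl filter.
    """
    parsed = (m.groups() for m in map(LINE.match, lines) if m)
    return [
        key + delimiter + "".join(rest for _, rest in group)
        for key, group in groupby(parsed, key=lambda pair: pair[0])
    ]

# Hypothetical sample input.
sample_lines = ["100001 alpha", "100001 beta", "200002 gamma", "100001 delta"]
combined = combine(sample_lines)
for line in combined:
    print(line)
```

Because grouping is by adjacency, the last sample line starts a new group even though its number appeared earlier.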







subpatterns in grep (maybe)

2009-05-26 Thread mmayer344

Hi folks,

I'm running through some logs of search queries, trying to pull out
the ones containing variations on "United Nations". I am trying to write
a regular expression that can match a two-letter format (such as UN,
U.N. or U. N.) or a two-word format (United Nations, united +nation).

So far my flailing attempts have been able to match all of the above,
but they are also yielding any other word beginning with "un".

I want to match these:

un
u.n.
u. n.
united nation
united +nation

...but not these:

united states
UNDP
unsuccessful


Any help would be much appreciated. The log files are rather large so
if possible I'd prefer a solution that works in one pass.

Thanks,
Marcy

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google
Groups BBEdit Talk group.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to
bbedit+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/bbedit?hl=en
If you have a specific feature request or would like to report a suspected (or 
confirmed) problem with the software, please email to supp...@barebones.com 
rather than posting to the group.
-~--~~~~--~~--~--~---



Re: subpatterns in grep (maybe)

2009-05-26 Thread Ronald J Kimball

On Tue, May 26, 2009 at 09:45:25AM -0700, mmayer344 wrote:
 
 Hi folks,
 
 I'm running through some logs of search queries, trying to pull out
 the ones containing variations on United Nations. I am trying to write
 a regular expression that can match a two letter format (such as UN,
 U.N. or U. N.) or a two word format (United Nations, united
 +nation).
 
 So far my flailing attempts have been able to match all of the above,
 but they are also yielding any other word beginning with un.
 
 I want to match these:
 
 un
 u.n.
 u. n.
 united nation
 united +nation
 
 ...but not these:
 
 united states
 UNDP
 unsuccessful

Try this:

\bun\b|\bu\.\s*n\.|\bunited\s+\+?nation

\b matches at a word boundary, i.e. the position between a word character
[a-zA-Z0-9_] and a non-word character [^a-zA-Z0-9_], or between a word
character and the start or end of the text.
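
The pattern can be checked against the sample queries from the question. This is a minimal sketch using Python's re module rather than BBEdit; it assumes a case-insensitive search (re.IGNORECASE here), since the pattern is written in lower case while some of the queries are not. The list names are made up for illustration.

```python
import re

UN_PATTERN = re.compile(
    r"\bun\b|\bu\.\s*n\.|\bunited\s+\+?nation",
    re.IGNORECASE,  # assumption: the search is run case-insensitively
)

wanted = ["un", "u.n.", "u. n.", "united nation", "united +nation",
          "UN", "U.N.", "United Nations"]
unwanted = ["united states", "UNDP", "unsuccessful"]

# Keep only the queries in which the pattern is found somewhere.
matched = [query for query in wanted + unwanted if UN_PATTERN.search(query)]
print(matched)
```

The \b on either side of "un" is what rejects UNDP and unsuccessful: in both, the letters "un" are followed by another word character, so there is no boundary after them.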

Ronald
