Re: Subpatterns in Grep?
This sounds fine for my type of use. Thanks again.

On Saturday, 6 October 2012 at 17:04:43 UTC+2, Patrick Woolsey wrote:

> At 23:50 -0700 10/05/2012, jmichel wrote:
>> Thanks for these explanations. They confirm what I suspected. Assuming that the number of lines in one group can never exceed, say, 15 or so, could one circumvent the difficulty by explicitly repeating the search pattern a sufficient number of times?
>
> Yes, and please see below.
>
>> Then the problem would be to ensure a match also in the case when the number of lines is smaller. Any idea on how that could be achieved? Could conditional matching help (I am not familiar with those advanced features)?
>
> To do this, just modify your existing pattern to find successive pairs of matching lines and combine their contents:
>
> Find: (\d{6})(.+)(?:\r\1(.+))
> Replace: \1\2\3
>
> and then repeatedly apply Replace All until all line pairs which start with the same numeric prefix have been consolidated to single lines. (E.g. for groups of 16 lines or fewer, this will take at most 4 passes of Replace All; for groups of 64 lines or fewer, 6 passes; etc.)
>
> PS: John Delacour's text filter is a much nicer general solution; the only advantage of the above is it doesn't require knowledge of Perl.
>
> Regards,
> Patrick Woolsey
> ==
> Bare Bones Software, Inc. http://www.barebones.com/

--
You received this message because you are subscribed to the BBEdit Talk discussion group on Google Groups.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to bbedit+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/bbedit?hl=en
If you have a feature request or would like to report a problem, please email supp...@barebones.com rather than posting to the group.
Follow @bbedit on Twitter: http://www.twitter.com/bbedit
Re: Subpatterns in Grep?
This sounds amazingly powerful and flexible. Thanks a lot. I will try it asap. The only problem is that I will need to learn Perl if I want to be able to write such scripts…

On Saturday, 6 October 2012 at 01:25:58 UTC+2, eremita wrote:

> On 5 Oct 2012, at 16:18, jmichel jmi.m...@gmail.com wrote:
>> I have a file consisting of groups of lines (unknown number of lines in each group). Each line begins with a 6-digit number, followed by an unknown sequence of words and numbers. Consecutive lines starting with the same number form a group. My problem is to combine lines from each group into a single line, keeping only the first occurrence of the distinctive number. I have been able to find groups using the pattern
>>
>> (\d{6})(.+)(?:\r\1(.+))+
>>
>> However, this does not appear to store the expressions matching the inner parentheses into separate variables. Is there a way to achieve the desired replacement using grep?
>
> Using regular expressions yes, but you need a routine. If you put a file containing this Perl script in ~/Library/Application Support/BBEdit/Text Filters, it will do what you want. Open the Text Filters palette from the Window menu and you will see the filter. Double-click it or click on Run or, if it's a frequent task, assign a shortcut to the script.
>
> Save this as ???.pl
>
> #!/usr/bin/perl
> my %hash;
> my $six_digits = "[0-9]{6}";
> my $remaining_text = ".*";
> my $delimiter = ""; # or "," for example
> while (<>) {
>     if ( /^($six_digits)($remaining_text)/ ) {
>         $hash{$1} .= $2 # append the text after the 6 digits
>     }
> }
> for (sort {$a<=>$b} keys %hash) {
>     print "$_$delimiter$hash{$_}\n"
> }
>
> #JD
Re: Subpatterns in Grep?
Thanks for these explanations. They confirm what I suspected. Assuming that the number of lines in one group can never exceed, say, 15 or so, could one circumvent the difficulty by explicitly repeating the search pattern a sufficient number of times? Then the problem would be to ensure a match also in the case when the number of lines is smaller. Any idea on how that could be achieved? Could conditional matching help (I am not familiar with those advanced features)?

On Friday, 5 October 2012 at 20:01:21 UTC+2, Patrick Woolsey wrote:

> At 08:18 -0700 10/05/2012, jmichel wrote:
>> I have a file consisting of groups of lines (unknown number of lines in each group). Each line begins with a 6-digit number, followed by an unknown sequence of words and numbers. Consecutive lines starting with the same number form a group. My problem is to combine lines from each group into a single line, keeping only the first occurrence of the distinctive number. I have been able to find groups using the pattern
>>
>> (\d{6})(.+)(?:\r\1(.+))+
>>
>> However, this does not appear to store the expressions matching the inner parentheses into separate variables. Is there a way to achieve the desired replacement using grep?
>
> Provided I understand the task correctly, though your pattern should match all such groups of lines, I don't see any way to restructure the matched text in a single step. (A relatively easy brute-force solution would be to concatenate all matching line pairs, then rinse and repeat. :)
>
> As to your question about storage: Though the contents of that inner subpattern (.+) are being captured N times (where N is the number of lines within the match), only the last instance matched will be stored and available by reference to that subpattern.
>
> [ As an aside for anyone else who may be wondering, this part of the pattern (?: ) consists of "non-capturing parentheses" which do not themselves store matched text. ]
>
> For example, if you apply the following search and replace patterns:
>
> Find: (?:(\d{6})\r)+
> Replace: \1
>
> to this text:
>
> 111222
> 333444
> 555666
>
> the result will be:
>
> 555666
>
> Regards,
> Patrick Woolsey
> ==
> Bare Bones Software, Inc. http://www.barebones.com/
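The question asked in this message (repeating the sub-pattern a fixed number of times while still matching shorter groups) can be explored by making each repeated sub-pattern optional. What follows is only a sketch under stated assumptions, not something taken from the thread: it uses Python's re module as a stand-in for BBEdit's grep, writes line breaks as \n instead of \r, and the names merge_groups and max_lines are made up for illustration.

```python
import re

def merge_groups(text, max_lines=15):
    """Merge each run of up to max_lines lines sharing a 6-digit prefix.

    Hypothetical helper: the pattern is the one from the thread, with the
    trailing sub-pattern repeated (max_lines - 1) times and made optional.
    """
    optional_line = r"(?:\n\1(.+))?"
    pattern = r"(\d{6})(.+)" + optional_line * (max_lines - 1)

    def join_captures(match):
        # Optional groups that did not take part in the match are None.
        rest = "".join(g for g in match.groups()[1:] if g)
        return match.group(1) + rest

    return re.sub(pattern, join_captures, text)

sample = "111111 a\n111111 b\n111111 c\n222222 x\n333333 y\n333333 z\n"
print(merge_groups(sample))
```

Because every repeated sub-pattern is optional, shorter groups still match; a group longer than max_lines would be split and need a second pass.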
Re: Subpatterns in Grep?
Thanks again. This is reasonably simple for short groups of lines.

On Saturday, 6 October 2012 at 17:04:43 UTC+2, Patrick Woolsey wrote:

> At 23:50 -0700 10/05/2012, jmichel wrote:
>> Thanks for these explanations. They confirm what I suspected. Assuming that the number of lines in one group can never exceed, say, 15 or so, could one circumvent the difficulty by explicitly repeating the search pattern a sufficient number of times?
>
> Yes, and please see below.
>
>> Then the problem would be to ensure a match also in the case when the number of lines is smaller. Any idea on how that could be achieved? Could conditional matching help (I am not familiar with those advanced features)?
>
> To do this, just modify your existing pattern to find successive pairs of matching lines and combine their contents:
>
> Find: (\d{6})(.+)(?:\r\1(.+))
> Replace: \1\2\3
>
> and then repeatedly apply Replace All until all line pairs which start with the same numeric prefix have been consolidated to single lines. (E.g. for groups of 16 lines or fewer, this will take at most 4 passes of Replace All; for groups of 64 lines or fewer, 6 passes; etc.)
>
> PS: John Delacour's text filter is a much nicer general solution; the only advantage of the above is it doesn't require knowledge of Perl.
>
> Regards,
> Patrick Woolsey
> ==
> Bare Bones Software, Inc. http://www.barebones.com/
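The pass counts quoted above follow from each Replace All roughly halving the number of lines in a group. A minimal sketch of that loop, assuming Python's re module in place of BBEdit's grep and \n in place of \r (the function name consolidate and the sample data are invented for illustration):

```python
import re

# Same Find pattern as in the message above, with \n as the line break.
PAIR = re.compile(r"(\d{6})(.+)(?:\n\1(.+))")

def consolidate(text):
    """Repeat the pairwise replace until nothing matches.

    Returns the merged text and the number of passes that changed it.
    """
    passes = 0
    while True:
        merged, count = PAIR.subn(r"\1\2\3", text)
        if count == 0:
            return text, passes
        text, passes = merged, passes + 1

# One group of 16 lines followed by an unrelated single line.
group = "".join(f"123456 item{i}\n" for i in range(16))
result, passes = consolidate(group + "654321 other\n")
print(passes)  # 4, matching the "16 lines or fewer" estimate
```

A 17-line group would need a fifth pass, since 17 lines shrink to 9, 5, 3, 2 and then 1.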
Subpatterns in Grep?
I have a file consisting of groups of lines (unknown number of lines in each group). Each line begins with a 6-digit number, followed by an unknown sequence of words and numbers. Consecutive lines starting with the same number form a group. My problem is to combine lines from each group into a single line, keeping only the first occurrence of the distinctive number. I have been able to find groups using the pattern

(\d{6})(.+)(?:\r\1(.+))+

However, this does not appear to store the expressions matching the inner parentheses into separate variables. Is there a way to achieve the desired replacement using grep?

Thanks in advance
Re: Subpatterns in Grep?
At 08:18 -0700 10/05/2012, jmichel wrote:
> I have a file consisting of groups of lines (unknown number of lines in each group). Each line begins with a 6-digit number, followed by an unknown sequence of words and numbers. Consecutive lines starting with the same number form a group. My problem is to combine lines from each group into a single line, keeping only the first occurrence of the distinctive number. I have been able to find groups using the pattern
>
> (\d{6})(.+)(?:\r\1(.+))+
>
> However, this does not appear to store the expressions matching the inner parentheses into separate variables. Is there a way to achieve the desired replacement using grep?

Provided I understand the task correctly, though your pattern should match all such groups of lines, I don't see any way to restructure the matched text in a single step. (A relatively easy brute-force solution would be to concatenate all matching line pairs, then rinse and repeat. :)

As to your question about storage: Though the contents of that inner subpattern (.+) are being captured N times (where N is the number of lines within the match), only the last instance matched will be stored and available by reference to that subpattern.

[ As an aside for anyone else who may be wondering, this part of the pattern (?: ) consists of "non-capturing parentheses" which do not themselves store matched text. ]

For example, if you apply the following search and replace patterns:

Find: (?:(\d{6})\r)+
Replace: \1

to this text:

111222
333444
555666

the result will be:

555666

Regards,
Patrick Woolsey
==
Bare Bones Software, Inc. http://www.barebones.com/
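The "only the last instance is stored" behaviour described above is not specific to BBEdit; most backtracking regex engines work the same way. A small sketch that reproduces the example, assuming Python's re module as a stand-in and \n in place of \r (each sample line, including the last, ends with a line break):

```python
import re

text = "111222\n333444\n555666\n"
pattern = re.compile(r"(?:(\d{6})\n)+")

match = pattern.match(text)
print(match.group(0) == text)  # True: all three lines form one match
print(match.group(1))          # 555666: only the last repetition is kept

# Replacing the match with \1 therefore leaves just the last number.
print(pattern.sub(r"\1", text))  # 555666

# Capturing each repetition separately keeps all of them.
print(re.findall(r"(\d{6})\n", text))  # ['111222', '333444', '555666']
```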
Re: Subpatterns in Grep?
On 5 Oct 2012, at 16:18, jmichel jmi.mig...@gmail.com wrote:
> I have a file consisting of groups of lines (unknown number of lines in each group). Each line begins with a 6-digit number, followed by an unknown sequence of words and numbers. Consecutive lines starting with the same number form a group. My problem is to combine lines from each group into a single line, keeping only the first occurrence of the distinctive number. I have been able to find groups using the pattern
>
> (\d{6})(.+)(?:\r\1(.+))+
>
> However, this does not appear to store the expressions matching the inner parentheses into separate variables. Is there a way to achieve the desired replacement using grep?

Using regular expressions yes, but you need a routine. If you put a file containing this Perl script in ~/Library/Application Support/BBEdit/Text Filters, it will do what you want. Open the Text Filters palette from the Window menu and you will see the filter. Double-click it or click on Run or, if it's a frequent task, assign a shortcut to the script.

Save this as ???.pl

#!/usr/bin/perl
my %hash;
my $six_digits = "[0-9]{6}";
my $remaining_text = ".*";
my $delimiter = ""; # or "," for example
while (<>) {
    if ( /^($six_digits)($remaining_text)/ ) {
        $hash{$1} .= $2 # append the text after the 6 digits
    }
}
for (sort {$a<=>$b} keys %hash) {
    print "$_$delimiter$hash{$_}\n"
}

#JD
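For readers who do not know Perl, here is a rough Python sketch of a related approach. It is not a translation of the script above and is offered only as an illustration: the hash-based script collects every line with the same number, wherever it occurs, and prints the results in numeric order, whereas this sketch merges only consecutive lines and keeps the original order. The names merge_consecutive and delimiter are made up for the example.

```python
import re
from itertools import groupby

# A 6-digit prefix followed by the rest of the line.
LINE = re.compile(r"^(\d{6})(.*)$")

def merge_consecutive(lines, delimiter=""):
    """Join runs of consecutive lines that share a 6-digit prefix.

    Lines that do not start with six digits are skipped, as in the
    Perl script.
    """
    parsed = [m.groups() for m in map(LINE.match, lines) if m]
    merged = []
    for prefix, run in groupby(parsed, key=lambda pair: pair[0]):
        merged.append(prefix + delimiter + "".join(rest for _, rest in run))
    return merged

sample = ["222222 x", "111111 a", "111111 b", "222222 y"]
print(merge_consecutive(sample))
# ['222222 x', '111111 a b', '222222 y']
```

Note that the two 222222 lines stay separate here because they are not adjacent; the hash-based script would combine them.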
subpatterns in grep (maybe)
Hi folks,

I'm running through some logs of search queries, trying to pull out the ones containing variations on "United Nations". I am trying to write a regular expression that can match a two-letter format (such as UN, U.N. or U. N.) or a two-word format (United Nations, united +nation). So far my flailing attempts have been able to match all of the above, but they are also yielding any other word beginning with "un".

I want to match these:

un
u.n.
u. n.
united nation
united +nation

...but not these:

united states
UNDP
unsuccessful

Any help would be much appreciated. The log files are rather large so if possible I'd prefer a solution that works in one pass.

Thanks,
Marcy

--
You received this message because you are subscribed to the Google Groups BBEdit Talk group.
To post to this group, send email to bbedit@googlegroups.com
To unsubscribe from this group, send email to bbedit+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/bbedit?hl=en
If you have a specific feature request or would like to report a suspected (or confirmed) problem with the software, please email to supp...@barebones.com rather than posting to the group.
Re: subpatterns in grep (maybe)
On Tue, May 26, 2009 at 09:45:25AM -0700, mmayer344 wrote:
> Hi folks,
>
> I'm running through some logs of search queries, trying to pull out the ones containing variations on "United Nations". I am trying to write a regular expression that can match a two-letter format (such as UN, U.N. or U. N.) or a two-word format (United Nations, united +nation). So far my flailing attempts have been able to match all of the above, but they are also yielding any other word beginning with "un".
>
> I want to match these:
>
> un
> u.n.
> u. n.
> united nation
> united +nation
>
> ...but not these:
>
> united states
> UNDP
> unsuccessful

Try this:

\bun\b|\bu\.\s*n\.|\bunited\s+\+?nation

\b matches on a word boundary, i.e. the position between a word character [a-zA-Z0-9_] and a non-word character [^a-zA-Z0-9_].

Ronald
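The suggested pattern can be checked against the sample queries from the original question. This is a small sketch using Python's re module rather than BBEdit; it assumes a case-insensitive search (the equivalent of leaving BBEdit's Case Sensitive option off), which the thread does not state explicitly.

```python
import re

# Pattern from the reply above; IGNORECASE is an assumption.
UN = re.compile(r"\bun\b|\bu\.\s*n\.|\bunited\s+\+?nation", re.IGNORECASE)

wanted = ["un", "u.n.", "u. n.", "united nation", "united +nation"]
unwanted = ["united states", "UNDP", "unsuccessful"]

for query in wanted + unwanted:
    print(f"{query!r}: {bool(UN.search(query))}")
```

The word boundaries are what exclude "UNDP" and "unsuccessful": in both, "un" is followed by another word character, so \bun\b cannot match.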