bug#26576: -v when used with -C
On Thu, Apr 20, 2017 at 02:34:47PM -0500, Eric Blake wrote:
> On 04/20/2017 11:51 AM, Assaf Gordon wrote:
>> If I may suggest the following sed program:
>> $ sed -n ':x 1,2{N;bx} ; /UGLY/{ N;N;z;bx }; /./P;N;D' file
>
> Works as long as lines 1 and 2 do not contain UGLY. But misbehaves if
> UGLY appears early:
> [...]
> Also misbehaves if two occurrences of UGLY appear with overlapping
> context:
> [...]
> May be fixable with even more magic, perhaps by using the hold buffer
> to track the status of the last three lines, and suppressing output if
> any of the last three inputs were UGLY. But more complicated than I
> want to spend time on for the sake of this email.

Good catch, thanks for pointing this out.
Indeed, that was an ad-hoc script, suitable for some limited scenarios
but not robust as a general solution.

-assaf
bug#26576: -v when used with -C
On 04/20/2017 11:51 AM, Assaf Gordon wrote:
> If I may suggest the following sed program:
>
> $ cat file
> a
> b
> c
> bla1
> bla2
> UGLY
> bla3
> bla4
> e
> f
> g
>
> $ sed -n ':x 1,2{N;bx} ; /UGLY/{ N;N;z;bx }; /./P;N;D' file

Works as long as lines 1 and 2 do not contain UGLY. But misbehaves if
UGLY appears early:

$ printf '2\nUGLY\n3\n4\nc\nd\n' | sed -n ':x 1,2{N;bx}; /UGLY/{N;N;z;bx}; /./P;N;D'
d

Oops - missed c.

Also misbehaves if two occurrences of UGLY appear with overlapping context:

$ printf 'a\nb\n1\n2\nUGLY\n3\nUGLY\n4\n5\nc\nd\n' | sed -n ':x 1,2{N;bx}; /UGLY/{N;N;z;bx}; /./P;N;D'
a
b
4
5
c
d

Oops - didn't filter 4 and 5.

May be fixable with even more magic, perhaps by using the hold buffer to
track the status of the last three lines, and suppressing output if any
of the last three inputs were UGLY. But more complicated than I want to
spend time on for the sake of this email.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
bug#26576: -v when used with -C
Yes those are brilliant uses of sed.
However for now

‘-v’
‘--invert-match’
     Invert the sense of matching, to select non-matching lines. (‘-v’
     is specified by POSIX.)

perhaps should mention that "-v is processed before -C, -A, and -B, not after."
bug#26576: -v when used with -C
Hello,

On Thu, Apr 20, 2017 at 11:26:47AM -0500, Eric Blake wrote:
> On 04/20/2017 10:37 AM, 積丹尼 Dan Jacobson wrote:
>> I want to do
>> $ cat file|some_program
>> but I must exclude the UGLY line and its two neighbors.
>>
>> OK I have found the UGLY line, and its two neighbors
>> $ grep -C 2 UGLY file
>> bla
>> bla
>> UGLY
>> bla
>> bla
>>
>> but I have no way to exclude them before piping to some_program.
>
> It's very corner case, so I'm not sure it's worth burning an option and
> complicating grep to do this, plus waiting for a future version of grep
> with the proposed new option to percolate to your machines, when you
> already accomplish the same task using existing tools (admittedly with
> more complexity).

If I may suggest the following sed program:

$ cat file
a
b
c
bla1
bla2
UGLY
bla3
bla4
e
f
g

$ sed -n ':x 1,2{N;bx} ; /UGLY/{ N;N;z;bx }; /./P;N;D' file
a
b
c
e
f
g

The combination of N/P/D commands uses sed's pattern space as a FIFO
buffer (N appends the next input line at the end, P prints the first
line, D deletes the first line). In between, if the pattern space
contains the marker UGLY, the entire buffer is deleted and the cycle is
restarted.

Specifically:

1. ':x 1,2{N;bx}' => Append the next two lines, so the buffer holds the
   first three lines of input.
2. '/UGLY/ {N;N;z;bx}' => If the marker is found in the pattern space
   (which should contain 3 lines now), consume two more lines (N;N),
   clear the buffer (z) and jump to the beginning.
   'z' is a GNU extension. It can be replaced with 's/.*//'.
3. '/./P' => If the pattern space isn't empty, print up to the first
   newline (i.e., the oldest line in the buffer);
4. 'N;D' => Read the next line from the input file and append it to the
   pattern space, then delete the first line from the pattern space (the
   same line that was printed with 'P').

The following program can be used to learn a bit more about how the
N/P/D commands work.
It uses 'l' to print the content of the pattern space, and you can see
it behaves like a FIFO:

$ sed -n ':x 1,2{N;bx} ; l;P;N;D' file
a\nb\nc$
a
b\nc\nbla1$
b
c\nbla1\nbla2$
c
bla1\nbla2\nUGLY$
bla1
bla2\nUGLY\nbla3$
bla2
UGLY\nbla3\nbla4$
UGLY
bla3\nbla4\ne$
bla3
bla4\ne\nf$
bla4
e\nf\ng$
e

More information about sed's buffers can be found here:
https://www.gnu.org/software/sed/manual/sed.html#advanced-sed

hope this helps,
regards,
 - assaf
bug#26576: -v when used with -C
On 04/20/2017 11:38 AM, 積丹尼 Dan Jacobson wrote:
> Yes, if somebody ever adds this option perhaps call it --compliment.

Except that you mean --complement (you are not praising the lines, but
making an opposite selection of lines).

--
Eric Blake
bug#26576: -v when used with -C
Yes, if somebody ever adds this option perhaps call it --compliment.
bug#26576: -v when used with -C
On 04/20/2017 10:37 AM, 積丹尼 Dan Jacobson wrote:
> I want to do
> $ cat file|some_program
> but I must exclude the UGLY line and its two neighbors.
>
> OK I have found the UGLY line, and its two neighbors
> $ grep -C 2 UGLY file
> bla
> bla
> UGLY
> bla
> bla
>
> but I have no way to exclude them before piping to some_program.

So it sounds like you are asking for some sort of new --invert-output,
which toggles which lines to display. Revisiting my example, it would
change:

$ seq 10 | grep -C 2 5
3
4
5
6
7

into:

$ seq 10 | grep -C 2 5 --invert-output
1
2
--
8
9
10

as well as:

$ seq 10 | grep -C 2 -v 5
1
2
3
4
5
6
7
8
9
10
$ seq 10 | grep -C 2 -v '[3-8]'
1
2
3
4
--
7
8
9
10

into:

$ seq 10 | grep -C 2 -v 5 --invert-output
$ seq 10 | grep -C 2 -v '[3-8]' --invert-output
5
6

It's very corner case, so I'm not sure it's worth burning an option and
complicating grep to do this, plus waiting for a future version of grep
with the proposed new option to percolate to your machines, when you
already accomplish the same task using existing tools (admittedly with
more complexity).

For example, you can use sed twice if the data is in a file that can be
re-read or easily regenerated (in this case, I'm skipping d, h, and any
line within -C1 of the ugly lines):

$ printf %s\\n a b c d e f g h i j > file
$ ugly=$(sed -n '/[dh]/ =' file)
$ sed "$(for line in $ugly; do echo "$((line-1)),$((line+1))d;"; done)" file
a
b
f
j

Or it should be easy enough to write an awk script that stashes all
input lines into one array, then checks for regular expression matches,
and sets multiple entries in a corresponding poison array to 1 (based on
how many lines of context you want to poison), then in an END block only
prints out lines if the corresponding poison[] entry is not 1. Although
I'll leave that as an exercise for the reader.

--
Eric Blake
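One possible sketch of the awk approach left as an exercise above, assuming a POSIX awk. The function name `drop_context` and its REGEX/CONTEXT arguments are hypothetical illustrations, not part of grep or any existing tool. Like the description, it keeps the whole input in memory.

```sh
# drop_context REGEX [CONTEXT]   (hypothetical helper name)
# Reads stdin and prints every line that is NOT within CONTEXT lines
# (default 2) of a line matching the awk extended regex REGEX.
# Note: awk -v processes backslash escapes, so double any backslashes in REGEX.
drop_context() {
    awk -v re="$1" -v ctx="${2:-2}" '
        # Stash every input line, indexed by its line number.
        { line[NR] = $0 }
        # On a match, poison the matching line and ctx lines on either side.
        $0 ~ re {
            for (i = NR - ctx; i <= NR + ctx; i++)
                poison[i] = 1
        }
        # After all input is read, print only the lines left unpoisoned.
        END {
            for (i = 1; i <= NR; i++)
                if (poison[i] != 1)
                    print line[i]
        }
    '
}
```

With the sample data above, `printf '%s\n' a b c d e f g h i j | drop_context '[dh]' 1` should print a, b, f and j, the same as the two-pass sed command. Because poisoning is decided per line number rather than per sliding buffer, this approach also covers the cases where the single-pass sed script misbehaved (UGLY near the top of the input, or overlapping contexts).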
bug#26576: -v when used with -C
I want to do
$ cat file|some_program
but I must exclude the UGLY line and its two neighbors.

OK I have found the UGLY line, and its two neighbors
$ grep -C 2 UGLY file
bla
bla
UGLY
bla
bla

but I have no way to exclude them before piping to some_program.
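One way to exclude exactly the lines that `grep -C` would print, using existing tools, is to have grep report line numbers with -n and then drop those numbers in a second pass. In -n output, selected lines are prefixed with "N:" and context lines with "N-". The sketch below assumes GNU grep (for -C) and a POSIX awk. Because it reads the file twice, it needs a regular file rather than a pipe. The name `invert_output` is hypothetical, loosely modelled on the --invert-output idea in this thread, and unlike that proposal it prints no `--` separators.

```sh
# invert_output FILE GREP-ARGS...   (hypothetical helper, not a grep option)
# Prints the lines of FILE that `grep GREP-ARGS FILE` would NOT print,
# counting context lines as printed.
invert_output() {
    file=$1
    shift
    # With -n, grep prefixes selected lines with "N:" and context lines with
    # "N-"; "--" group separators carry no number, so sed skips them.
    shown=$(grep -n "$@" "$file" |
        sed -n 's/^\([0-9][0-9]*\)[-:].*/\1/p' | tr '\n' ' ')
    # Second pass: print every line whose number grep did not report.
    awk -v shown="$shown" '
        BEGIN {
            n = split(shown, nums, " ")
            for (i = 1; i <= n; i++) seen[nums[i]] = 1
        }
        !(FNR in seen)
    ' "$file"
}
```

For the request above this would be something like `invert_output file -C 2 UGLY | some_program`. If nothing matches, grep prints nothing and the whole file passes through.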
bug#26576: -v when used with -C
On 04/20/2017 10:14 AM, 積丹尼 Dan Jacobson wrote:
> Mmmm, OK, but grep still needs an additional future option to print just
> the missing set...

What output are you wanting? If all you want is the non-matching lines,
don't ask for context (since the context will include matching lines).

If you want your request to be acted on, please demonstrate with some
sample input and the resulting output you want to accomplish, and then
we can help you figure out if that particular output can already be
generated using existing options. But your vague request to "print just
the missing set" doesn't tell me what you really want.

--
Eric Blake
bug#26576: -v when used with -C
Mmmm, OK, but grep still needs an additional future option to print just the missing set...
bug#26576: -v when used with -C
tag 26576 notabug
thanks

On 04/20/2017 09:39 AM, 積丹尼 Dan Jacobson wrote:
> You know if this only gets five lines,
> grep -C 2 ZZZ 1.vcf|wc - 1.vcf
> 5 5 197 -
> 16861731 83630 1.vcf
> then this
> grep -C 2 -v ZZZ 1.vcf|wc - 1.vcf
> 16861731 83630 -
> 16861731 83630 1.vcf
> should get all EXCEPT five lines.

Not necessarily true. Let's simplify your example to something that
doesn't require knowing the contents of 1.vcf:

$ seq 10 | grep -C 2 5
3
4
5
6
7

That says show all lines that match the regex '5', as well as (up to) 2
context lines on either side. So we get a total output of five lines,
even though only one of those five lines actually matched.

Now the converse:

$ seq 10 | grep -C 2 -v 5
1
2
3
4
5
6
7
8
9
10

That says to show all lines that do not match the regex '5', as well as
(up to) 2 context lines on either side. So we get a total output of ten
lines, but that consists of 4 matching lines, 1 context line, and 5 more
matching lines (grep was smart enough to notice that the trailing context
after 4 and the leading context before 6 overlap, and to print line 5
only once, rather than displaying two independent chunks).

For further proof that -C and -v are correctly working together, try
something that excludes enough context lines to actually get two hunks:

$ seq 10 | grep -C 2 -v '[3-8]'
1
2
3
4
--
7
8
9
10

Now you're matching 2 lines, then 2 lines tail context, then a hunk
separator, then 2 lines head context, then 2 more matching lines.

Therefore, I'm tagging this as not a bug.

--
Eric Blake
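A quick way to see the split between matching and context lines described above is to rerun the two -v examples with -n. In GNU grep's output, a selected line has ":" after its line number and a context line has "-". The expected output in the comments follows from that convention. The commands assume GNU grep and `seq`, as in the examples above.

```sh
# Lines NOT matching '5' are selected (":"); line 5 appears only as context ("-").
seq 10 | grep -n -C 2 -v 5
# 1:1
# 2:2
# 3:3
# 4:4
# 5-5
# 6:6
# 7:7
# 8:8
# 9:9
# 10:10

# Lines 1, 2, 9 and 10 are selected; 3, 4, 7 and 8 are context; 5 and 6 fall
# outside every context window, so "--" separates the two hunks.
seq 10 | grep -n -C 2 -v '[3-8]'
# 1:1
# 2:2
# 3-3
# 4-4
# --
# 7-7
# 8-8
# 9:9
# 10:10
```

This also shows the ordering mentioned earlier in the thread for the manual: -v first decides which lines are selected, and -C then adds context around those selected lines.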
bug#26576: -v when used with -C
You know if this only gets five lines,
grep -C 2 ZZZ 1.vcf|wc - 1.vcf
5 5 197 -
16861731 83630 1.vcf
then this
grep -C 2 -v ZZZ 1.vcf|wc - 1.vcf
16861731 83630 -
16861731 83630 1.vcf
should get all EXCEPT five lines.