Re: Displaying only data matching a pattern?
On Sun, Mar 15, 2009 at 6:38 PM, Lloyd Kvam wrote:
> grep should have a -o option to only output the match

That was brought up -- and useful it is -- but it has problems when you only want to see the match but your target pattern has to include delimiters.

For those so interested, the complete thread can be found here:

http://thread.gmane.org/gmane.org.user-groups.linux.gnhlug/16551

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
Re: Displaying only data matching a pattern?
On Sun, 2009-03-15 at 11:20 -0400, Steven W. Orr wrote:
> So I figured it'd be a better job for grep,
> =>however It appears that it's printing the entire matching line and I
> =>only want the match on the pattern to display.
> =>

grep should have a -o option to only output the match

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug
Re: Displaying only data matching a pattern?
On Sun, Mar 15, 2009 at 11:20 AM, Steven W. Orr wrote:
> ... all manner of external tools, sed, awk,
> perl, cut, ad nauseam. Just use bash.

Well, this is on MS-Windows. bash is as external as sed, awk, etc., are. :)

That said, more techniques are always good to offer. :)

-- Ben
Re: Displaying only data matching a pattern?
On Monday, Feb 2nd 2009 at 11:19 -, quoth kenta:

=>I've been working with some large files and need to extract a piece of
=>info, unfortunately there's a bunch of junk around the part that I
=>want. Example:
=>foofoo:A1234567890B\barbar
=>foofoo:C9234567890E\barbar
=>foofoo:A8234567890B\barbar
=>foofoo:F7234567890D\barbar
=>
=>What I had done the first pass to get what I wanted was to use sed and
=>do a s/foofoo:// to get rid of the stuff in front, and then do the
=>same for the \barbar in the back. However what would be easier for me
=>is if I could just extract the pattern of [a-fA-F0-9] when they appear
=>n times in a row. I couldn't seem to figure out if sed could display
=>only the part I wanted. So I figured it'd be a better job for grep,
=>however It appears that it's printing the entire matching line and I
=>only want the match on the pattern to display.
=>
=>Otherwise, I want as an end result:
=>
=>A1234567890B
=>C9234567890E
=>A8234567890B
=>F7234567890D
=>
=>Here's a caveat: I need to do this on a Windows box and am limited to
=>using the ports of GNU utilities (using
=>http://unxutils.sourceforge.net/) otherwise I'd do this in perl and
=>wouldn't be asking :/ I'm thinking maybe I'm just missing something
=>simple here.
=>
=>Any quick solutions? My Google-fu is weak today.

I'm a little behind in my mail, but I looked at the proffered solutions and people seem to want to use all manner of external tools, sed, awk, perl, cut, ad nauseam. Just use bash.

If you set qq to one of the example values

    qq='foofoo:A1234567890B\barbar'

then all you have to do is to use bash to take it apart. No external programs required.

    ww=${qq#foofoo:}
    echo "${ww%\\barbar}"

so to put it all together (read needs -r, or it swallows the backslash before the suffix can be stripped)

    while IFS= read -r a_line
    do
        ww=${a_line#foofoo:}
        echo "${ww%\\barbar}"
    done < some_data_file
    echo 'Ta da'
    exit "$time_to_drink_beer"

-- 
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net
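The parameter-expansion approach above hard-codes the literal junk (foofoo: and \barbar). A minimal sketch of a more general variant, assuming the wanted field always sits between the first colon and the first backslash after it; the function name extract_field is hypothetical and not from the thread:

```shell
# Print the text between the first ':' and the next '\' on each input line.
# Uses only shell parameter expansion; no external programs are started.
extract_field() {
    # read -r keeps backslashes literal; the || clause handles a final
    # line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        rest=${line#*:}              # drop shortest prefix ending in ':'
        printf '%s\n' "${rest%%\\*}" # drop longest suffix starting at '\'
    done
}
```

Usage would be along the lines of extract_field < some_data_file. Lines with no colon or no backslash pass through with only the part that matched removed, so this is only suitable when every line has the expected shape.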
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 1:18 PM, kenta wrote:
> I did have to change the single quotes to double quotes
> for some reason ...

Single-quotes (') have no significance to the Windows shell. So the shell would still have done tokenization on the spaces within the awk script, and awk expects its script to be a single C argument.

However, double-quotes are still used to constrain the text within to a single token, similar to the Unix shells. Unlike Unix, the Windows shell does not strip the quotes (IIRC). It leaves them intact as part of the token. I would guess most Unix ports introduce some kind of compatibility layer to handle argument quoting.

The Windows shell is a crock, if you haven't figured that out yet. :)

-- Ben
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 12:07 PM, kenta wrote:
> for some reason the gnu-port of grep for win32 has no -o option. :(

Hmmm, the binary I have does. However, come to think of it, I did update a bunch of the utilities from another distribution; maybe that made the difference. If you care, it was this distribution:

http://gnuwin32.sourceforge.net/

Very similar to unxutils, but some of the stuff is more current, and it has some additional stuff. Be warned that DLL hell applies.

-- Ben
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 1:06 PM, Shawn O'Shea wrote:
> You could try it this way in gawk:
> $ cat foo
> foofoo:A1234567890B\barbar
> foofoo:C9234567890E\barbar
> foofoo:A8234567890B\barbar
> foofoo:F7234567890D\barbar
> $ gawk --posix '{ if (match($0,/[[:xdigit:]]{12}/)) print substr($0,RSTART,RLENGTH) }' foo
> A1234567890B
> C9234567890E
> A8234567890B
> F7234567890D
>
> You need --posix for [:xdigit:] and the curly braces ( {} ) to work. This
> basically says:
> if the line ($0) matches 12 hexdigits ([:xdigit:]) in a row:
>    print a substring of the line ($0) starting at RSTART and going for
>    RLENGTH characters
> match() sets RSTART and RLENGTH for you on a match.
>
> -Shawn

Tested here and this does what I'm looking for, and importantly works with the gnu-port of gawk for win32. I did have to change the single quotes to double quotes for some reason, but it's working.

Thanks to everyone for helping out! Definitely beats my multiple sed steps.

-Kenta
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 12:07 PM, kenta wrote:
> Actually I need to get the info regardless of delimiters so matching
> any hex digits of a certain length works for me. I'd sent an e-mail
> before but I used the wrong address so maybe it didn't go through, for
> some reason the gnu-port of grep for win32 has no -o option. :(

You could try it this way in gawk:

    $ cat foo
    foofoo:A1234567890B\barbar
    foofoo:C9234567890E\barbar
    foofoo:A8234567890B\barbar
    foofoo:F7234567890D\barbar
    $ gawk --posix '{ if (match($0,/[[:xdigit:]]{12}/)) print substr($0,RSTART,RLENGTH) }' foo
    A1234567890B
    C9234567890E
    A8234567890B
    F7234567890D

You need --posix for [:xdigit:] and the curly braces ( {} ) to work. This basically says:

    if the line ($0) matches 12 hexdigits ([:xdigit:]) in a row:
        print a substring of the line ($0) starting at RSTART and going for RLENGTH characters

match() sets RSTART and RLENGTH for you on a match.

-Shawn
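The match()/RSTART/RLENGTH technique above depends on the awk in use understanding both the {12} interval and the [:xdigit:] class. A minimal sketch for an awk that might support neither, assuming only the standard match() and substr() functions: it builds the pattern by repeating a plain bracket expression. The function name hex_run and the argument n are hypothetical, not from the thread:

```shell
# Print the first run of exactly n consecutive hex digits found on each line.
# The regex is assembled as a string so no interval expression is needed.
hex_run() {
    awk -v n="$1" '
        BEGIN {
            # Repeat the bracket expression n times: [0-9A-Fa-f][0-9A-Fa-f]...
            for (i = 0; i < n; i++) re = re "[0-9A-Fa-f]"
        }
        # match() sets RSTART and RLENGTH when the dynamic regex is found.
        match($0, re) { print substr($0, RSTART, RLENGTH) }
    '
}
```

Usage would be something like hex_run 12 < foo. If a line contains a longer run of hex digits, only its first n characters are printed, and lines without a long enough run produce no output.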
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 11:58 AM, Ben Scott wrote:
> On Mon, Feb 2, 2009 at 11:46 AM, Alan Johnson wrote:
>>-o, --only-matching
>>        Print only the matched (non-empty) parts of a matching
>>        line, with each such part on a separate output line.
>
> Oh. Hey, that's neat; I didn't know about that one. So Kenta could
> -- possibly -- do this:
>
>    grep -o -E [[:xdigit:]]+
>
> However, that will generate false positive matches on anything that
> happens to match a hex digit and isn't within Kenta's field delimiters
> (colon and backslash). May or may not be an issue, depending on what
> "foo" and "bar" really are. And you can't use any kind of delimiter
> matching without also including the delimiters in the output with
> grep, can you?

Actually I need to get the info regardless of delimiters so matching any hex digits of a certain length works for me. I'd sent an e-mail before but I used the wrong address so maybe it didn't go through; for some reason the gnu-port of grep for win32 has no -o option. :(

-Kenta
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 11:58 AM, Michael ODonnell wrote:
>> sed s/.*:\([[:xdigit:]]*\)\\.*/\1/
>
> That looks good to me, though I assume he meant to show that
> expression in single quotes.

Nope. The Windows NT shell (CMD.EXE) has different meta-characters from Bourne and company. In particular, asterisk (*), backslash (\), and square brackets ([]) are *not* shell meta-characters.

> Also, I can't remember if those character class notations count
> as Extended Regular Expressions but ...

The sed from unxutils apparently only supports Basic Regular Expressions. At least, it doesn't recognize the + one-or-more modifier. It does recognize the named character classes, though. I did run my command line on a whole one test case. ;-)

According to a random "sed" man page I found on the web, the named character classes are part of POSIX BREs.

-- Ben
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 11:46 AM, Alan Johnson wrote:
>-o, --only-matching
>        Print only the matched (non-empty) parts of a matching
>        line, with each such part on a separate output line.

Oh. Hey, that's neat; I didn't know about that one. So Kenta could -- possibly -- do this:

    grep -o -E [[:xdigit:]]+

However, that will generate false positive matches on anything that happens to match a hex digit and isn't within Kenta's field delimiters (colon and backslash). May or may not be an issue, depending on what "foo" and "bar" really are. And you can't use any kind of delimiter matching without also including the delimiters in the output with grep, can you?

-- Ben
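On the delimiter question above: one possible workaround, sketched here under the assumption that the grep at hand supports -o and that the delimiters are a colon and a backslash as in the sample data, is to let grep include the delimiters in the match and delete them in a second step with tr. The function name hex_between_delims is hypothetical:

```shell
# Match ':' + one or more hex digits + '\' with grep -o, then strip the
# two delimiter characters from what grep printed.
hex_between_delims() {
    # BRE: \{1,\} means "one or more"; the final \\ is a literal backslash.
    grep -o ':[[:xdigit:]]\{1,\}\\' | tr -d ':\\'
}
```

Because the colon and backslash are part of the pattern, hex-looking text elsewhere on the line is not reported. The quoting shown is for a Unix shell; it would need adjusting for CMD.EXE.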
Re: Displaying only data matching a pattern?
Ben wrote:
> sed s/.*:\([[:xdigit:]]*\)\\.*/\1/

That looks good to me, though I assume he meant to show that expression in single quotes. Also, I can't remember if those character class notations count as Extended Regular Expressions but, if so, some versions of sed might want something like -r on the command line to enable their use.
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 11:45 AM, Ben Scott wrote:
> There might be a way to do this with awk, but my knowledge of awk is
> mostly limited to using it to print columns. :)

Me too! (like some brain-dead AOLer)

All I have ever used awk for is picking out a column that is white-space delimited, but it is annoying for any other delimiter. cut is great for single-character delimiters, but is annoying for white space. I fall back to regex for multiple-character delimiters. People keep telling me that awk is good for other things, then start showing me some Klingon syntax and I just nod and smile with the understanding that we disagree on the definition of "good". =)

To be fair, even awk '{print $5}' is a bear to get right until you type it 50 times. I used to always switch the positioning of the ' and the {. Oh well.
Re: Displaying only data matching a pattern?
kenta writes:
> Otherwise, I want as an end result:
>
> A1234567890B
> C9234567890E
> A8234567890B
> F7234567890D

How about?:

    sed 's/^[a-zA-Z0-9][a-zA-Z0-9]*:\([A-Z][0-9][0-9]*[A-Z]\).*/\1/'

I've made the regexp here a little bit tight in order to prevent false positives.

Regards,

--kevin
-- 
GnuPG ID: B280F24E                Meet me by the knuckles
alumni.unh.edu!kdc                of the skinny-bone tree.
http://kdc-blog.blogspot.com/          -- Tom Waits
Re: Displaying only data matching a pattern?
Your man-fu might be a bit off too. ;) From the grep man page:

    -o, --only-matching
           Print only the matched (non-empty) parts of a matching
           line, with each such part on a separate output line.

Also, if you have cut, it will probably run faster with:

    cut -d ':' -f 2 $file | cut -d '\' -f 1

or

    | cut -d ':' -f 2 | cut -d '\' -f 1

No regex generally means faster run times.

Enjoy!

On Mon, Feb 2, 2009 at 11:19 AM, kenta wrote:
> I've been working with some large files and need to extract a piece of
> info, unfortunately there's a bunch of junk around the part that I
> want. Example:
> foofoo:A1234567890B\barbar
> foofoo:C9234567890E\barbar
> foofoo:A8234567890B\barbar
> foofoo:F7234567890D\barbar
>
> What I had done the first pass to get what I wanted was to use sed and
> do a s/foofoo:// to get rid of the stuff in front, and then do the
> same for the \barbar in the back. However what would be easier for me
> is if I could just extract the pattern of [a-fA-F0-9] when they appear
> n times in a row. I couldn't seem to figure out if sed could display
> only the part I wanted. So I figured it'd be a better job for grep,
> however It appears that it's printing the entire matching line and I
> only want the match on the pattern to display.
>
> Otherwise, I want as an end result:
>
> A1234567890B
> C9234567890E
> A8234567890B
> F7234567890D
>
> Here's a caveat: I need to do this on a Windows box and am limited to
> using the ports of GNU utilities (using
> http://unxutils.sourceforge.net/) otherwise I'd do this in perl and
> wouldn't be asking :/ I'm thinking maybe I'm just missing something
> simple here.
>
> Any quick solutions? My Google-fu is weak today.
>
> -Kenta

-- 
Alan Johnson
a...@datdec.com
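The two-stage cut pipeline above can be wrapped up as follows. This is a minimal sketch, assuming a Unix shell (where '\' in single quotes is a literal backslash) and input on stdin; the function name field_by_cut is hypothetical, and the -s flag is an addition not in the original command:

```shell
# Take the second ':'-separated field, then the first '\'-separated
# field of that. No regular expressions are involved.
field_by_cut() {
    # -s suppresses lines that contain no ':' at all; without it cut
    # would pass such lines through unchanged.
    cut -s -d ':' -f 2 | cut -d '\' -f 1
}
```

Usage would be along the lines of field_by_cut < some_data_file. Unlike the regex-based answers, this does not check that the extracted field consists of hex digits.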
Re: Displaying only data matching a pattern?
On Mon, Feb 2, 2009 at 11:19 AM, kenta wrote:
> extract the pattern of [a-fA-F0-9] when they
> appear n times in a row.
>
> foofoo:A1234567890B\barbar
> A1234567890B

I *think* this will do what you want:

    sed s/.*:\([[:xdigit:]]*\)\\.*/\1/

By grouping something within parentheses, you can backreference it in the replacement string. By grouping the part you're interested in, but then having an ungrouped match for everything else, you can replace each line with just the part you're interested in.

[[:xdigit:]] matches any hex digit; a little clearer than the literal pattern. The literal backslash has to be escaped, contributing to leaning toothpick syndrome.

Perl's better at this (and also available for 'doze), but you specified using only the unxutils package.

There might be a way to do this with awk, but my knowledge of awk is mostly limited to using it to print columns. :)

-- Ben
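For readers on a Unix shell rather than CMD.EXE, the group-and-backreference idea above might look like the following. This is a minimal sketch with three assumptions that are not in the original command: single quotes to protect the expression from the shell, a \{12\} interval to pin the run to the 12 hex digits seen in the sample data, and -n with the p flag so that non-matching lines are dropped instead of echoed. The function name hex_by_sed is hypothetical:

```shell
# Replace each line with the 12 hex digits found between the last ':'
# before them and the '\' that follows; print only lines that matched.
hex_by_sed() {
    # \( \) captures the hex run, \1 puts it back; \\ is a literal backslash.
    sed -n 's/.*:\([[:xdigit:]]\{12\}\)\\.*/\1/p'
}
```

Usage would be something like hex_by_sed < some_data_file. Only basic regular expressions are used, which matches the observation elsewhere in the thread that the unxutils sed does not accept the + modifier.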
Displaying only data matching a pattern?
I've been working with some large files and need to extract a piece of info, unfortunately there's a bunch of junk around the part that I want. Example:

foofoo:A1234567890B\barbar
foofoo:C9234567890E\barbar
foofoo:A8234567890B\barbar
foofoo:F7234567890D\barbar

What I had done the first pass to get what I wanted was to use sed and do a s/foofoo:// to get rid of the stuff in front, and then do the same for the \barbar in the back. However what would be easier for me is if I could just extract the pattern of [a-fA-F0-9] when they appear n times in a row. I couldn't seem to figure out if sed could display only the part I wanted. So I figured it'd be a better job for grep, however it appears that it's printing the entire matching line and I only want the match on the pattern to display.

Otherwise, I want as an end result:

A1234567890B
C9234567890E
A8234567890B
F7234567890D

Here's a caveat: I need to do this on a Windows box and am limited to using the ports of GNU utilities (using http://unxutils.sourceforge.net/) otherwise I'd do this in perl and wouldn't be asking :/ I'm thinking maybe I'm just missing something simple here.

Any quick solutions? My Google-fu is weak today.

-Kenta