Re: Displaying only data matching a pattern?

2009-03-15 Thread Ben Scott
On Sun, Mar 15, 2009 at 6:38 PM, Lloyd Kvam  wrote:
> grep should have a -o option to only output the match

  That was brought up -- and useful it is -- but it has problems when
you only want to see the match but your target pattern has to include
delimiters.

  For those so interested, the complete thread can be found here:

http://thread.gmane.org/gmane.org.user-groups.linux.gnhlug/16551

-- Ben
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/


Re: Displaying only data matching a pattern?

2009-03-15 Thread Lloyd Kvam
On Sun, 2009-03-15 at 11:20 -0400, Steven W. Orr wrote:
> So I figured it'd be a better job for grep,
> =>however It appears that it's printing the entire matching line and I
> =>only want the match on the pattern to display.
> => 
grep should have a -o option to only output the match

-- 
Lloyd Kvam
Venix Corp
DLSLUG/GNHLUG library
http://dlslug.org/library.html
http://www.librarything.com/catalog/dlslug
http://www.librarything.com/rsshtml/recent/dlslug
http://www.librarything.com/rss/recent/dlslug



Re: Displaying only data matching a pattern?

2009-03-15 Thread Ben Scott
On Sun, Mar 15, 2009 at 11:20 AM, Steven W. Orr  wrote:
> ... all manner of external tools, sed, awk,
> perl, cut, ad nauseam. Just use bash.

  Well, this is on MS-Windows.  bash is as external as sed, awk, etc., are.   :)

  That said, more techniques are always good to offer.  :)

-- Ben


Re: Displaying only data matching a pattern?

2009-03-15 Thread Steven W. Orr
On Monday, Feb 2nd 2009 at 11:19 -, quoth kenta:

=>I've been working with some large files and need to extract a piece of
=>info, unfortunately there's a bunch of junk around the part that I
=>want.  Example:
=>foofoo:A1234567890B\barbar
=>foofoo:C9234567890E\barbar
=>foofoo:A8234567890B\barbar
=>foofoo:F7234567890D\barbar
=>
=>What I had done the first pass to get what I wanted was to use sed and
=>do a  s/foofoo:// to get rid of the stuff in front, and then do the
=>same for the \barbar in the back.  However what would be easier for me
=>is if I could just extract the pattern of [a-fA-F0-9] when they appear
=>n times in a row. I couldn't seem to figure out if sed could display
=>only the part I wanted.  So I figured it'd be a better job for grep,
=>however It appears that it's printing the entire matching line and I
=>only want the match on the pattern to display.
=>
=>Otherwise, I want as an end result:
=>
=>A1234567890B
=>C9234567890E
=>A8234567890B
=>F7234567890D
=>
=>Here's a caveat: I need to do this on a Windows box and am limited to
=>using the ports of  GNU utilities (using
=>http://unxutils.sourceforge.net/) otherwise I'd do this in perl and
=>wouldn't be asking :/   I'm thinking maybe I'm just missing something
=>simple here.
=>
=>Any quick solutions?  My Google-fu is weak today.

I'm a little behind in my mail, but I looked at the proffered solutions 
and people seem to want to use all manner of external tools, sed, awk, 
perl, cut, ad nauseam. Just use bash.

If you set qq to one of the example values
qq='foofoo:A1234567890B\barbar'
then all you have to do is to use bash to take it apart. No external 
programs required.

ww=${qq#foofoo:*}
echo ${ww%\\barbar}

so to put it all together

# -r keeps read from swallowing the backslash in the data
while read -r a_line
do
ww=${a_line#foofoo:*}
echo ${ww%\\barbar}
done < some_data_file

echo 'Ta da'
exit "$time_to_drink_beer"
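
A slightly more general sketch of the same idea, assuming (as in the sample data) that the wanted field sits between the first colon and the first backslash on each line. The function name extract_field is made up for illustration; nothing here goes beyond plain parameter expansion, so it should behave the same in bash or any POSIX shell.

```shell
# Hypothetical helper: print the text between the first ':' and the
# first '\' of every input line, using only parameter expansion.
extract_field() {
    # IFS= and -r keep leading blanks and backslashes untouched.
    while IFS= read -r line; do
        rest=${line#*:}      # drop the shortest prefix ending in ':'
        rest=${rest%%\\*}    # drop the longest suffix starting at '\'
        printf '%s\n' "$rest"
    done
}

printf '%s\n' 'foofoo:A1234567890B\barbar' 'foofoo:C9234567890E\barbar' | extract_field
# prints:
# A1234567890B
# C9234567890E
```

Unlike the hard-coded foofoo: and \barbar, this does not care what the surrounding junk is, only where the two delimiters are.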

-- 
Time flies like the wind. Fruit flies like a banana. Stranger things have  .0.
happened but none stranger than this. Does your driver's license say Organ ..0
Donor?Black holes are where God divided by zero. Listen to me! We are all- 000
individuals! What if this weren't a hypothetical question?
steveo at syslang.net


Re: Displaying only data matching a pattern?

2009-02-02 Thread Ben Scott
On Mon, Feb 2, 2009 at 1:18 PM, kenta  wrote:
> I did have to change the single quotes to double quotes
> for some reason ...

  Single-quotes (') have no significance to the Windows shell.  So the
shell would still have done tokenization on the spaces within the awk
script, and awk expects its script to be a single C argument.
However, double-quotes are still used to constrain the text within to
a single token, similar to the Unix shells.

  Unlike Unix, the Windows shell does not strip the quotes (IIRC).  It
leaves them intact as part of the token.  I would guess most Unix
ports introduce some kind of compatibility layer to handle argument
quoting.
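
One way to sidestep the quoting differences altogether is to keep the awk program in a file and hand it to awk with -f, so no shell has to tokenize it. A minimal sketch, shown here from a Unix shell; the file names extract.awk and sample.txt are made up, and on Windows the program file would simply be created with an editor.

```shell
# Write the awk program to a file so that no shell ever has to quote it.
cat > extract.awk <<'EOF'
# Print the run of hex digits that follows the first matching colon.
{
    if (match($0, /:[0-9A-Fa-f]+/))
        print substr($0, RSTART + 1, RLENGTH - 1)   # skip the ':' itself
}
EOF

printf '%s\n' 'foofoo:A1234567890B\barbar' > sample.txt
awk -f extract.awk sample.txt
# prints: A1234567890B
```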

  The Windows shell is a crock, if you haven't figured that out yet.  :)

-- Ben


Re: Displaying only data matching a pattern?

2009-02-02 Thread Ben Scott
On Mon, Feb 2, 2009 at 12:07 PM, kenta  wrote:
> for some reason the gnu-port of grep for win32 has no -o option. :(

  Hmmm, the binary I have does.  However, come to think of it, I did
update a bunch of the utilities from another distribution; maybe that
made the difference.

  If you care, it was this distribution:
http://gnuwin32.sourceforge.net/  Very similar to unxutils, but
some stuff is more current, and it has some additional stuff.  Be warned
that DLL hell applies.

-- Ben


Re: Displaying only data matching a pattern?

2009-02-02 Thread kenta
On Mon, Feb 2, 2009 at 1:06 PM, Shawn O'Shea  wrote:
> You could try it this way in gawk:
> $ cat foo
> foofoo:A1234567890B\barbar
> foofoo:C9234567890E\barbar
> foofoo:A8234567890B\barbar
> foofoo:F7234567890D\barbar
> $ gawk --posix '{ if (match($0,/[[:xdigit:]]{12}/)) print substr($0,RSTART,RLENGTH) }' foo
> A1234567890B
> C9234567890E
> A8234567890B
> F7234567890D
>
>
> You need --posix for [:xdigit:] and the curly braces ( {} ) to work. This
> basically says:
> if the line ($0) matches 12 hexdigits ([:xdigit:]) in a row:
>print a substring of the line ($0) starting at RSTART and going for
> RLENGTH characters
> match() sets RSTART and RLENGTH for you on a match.
>
> -Shawn

Tested here and this does what I'm looking for, and importantly works
with the gnu-port of gawk for win32.  I did have to change the single
quotes to double quotes for some reason, but it's working.

Thanks to everyone for helping out!

Definitely beats my multiple sed steps.

-Kenta


Re: Displaying only data matching a pattern?

2009-02-02 Thread Shawn O'Shea
On Mon, Feb 2, 2009 at 12:07 PM, kenta  wrote:
>
>
> Actually I need to get the info regardless of delimiters so matching
> any hex digits of a certain length works for me.  I'd sent an e-mail
> before but I used the wrong address so maybe it didn't go through, for
> some reason the gnu-port of grep for win32 has no -o option. :(
>

You could try it this way in gawk:
$ cat foo
foofoo:A1234567890B\barbar
foofoo:C9234567890E\barbar
foofoo:A8234567890B\barbar
foofoo:F7234567890D\barbar
$ gawk --posix '{ if (match($0,/[[:xdigit:]]{12}/)) print substr($0,RSTART,RLENGTH) }' foo
A1234567890B
C9234567890E
A8234567890B
F7234567890D


You need --posix for [:xdigit:] and the curly braces ( {} ) to work. This
basically says:
if the line ($0) matches 12 hexdigits ([:xdigit:]) in a row:
   print a substring of the line ($0) starting at RSTART and going for
RLENGTH characters
match() sets RSTART and RLENGTH for you on a match.
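
If a particular awk has no interval expressions or named classes at all, the same match()/substr() idea can be sketched without them by building the pattern as a string. This is only an illustration: the function name hex_run and its argument are made up, and it assumes an awk that accepts a string as the second argument to match().

```shell
# Hypothetical helper: print the first run of n hex digits on each line.
hex_run() {
    awk -v n="$1" '
        BEGIN {
            # Repeat "[0-9A-Fa-f]" n times instead of writing {n}.
            for (i = 0; i < n; i++) re = re "[0-9A-Fa-f]"
        }
        # match() sets RSTART and RLENGTH when the pattern is found.
        match($0, re) { print substr($0, RSTART, RLENGTH) }
    '
}

printf '%s\n' 'foofoo:A1234567890B\barbar' 'no hex run here' | hex_run 12
# prints: A1234567890B
```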

-Shawn


Re: Displaying only data matching a pattern?

2009-02-02 Thread kenta
On Mon, Feb 2, 2009 at 11:58 AM, Ben Scott  wrote:
> On Mon, Feb 2, 2009 at 11:46 AM, Alan Johnson  wrote:
>>-o, --only-matching
>>   Print  only  the  matched  (non-empty) parts of a matching
>> line, with each such part on a separate output line.
>
>  Oh.  Hey, that's neat; I didn't know about that one.  So Kenta could
> -- possibly -- do this:
>
>grep -o -E [[:xdigit:]]+
>
> However, that will generate false positive matches on anything that
> happens to match a hex digit and isn't within Kenta's field delimiters
> (colon and backslash).  May or may not be an issue, depending on what
> "foo" and "bar" really are.  And you can't use any kind of delimiter
> matching without also including the delimiters in the output with
> grep, can you?

Actually I need to get the info regardless of delimiters so matching
any hex digits of a certain length works for me.  I'd sent an e-mail
before but I used the wrong address so maybe it didn't go through, for
some reason the gnu-port of grep for win32 has no -o option. :(
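
For a grep build that does have -o, "any hex digits of a certain length" could be sketched like this. The function name hex12 and the length 12 are taken from the sample data as an assumption, not from anything kenta stated about the real files.

```shell
# Hypothetical helper: print every run of exactly 12 hex digits found.
hex12() {
    grep -oE '[[:xdigit:]]{12}'
}

printf '%s\n' 'foofoo:A1234567890B\barbar' 'foofoo:C9234567890E\barbar' | hex12
# prints:
# A1234567890B
# C9234567890E
```

Note that a longer run (say 24 hex digits) would come out as two 12-character matches, so this only fits data where the field length is fixed.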

-Kenta


Re: Displaying only data matching a pattern?

2009-02-02 Thread Ben Scott
On Mon, Feb 2, 2009 at 11:58 AM, Michael ODonnell
 wrote:
>>  sed s/.*:\([[:xdigit:]]*\)\\.*/\1/
>
> That looks good to me, though I assume he meant to show that
> expression in single quotes.

  Nope.  The Windows NT shell (CMD.EXE) has different meta-characters
from Bourne and company.  In particular, asterisk (*), backslash (\),
and square brackets ([]) are *not* shell meta-characters.

> Also, I can't remember if those character class notations count
> as Extended Regular Expressions but ...

  The sed from unxutils apparently only supports Basic Regular
Expressions.  At least, it doesn't recognize the + one-or-more
modifier.  It does recognize the named character classes, though.  I
did run my command line on a whole one test case.  ;-)

  According to a random "sed" man page I found on the web, the named
character classes are part of POSIX BREs.
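
POSIX BREs also have an interval operator, written \{m,n\}, which can stand in for the missing +. A sketch in Unix shell quoting (hence the single quotes); whether the unxutils sed accepts it was not tested in this thread, so treat that part as an assumption. The function name is made up.

```shell
# Hypothetical helper: BRE has no '+', but 'x\{1,\}' means "one or more".
hex_field_bre() {
    sed 's/.*:\([[:xdigit:]]\{1,\}\)\\.*/\1/'
}

printf '%s\n' 'foofoo:A1234567890B\barbar' | hex_field_bre
# prints: A1234567890B
```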

-- Ben


Re: Displaying only data matching a pattern?

2009-02-02 Thread Ben Scott
On Mon, Feb 2, 2009 at 11:46 AM, Alan Johnson  wrote:
>-o, --only-matching
>   Print  only  the  matched  (non-empty) parts of a matching
> line, with each such part on a separate output line.

  Oh.  Hey, that's neat; I didn't know about that one.  So Kenta could
-- possibly -- do this:

grep -o -E [[:xdigit:]]+

However, that will generate false positive matches on anything that
happens to match a hex digit and isn't within Kenta's field delimiters
(colon and backslash).  May or may not be an issue, depending on what
"foo" and "bar" really are.  And you can't use any kind of delimiter
matching without also including the delimiters in the output with
grep, can you?
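
One workaround, sketched under the assumption that grep -o and tr are both available: let grep match the delimiters too, then delete them afterwards. The function name between_delims is invented for the example.

```shell
# Hypothetical helper: match ':' + hex digits + '\', then strip the
# two delimiter characters from what grep printed.
between_delims() {
    grep -o ':[[:xdigit:]]*\\' | tr -d ':\\'
}

# "deadbeef" and "cafe" are hex look-alikes outside the delimiters.
printf '%s\n' 'deadbeef:A1234567890B\cafe' | between_delims
# prints: A1234567890B
```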

-- Ben


Re: Displaying only data matching a pattern?

2009-02-02 Thread Michael ODonnell


Ben wrote:

>  sed s/.*:\([[:xdigit:]]*\)\\.*/\1/

That looks good to me, though I assume he meant to show that
expression in single quotes.  Also, I can't remember if those
character class notations count as Extended Regular Expressions
but, if so, some versions of sed might want something like -r
on the command line to enable their use.



Re: Displaying only data matching a pattern?

2009-02-02 Thread Alan Johnson
On Mon, Feb 2, 2009 at 11:45 AM, Ben Scott  wrote:

>  There might be a way to do this with awk, but my knowledge of awk is
> mostly limited to using it to print columns.  :)
>

Me too!  (like some brain-dead AOLer)

All I have ever used awk for is picking out a column that is white-space
delimited, but it is annoying for any other delimiter.  cut is great for
single-character delimiters, but is annoying for white space.  I fall back
to regex for multiple-character delimiters.

People keep telling me that awk is good for other things, then start showing
me some Klingon syntax and I just nod and smile with the understanding that
we disagree on the definition of "good". =)  To be fair, even awk '{print
$5}' is a bear to get right until you type it 50 times.  I used to always
switch the positioning of the ' and the {.  Oh well.


Re: Displaying only data matching a pattern?

2009-02-02 Thread Kevin D. Clark

kenta writes:

> Otherwise, I want as an end result:
> 
> A1234567890B
> C9234567890E
> A8234567890B
> F7234567890D

How about?:

 sed 's/^[a-zA-Z0-9][a-zA-Z0-9]*:\([A-Z][0-9][0-9]*[A-Z]\).*/\1/'

I've made the regexp here a little bit tight in order to prevent false
positives.

Regards,

--kevin
-- 
GnuPG ID: B280F24E                Meet me by the knuckles
alumni.unh.edu!kdc                of the skinny-bone tree.
http://kdc-blog.blogspot.com/     -- Tom Waits


Re: Displaying only data matching a pattern?

2009-02-02 Thread Alan Johnson
Your man-fu might be a bit off too. ;)  From the grep man page:
   -o, --only-matching
          Print only the matched (non-empty) parts of a matching line,
          with each such part on a separate output line.

Also, if you have cut, it will probably run faster with:

cut -d ':' -f 2 $file | cut -d '\' -f 1
or
 | cut -d ':' -f 2 | cut -d '\' -f 1

No regex generally means faster run times.
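
A small sketch of the cut pipeline reading from standard input, with one caveat made visible: cut only counts fields, so it assumes the wanted data is always the second colon-separated field. The function name field_by_cut and the second sample line are made up.

```shell
# Hypothetical helper: second ':'-separated field, then everything
# before the first '\' in that field.
field_by_cut() {
    cut -d ':' -f 2 | cut -d '\' -f 1
}

printf '%s\n' 'foofoo:A1234567890B\barbar' 'foo:bar:C9234567890E\baz' | field_by_cut
# prints:
# A1234567890B
# bar
```

The second line shows what happens when an extra colon turns up in the junk: cut happily returns the wrong field.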

Enjoy!

On Mon, Feb 2, 2009 at 11:19 AM, kenta  wrote:

> I've been working with some large files and need to extract a piece of
> info, unfortunately there's a bunch of junk around the part that I
> want.  Example:
> foofoo:A1234567890B\barbar
> foofoo:C9234567890E\barbar
> foofoo:A8234567890B\barbar
> foofoo:F7234567890D\barbar
>
> What I had done the first pass to get what I wanted was to use sed and
> do a  s/foofoo:// to get rid of the stuff in front, and then do the
> same for the \barbar in the back.  However what would be easier for me
> is if I could just extract the pattern of [a-fA-F0-9] when they appear
> n times in a row. I couldn't seem to figure out if sed could display
> only the part I wanted.  So I figured it'd be a better job for grep,
> however It appears that it's printing the entire matching line and I
> only want the match on the pattern to display.
>
> Otherwise, I want as an end result:
>
> A1234567890B
> C9234567890E
> A8234567890B
> F7234567890D
>
> Here's a caveat: I need to do this on a Windows box and am limited to
> using the ports of  GNU utilities (using
> http://unxutils.sourceforge.net/) otherwise I'd do this in perl and
> wouldn't be asking :/   I'm thinking maybe I'm just missing something
> simple here.
>
> Any quick solutions?  My Google-fu is weak today.
>
> -Kenta



-- 
Alan Johnson
a...@datdec.com


Re: Displaying only data matching a pattern?

2009-02-02 Thread Ben Scott
On Mon, Feb 2, 2009 at 11:19 AM, kenta  wrote:
> extract the pattern of [a-fA-F0-9] when they
> appear n times in a row.
>
> foofoo:A1234567890B\barbar
> A1234567890B

  I *think* this will do what you want:

sed s/.*:\([[:xdigit:]]*\)\\.*/\1/

  By grouping something within parentheses, you can backreference it
in the replacement string.  By grouping the part you're interested in,
but then having an ungrouped match for everything else, you can
replace each line with just the part you're interested in.

  [[:xdigit:]] matches any hex digit; a little clearer than the literal pattern.

  The literal backslash has to be escaped, contributing to leaning
toothpick syndrome.  Perl's better at this (and also available for
'doze), but you specified using only the unxutils package.
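
One thing the plain s/// does not do is drop lines that fail to match; they pass through unchanged. A sketch of the usual fix, using sed's -n option together with the p flag so only substituted lines are printed. It is written with single quotes for a Unix shell (the unquoted form above is for CMD.EXE), and the function name extract_hex is made up.

```shell
# Hypothetical helper: print only the captured hex field, and only for
# lines where the substitution actually happened.
extract_hex() {
    sed -n 's/.*:\([[:xdigit:]]*\)\\.*/\1/p'
}

printf '%s\n' 'foofoo:A1234567890B\barbar' 'a line with no delimiters' | extract_hex
# prints: A1234567890B
```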

  There might be a way to do this with awk, but my knowledge of awk is
mostly limited to using it to print columns.  :)

-- Ben


Displaying only data matching a pattern?

2009-02-02 Thread kenta
I've been working with some large files and need to extract a piece of
info, unfortunately there's a bunch of junk around the part that I
want.  Example:
foofoo:A1234567890B\barbar
foofoo:C9234567890E\barbar
foofoo:A8234567890B\barbar
foofoo:F7234567890D\barbar

What I had done the first pass to get what I wanted was to use sed and
do a  s/foofoo:// to get rid of the stuff in front, and then do the
same for the \barbar in the back.  However what would be easier for me
is if I could just extract the pattern of [a-fA-F0-9] when they appear
n times in a row. I couldn't seem to figure out if sed could display
only the part I wanted.  So I figured it'd be a better job for grep,
however It appears that it's printing the entire matching line and I
only want the match on the pattern to display.

Otherwise, I want as an end result:

A1234567890B
C9234567890E
A8234567890B
F7234567890D

Here's a caveat: I need to do this on a Windows box and am limited to
using the ports of  GNU utilities (using
http://unxutils.sourceforge.net/) otherwise I'd do this in perl and
wouldn't be asking :/   I'm thinking maybe I'm just missing something
simple here.

Any quick solutions?  My Google-fu is weak today.

-Kenta