Re: extract string

Paul Lussier Wed, 11 Jan 2006 06:58:34 -0800

Zhao Peng <[EMAIL PROTECTED]> writes:

> First I really cannot be more grateful for the answers to my question
> from all of you, I appreciate your help and time. I'm especially
> touched by the outpouring of responses on this list, which I have
> never experienced before anywhere else.


Zhao, this is a pretty amazing list, as you and many others have
discovered.  I seldom find answers as good, or as complete, anywhere
else.  And most often, the ensuing discussion is more interesting,
educational, and enlightening than the original question.  (It often
amuses me when I google for an answer to a question and one of the top
10 results is a reference to this list.  It's even more amusing when
it was *I* who answered, for someone else, the very question I am now
asking :)

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign on the top of "6" key on the
> keyboard (so when I typed that sign, I felt strange that it is much
> smaller than /\, but didn't realize that they just are not the same
> thing), instead of forward slash and back slash. I felt really
> embarrassed with my stupid mistake. //blush

Ahhhh, this makes so much more sense now.  So you in fact typed
something like:

  grep univ abc.txt | cut -f3 -d, | sed s/^/\>/g 

?

That still doesn't end up with a '>' in def.txt, but depending upon
exactly what you typed, I can certainly see where the use of ^ instead
of /\ could result in something like that.

For educational purposes, the ^ "anchors" the pattern that follows it,
so that the pattern only matches at the beginning of the line.  Therefore:

 sed 's/foo/bar/g'

and

 sed 's/^foo/bar/g'

are very different, since the former replaces every occurrence of
'foo' with 'bar', whereas the latter only changes foo to bar when foo
is found at the beginning of the line.  A '$' at the end of a pattern
does exactly the same thing, except that it anchors the pattern at the
*end* of a line.

Btw, I highly recommend reading the O'Reilly book on Regular
Expressions.  If you're going to be doing a lot of this type of data
mining, a solid understanding of regexps and mastery of perl will make
your life significantly more fun.

Also, you might want to play with writing perl/shell scripts that
output data parseable by gnuplot, which lets you auto-generate some
rather interesting and complicated graphs of the data (I know SAS can
do all this, but I bet it's nowhere near as interesting or fun as
learning the UNIX way of doing it, and you don't need a SAS license
either ;)
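
A minimal sketch of the shell half of that idea, assuming the
comma-separated name,age,"college" layout discussed in this thread.
The rows and the `rows`/`college_counts` helper names are hypothetical,
and the gnuplot commands are only shown in a comment, not run.

```shell
#!/bin/sh
# Hypothetical rows in the name,age,"college" layout; the second row
# has a missing age.
rows() {
    printf '%s\n' '"jim",20,"Dartmouth college"' \
                  '"jesse",,"Dartmouth college"' \
                  '"ann",22,"Keene State univ"'
}

# Count rows per college and print "count<TAB>college", one per line.
# gnuplot can read plain column-oriented text like this from a file.
college_counts() {
    rows | cut -f3 -d, | sed 's/"//g' | sort | uniq -c |
        awk '{ n = $1; $1 = ""; sub(/^ /, ""); printf "%d\t%s\n", n, $0 }'
}

college_counts

# Not run here.  After saving the output (college_counts > counts.dat),
# one possible set of gnuplot commands to draw it as a bar chart:
#   set datafile separator "\t"
#   set style fill solid
#   plot 'counts.dat' using 0:1:xtic(2) with boxes title "rows per college"
```

Tab-separating the columns keeps college names that contain spaces in
a single column once gnuplot is told to split on tabs.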

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think the 2nd one as merely an indication that the value for age
> column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

If you're going to be doing a lot of this type of thing, then perl
will most definitely be your best friend :)
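
To make the disagreement concrete, the sketch below splits some lines
on every comma and reports how many fields each has, plus the field
that `cut -f3 -d,` would select.  The rows and the `rows` helper are
hypothetical; the last row imitates the "jesse" line with three commas.

```shell
#!/bin/sh
# Hypothetical rows; the last one has three commas between "jesse" and
# the college, like the disputed 3rd line.
rows() {
    printf '%s\n' '"jim",20,"Dartmouth college"' \
                  '"ann",22,"Keene State univ"' \
                  '"jesse",,,"Dartmouth college"'
}

# Split each line on every comma; print the field count and field 3.
rows | awk -F, '{ printf "%d fields, field 3 = [%s]\n", NF, $3 }'
```

Split on commas, the jesse line has four fields and an empty third
field, so `cut -f3 -d,` prints a blank line for it.  Reading two
adjacent empty fields as a single "missing" value is a rule about the
data that has to be coded explicitly; cut has no notion of it.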

> Paul, as to your "simplest by what measurement" question. I was
> thinking of both "easiest to remember" and "easiest to understand"
> when I was posting my question. Now I'd like the "most efficient"
> approach. I know that will be my homework.

Well, again, most efficient by what measurement?  In the long run, I'm
going to bet it's in your best interests to learn perl, since it's one
tool which will allow you to write rather small and arbitrarily complex
scripts which would mostly obviate the need to learn several different
tools like cut, sed, awk, comm, etc.  In fact, learning perl will
likely lead you to learn about these other tools over time as the
situation dictates, while making you vastly more productive in the
short term.  Since perl excels at textual manipulation, it's perfect for
this type of data analysis.  And, since perl, combined with gnuplot,
is simple to run from an Apache web server.... Well, I'm sure your
imagination will lead you to wherever you need to go :)

Good luck, and please feel free to post more interesting questions.

-- 

Seeya,
Paul
_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
