Zhao Peng <[EMAIL PROTECTED]> writes:

> First I really cannot be more grateful for the answers to my question
> from all of you, I appreciate your help and time. I'm especially
> touched by the outpouring of response on this list, which I have
> never experienced before anywhere else.
Zhao, this is a pretty amazing list, as you and many others have discovered. It's seldom I find answers as good, or as complete, anywhere else. And most often, the ensuing discussion is more interesting, educational, and enlightening than the original question posed. (It's often amusing to me when I google for an answer to a question and within the top 10 returns from google is a reference to this list. More amusing is when it was *I* who answered, for someone else, the very question I am now asking. :)

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign on the top of the "6" key on the
> keyboard (so when I typed that sign, I felt strange that it was much
> smaller than /\, but didn't realize that they just are not the same
> thing), instead of forward slash and back slash. I felt really
> embarrassed by my stupid mistake. //blush

Ahhhh, this makes so much more sense now. So you in fact typed something like:

    grep univ abc.txt | cut -f3 -d, | sed s/^/\>/g

? That still doesn't end up with a '>' in def.txt, but depending upon exactly what you typed, I can certainly see how the use of ^ instead of /\ could result in something like that.

For educational purposes: the use of ^ is to "anchor" the following pattern so it matches only at the beginning of the line. Therefore:

    sed 's/foo/bar/g'

and

    sed 's/^foo/bar/g'

are very different, since the former replaces all occurrences of 'foo' with 'bar', whereas the latter only changes foo to bar when foo is found at the beginning of the line. The use of '$' in a pattern does exactly the same thing, except that it anchors the pattern at the *end* of a line.

Btw, I highly recommend reading the O'Reilly book on Regular Expressions. If you're going to be doing a lot of this type of data mining, a solid understanding of regexps and mastery of perl will make your life significantly more fun.
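To make the anchoring behavior concrete, here's a quick sketch you can paste into a shell (the sample line is made up purely for illustration):

```shell
# A made-up line where 'foo' appears at the start, middle-adjacent, and end.
line='foo bar foo'

# Unanchored: every occurrence of foo is replaced.
echo "$line" | sed 's/foo/bar/g'      # -> bar bar bar

# Anchored with ^: only the foo at the beginning of the line is replaced.
echo "$line" | sed 's/^foo/bar/g'     # -> bar bar foo

# Anchored with $: only the foo at the end of the line is replaced.
echo "$line" | sed 's/foo$/bar/g'     # -> foo bar bar
```

Note that ^ and $ are only "anchors" at the edges of the pattern; in the middle of a pattern they generally match literally (or not at all), which is another common source of surprises.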
Also, you might want to play with writing perl/shell scripts that output data parseable by gnuplot, which allows you to auto-generate some rather interesting and complicated graphs of the data. (I know SAS can do all this, but I bet it's nowhere near as interesting or fun as learning the UNIX way of doing it, and you don't need an SAS license either. ;)

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think of the 2nd one as merely an indication that the value for the
> age column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

If you're going to be doing a lot of this type of thing, then perl will most definitely be your best friend. :)

> Paul, as to your "simplest by what measurement" question. I was
> thinking of both "easiest to remember" and "easiest to understand"
> when I was posting my question. Now I desire the "most efficient"
> approach. I know that will be my homework.

Well, again, most efficient by what measurement? In the long run, I'm going to bet it's in your best interests to learn perl, since it's one tool which will allow you to write rather small and arbitrarily complex scripts, mostly obviating the need to learn several different tools like cut, sed, awk, comm, etc. In fact, learning perl will likely lead you to learn about these other tools over time as the situation dictates, while making you vastly more productive in the short term. Since perl excels at textual manipulation, it's perfect for this type of data analysis. And, since perl, combined with gnuplot, is simple to run from an Apache web server.... Well, I'm sure your imagination will lead you to wherever you need to go. :)

Good luck, and please feel free to post more interesting questions.
-- 
Seeya,
Paul

_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss