Re: extract string from filename
Tom Buskey <[EMAIL PROTECTED]> writes:

> Unix Shell Programming by Kochan and Wood is a classic on shell programming
>
> Portable Shell Programming by Blinn
> The Awk Programming Language by Aho, Weinberger and Kernighan

I'm also a big fan of Kernighan and Pike's "The UNIX Programming Environment". When I first saw this book I thought it was going to be more of a C programming book explaining things like linking and compiling under UNIX. However, it turned out to be simply a great book on how to get around the shell and do a variety of things in the UNIX environment. It is so named because, as we've all seen here, the shell is *programmable* :)

And, yet another plug for my all-time favorite UNIX book, "The UNIX Philosophy" by Mike Gancarz, which has recently been updated with a second edition (which I have not yet read), "Linux and the UNIX Philosophy". This book does a fantastic job of explaining exactly *why* UNIX is such a great environment, and why other environments just can't compete when what you need is raw power and flexibility.

--
Seeya,
Paul
___
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string from filename
[EMAIL PROTECTED] (Kevin D. Clark) writes:

> Zhao Peng writes:
>
>> I'm back, with another "extract string" question. //grin
>
> find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 |
> sort -u > somefile.txt

Or, to simplify this:

    find ./ -name \*sas7bdat | awk -F_ '{print $2}' | sort -u

    ls *sas7bdat | perl -F_ -ane 'print "$F[1]\n";' | sort -u

    perl -e 'opendir(DIR,"."); map { if (/sas7bdat$/) { $k = (split(/_/,$_))[1]; $f{$k} =1; } } readdir(DIR); map { print "$_\n";}sort keys %f;'

That last one might be a little better formatted like:

    perl -e 'opendir(DIR,".");
             map { if (/sas7bdat$/) {
                     $k = (split(/_/,$_))[1];
                     $f{$k}=1;
                   }
                 } readdir(DIR);
             map { print "$_\n";} sort keys %f;'

It should be rather obvious that your best bet for quick one-liners for this type of thing is to probably stick with standard UNIX tools like sort, cut, sed, awk, etc. Perl is great for text manipulation, but as you can see, none of the perl one-liners has been nearly as concise as the shell variants. If speed matters, or process overhead, then maybe perl is better. Of course for such a small data set as you've given, the perl versions are both harder and longer to type.

hth.

--
Seeya,
Paul
Re: extract string from filename
Zhao Peng wrote:

> string1_string2_string3_string4.sas7bdat
>
> abc_st_nh_num.sas7bdat
> abc_st_vt_num.sas7bdat
> abc_st_ma_num.sas7bdat
> abcd_region_NewEngland_num.sas7bdat
> abcd_region_South_num.sas7bdat
>
> My goal is to :
> 1, extract string2 from each file name
> 2, then sort them and keep only unique ones
> 3, then output them to a .txt file. (one unique string2 per line)

Solution #1:

    ls -1 *sas7bdat|awk -F_ '{print $2}'|sort -fu|cat -n >output.txt

Take output of ls, 1 file per line (ls -1) - only files ending with sas7bdat
Feed into awk, splitting on _, print the 2nd field
Sort ignoring case, eliminating duplicates (sort options: f "folds case", u "keeps only uniques")
Number the lines (cat -n)
Put output in file named output.txt

Solution #2:

    ls -1 *sas7bdat|sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'|sort -fu|cat -n >output.txt

Use sed (stream editor) to break up filenames into atoms separated by _, and output the 2nd one (the \2).

Regular expressions (regex) can be very handy. ^ matches the beginning of the string, [a-zA-Z0-9]*_ matches a letter/number string ending with _, and the backslashed parentheses group the patterns, so the 2nd one can be extracted.

There are many solutions to the problem, as you can see.

--
Dan Jenkins ([EMAIL PROTECTED])
Rastech Inc., Bedford, NH, USA --- 1-603-206-9951
*** Technical Support Excellence for over a quarter century
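To see what the sed expression in Solution #2 does without touching a real directory, here is a minimal sketch that feeds it sample names from the thread. The function name extract_second_atom is made up for illustration; the sed expression itself is Dan's, unchanged.

```shell
# Hypothetical wrapper around Dan's sed expression.
# \( ... \) groups are numbered left to right, so \2 is the second atom.
extract_second_atom() {
    sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'
}

# Sample names stand in for the output of "ls -1 *sas7bdat".
printf '%s\n' \
    abc_st_nh_num.sas7bdat \
    abc_st_vt_num.sas7bdat \
    abcd_region_NewEngland_num.sas7bdat |
    extract_second_atom | sort -fu
```

One thing worth knowing: sed prints lines that do not match the pattern unchanged, so a name containing, say, a hyphen in its first atom would pass through whole rather than being dropped.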
Re: extract string from filename
"cat -n" will number output lines ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string from filename
On 1/13/06, Ben Scott <[EMAIL PROTECTED]> wrote:
> On 1/13/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> > Is it possible to number the extracted string2?
>
> find -name \*sas7bdat -printf '%f\n' | cut -d _ -f 2 | sort | uniq | cat -n

I forgot to mention: If the *only* files in that directory are the ones with the interesting file names, you can just use this:

    ls | cut -d _ -f 2 | sort | uniq | cat -n

-- Ben "I would flunk the quiz" Scott
Re: extract string from filename
On 1/13/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> Is it possible to number the extracted string2?

    find -name \*sas7bdat -printf '%f\n' | cut -d _ -f 2 | sort | uniq | cat -n

Run that pipeline in the directory you are interested in.

The find(1) command finds files, based on their name or other filesystem attributes. The "-name \*sas7bdat" part finds files with file names which match the pattern. The backslash escapes the star, to keep the shell from trying to interpret it, so find gets the star instead. The "-printf '%f\n'" part has find output just the file name, not the path.

cut(1) is used to split input strings, as you know. "-d _" splits into fields, based on underscores. "-f 2" outputs the second field only, one per line.

sort(1) sorts, and uniq(1) eliminates duplicate lines.

"cat -n" numbers the output.

-- Ben "Pay attention, there's gonna be a quiz next week" Scott
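The point about escaping the star can be tried out in a scratch directory. The sketch below is only an illustration: the function name list_string2 and the directory layout are invented, and sed is used to strip the path instead of -printf, on the assumption that a find without the GNU -printf extension might be in use.

```shell
# Hypothetical helper: unique second fields of all *sas7bdat names under $1.
list_string2() {
    # The quoted pattern reaches find untouched; sed strips the directory part.
    find "$1" -name '*sas7bdat' | sed 's|.*/||' | cut -d _ -f 2 | LC_ALL=C sort -u
}

demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/abc_st_nh_num.sas7bdat" \
      "$demo_dir/sub/abcd_region_South_num.sas7bdat"

list_string2 "$demo_dir"

# Unquoted, the shell expands the star before find runs, so find is told
# to look for one exact name and misses the file in sub/.
(cd "$demo_dir" && find . -name *sas7bdat)   # prints only ./abc_st_nh_num.sas7bdat

rm -rf "$demo_dir"
```

Quoting the pattern with single quotes has the same effect as the backslash in the original command.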
Re: extract string from filename
On Fri, Jan 13, 2006 at 11:40:26AM -0500, Zhao Peng wrote:
> Kevin,
>
> Thank you very much! I really appreciate it.
>
> I like your "find" approach, it's simple and easy to understand.
>
> I'll also try to understand your perl approach, when I got time to start
> learning it. (Hopefully it won't be un-fulfilled forever)
>
> I have one more question:
>
> Is it possible to number the extracted string2?
>
> Say, the output file contains the following list of extracted string2:
>
> st
> region
> local
>
> Any idea about what command to use to number the list to make it look
> like below:
>
> 1 st
> 2 region
> 3 local

Pipe the output into "pr -n -T". This is not pr's intended use, but it will work. The -n option means "put numbers on the lines"; the -T option means "no page breaks".

The "-n" option appears to be missing from the FC2 man pages.

--
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is madness. -- Eugene Ionesco
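Both "pr -n" and "cat -n" pad the number with spaces and follow it with a tab. If the exact "1 st" layout from the question matters, awk's built-in line counter NR is one way to get it. This is a small sketch; the function name number_lines is made up.

```shell
# Hypothetical helper: prefix each input line with its number and one space.
number_lines() {
    awk '{ print NR, $0 }'
}

printf '%s\n' st region local | number_lines
# 1 st
# 2 region
# 3 local
```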
Re: extract string from filename
Kevin,

Thank you very much! I really appreciate it.

I like your "find" approach, it's simple and easy to understand.

I'll also try to understand your perl approach, when I get time to start learning it. (Hopefully it won't be un-fulfilled forever)

I have one more question:

Is it possible to number the extracted string2?

Say, the output file contains the following list of extracted string2:

st
region
local

Any idea about what command to use to number the list to make it look like below:

1 st
2 region
3 local

Again, thank you for your help and time!

Zhao

Kevin D. Clark wrote:
> Zhao Peng writes:
>
>> I'm back, with another "extract string" question. //grin
>
> find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort -u > somefile.txt
>
> or
>
> perl -MFile::Find -e 'find(sub{ return unless /sas7bdat$/; $string2 = (split /_/)[1]; $seen{$string2}++; }, @ARGV); map { print "$_\n"; } sort keys %seen' FOLDERNAME
>
> (which looks more readable as:
>
> perl -MFile::Find -e 'find(sub{
>                         return unless /sas7bdat$/;      # skip directories and other files
>                         $string2 = (split /_/)[1];      # index 1 is string2 (zero-based)
>                         $seen{$string2}++;
>                       }, @ARGV);
>                       map { print "$_\n"; } sort keys %seen' \
>     FOLDERNAME > somefile.txt
>
> )
>
> Either of which solves the problem that you describe.
>
> (Actually, they solve more than the problem that you describe, since it
> wasn't apparent to me if you had any subdirectories here, but this is
> solved too.)
>
> (substitute FOLDERNAME with your directory's name)
>
> Honestly, the first solution I present is the way I would have solved
> this problem myself. Very fast this way.
>
> Regards,
>
> --kevin
Re: extract string from filename
Zhao Peng wrote:
> My goal is to :
> 1, extract string2 from each file name
> 2, then sort them and keep only unique ones
> 3, then output them to a .txt file. (one unique string2 per line)

It is really interesting how many ways there are to do things in *nix. My first reaction, if this is a one time event, is to just use vi:

    % ls *.sas7bdat > string2.txt
    % vi string2.txt
    :%s/^[^_]*_//
    :%s/_.*$//
    :%!sort -u
    :wq

The first regex removes the first underscore and everything in front of it, while the second regex removes what is now the first underscore (was the second originally) and everything after it. And then I do the unique sort right in vi.

Larry
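The two substitutions above are ordinary regular expressions, so the same edit can be replayed non-interactively with sed. A minimal sketch, with a made-up function name and sample names in place of the ls output:

```shell
# Hypothetical helper: the two vi substitutions, applied by sed to stdin.
strip_to_string2() {
    # 1st expression: drop everything up to and including the first underscore.
    # 2nd expression: drop the (new) first underscore and everything after it.
    sed -e 's/^[^_]*_//' -e 's/_.*$//'
}

printf '%s\n' \
    abc_st_nh_num.sas7bdat \
    abc_st_vt_num.sas7bdat \
    abcd_region_South_num.sas7bdat |
    strip_to_string2 | LC_ALL=C sort -u
# region
# st
```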
Re: extract string from filename
On Jan 12, 2006, at 19:40, Zhao Peng wrote:
> I also downloaded an e-book called "Learning Perl" (OReilly,
> 4th.Edition), and had a quick look thru its Contents of Table, but did
> not find any chapter which looks likely addressing any issue related
> to my question.

Good start. Read these sections: 'A Stroll Through Perl', 'The Split and Join Functions', 'Lists and Arrays', 'Hashes', 'Directory Access', and 'File Manipulation'.

Your description is the outline of the algorithm. Take this script where I've filled in the requisite perl and figure out how it works:

    #!/usr/bin/perl -w
    use strict;                   # show stupid errors
    use warnings FATAL=>'all';    # don't let you get away with them

    #I have almost 1k small files within one folder. The only pattern of the file names is:

    my $dirname = shift;  # take the command line parameter as the directory name
    opendir(DIRECTORY, $dirname) or die "can't open $dirname: $!";
    my @files = readdir(DIRECTORY);
    closedir DIRECTORY;

    #string1_string2_string3_string4.sas7bdat

    #Note:
    #1, string2 often repeat itself across each file name
    #2, All 4 strings contain no underscores.
    #3, 4 strings are separated by 3 underscores (as you can see)
    #4, The length of all 4 strings are not fixed.

    my (@part_2s);  # we'll keep the second parts here
    foreach my $file (@files) {
        next if (($file eq '.') or ($file eq '..'));  # the directory will contain . and .. which we don't want
        next unless ($file =~ /_.*\.sas7bdat$/);      # skip anything that doesn't fit the pattern

        #My goal is to :
        #1, extract string2 from each file name
        my ($filename,$extension) = split('\.',$file);  # don't forget to escape the . since this is a regex
        my @strings = split('_',$filename);
        my $part_2 = $strings[1];  # remember, arrays in perl are zero-indexed
        push(@part_2s,$part_2);    # store the data we want on the end of the array
    }

    #2, keep only unique ones
    # perl trick using a hash to easily get unique items
    my (%temp_hash);
    foreach my $part (@part_2s) {
        $temp_hash{$part} = 1;
    }
    my @uniques = (keys %temp_hash);

    # and then sort them
    my @sorted = sort { $a cmp $b} (@uniques);  # cmp for string sorting

    #3, then output them to a .txt file. (one unique string2 per line)
    open(OUTFILE, ">output.txt") or die "can't write output.txt: $!";
    foreach my $item (@sorted) {
        print OUTFILE $item . "\n";
    }
    close OUTFILE;

When you understand each line you'll be able to solve future similar problems easily. Note Kevin's perl solution is equally valid and probably faster, but you're not going to grok it until you exercise the perl part of your brain for a while.

-Bill
-
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
[EMAIL PROTECTED]               Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf
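The "hash to get unique items" trick in the script has a close cousin on the command line: awk's associative arrays. The sketch below is not part of Bill's script; it is only an illustration, with a made-up function name, of the same idea for readers coming from the shell side. Unlike "sort -u" it keeps the first-seen order.

```shell
# Hypothetical helper: print each distinct input line once, in first-seen order.
unique_in_order() {
    # seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
    # is true exactly once per distinct line, and awk's default action prints it.
    awk '!seen[$0]++'
}

printf '%s\n' st st region st region | unique_in_order
# st
# region
```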
Re: extract string from filename
On 1/12/06, Ben Scott <[EMAIL PROTECTED]> wrote:
> On 1/12/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> > I'm back, with another "extract string" question. //grin
>
> It sounds like you could use a tutorial on Unix text processing and
> command line tools, specifically, one which addresses pipes and
> redirection, as well as the standard text tools (grep, cut, sed, awk,
> etc.). While Paul's recommendation about the O'Reilly regular
> expressions book is valid, I suspect it might be a little too focused
> on regex's and not cover some of the *other* elements you seem to be
> needing.
>
> It's been forever for me, but I seem to recall that _Unix Power
> Tools_, also published by O'Reilly, covers all of the above and much,
> much more. If others on this list second my suggestion, you might
> want to obtain a copy. Alternatively, maybe list members can suggest
> alternatives?

Unix Shell Programming by Kochan and Wood is a classic on shell programming

Portable Shell Programming by Blinn
The Awk Programming Language by Aho, Weinberger and Kernighan

Power Tools is excellent but it is more of a tip book in my mind. Not as much as the Hack series though.

> There are also a number of free guides at the Linux Documentation
> Project. See: http://www.tldp.org/guides.html Look for anything
> mentioning "bash" (the Bourne-again shell) or scripting. I can't
> speak as to how good they are, but you can't beat the price.

Some of them are very good. And the examples work.

--
A strong conviction that something must be done is the parent of many bad measures. - Daniel Webster
Re: extract string from filename
On Jan 12, 2006, at 8:25 PM, Ben Scott wrote:
> It sounds like you could use a tutorial on Unix text processing and
> command line tools, specifically, one which addresses pipes and
> redirection, as well as the standard text tools (grep, cut, sed, awk,
> etc.). While Paul's recommendation about the O'Reilly regular
> expressions book is valid, I suspect it might be a little too focused
> on regex's and not cover some of the *other* elements you seem to be
> needing.

Gee, I wonder if that would be a good topic for a meeting. Bruce Dawson and David Berube did a presentation on Regular expressions that helped me grasp what they were and why I'd want to know more. Bought the Reg Exp book on my next visit to SoftPro. A similar kind of presentation that explained the place of sed, grep, awk, pipes, redirection, tee and so forth would be just as welcome.

> It's been forever for me, but I seem to recall that _Unix Power
> Tools_, also published by O'Reilly, covers all of the above and much,
> much more. If others on this list second my suggestion, you might
> want to obtain a copy. Alternatively, maybe list members can suggest
> alternatives?

Re: UNIX Power Tools. Third time I've heard that recommended. Guess I'll add that to my wish list.

Jerry Peek (http://www.oreillynet.com/pub/au/28 - a number of articles and book extracts linked here), one of the original authors of Unix Power Tools, has been running a series in Linux Magazine for a while now on working from the command line, including the inscrutable 2>&1 and other arcana. Linux Magazine is online at http://www.linux-mag.com/ and posts their issues sixty days after publication at http://www.linux-mag.com/backissues/.

Ben's other links are quite useful, too. The Answers Are Out There. The challenge is finding the answer you need now.
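For anyone who has not met it yet, the redirection mentioned above is written 2>&1 and means "send file descriptor 2 (stderr) wherever descriptor 1 (stdout) currently points". A small sketch, using a made-up function that writes one line to each stream:

```shell
# Hypothetical function: one line to stdout, one line to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# With 2>&1 both lines travel down the pipe, so wc counts 2 lines.
noisy 2>&1 | wc -l

# Without it only stdout enters the pipe; stderr is thrown away here,
# so wc counts 1 line.
noisy 2>/dev/null | wc -l
```

Order matters when a file is involved: "cmd >file 2>&1" puts both streams in the file, while "cmd 2>&1 >file" leaves stderr on the terminal, because 2>&1 copies whatever stdout was at that moment.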
Re: extract string from filename
Zhao Peng writes:

> I'm back, with another "extract string" question. //grin

    find FOLDERNAME -name \*sas7bdat -print | sed 's/.*\///' | cut -d _ -f 2 | sort -u > somefile.txt

or

    perl -MFile::Find -e 'find(sub{ return unless /sas7bdat$/; $string2 = (split /_/)[1]; $seen{$string2}++; }, @ARGV); map { print "$_\n"; } sort keys %seen' FOLDERNAME

(which looks more readable as:

    perl -MFile::Find -e 'find(sub{
                            return unless /sas7bdat$/;      # skip directories and other files
                            $string2 = (split /_/)[1];      # index 1 is string2 (zero-based)
                            $seen{$string2}++;
                          }, @ARGV);
                          map { print "$_\n"; } sort keys %seen' \
        FOLDERNAME > somefile.txt

)

Either of which solves the problem that you describe.

(Actually, they solve more than the problem that you describe, since it wasn't apparent to me if you had any subdirectories here, but this is solved too.)

(substitute FOLDERNAME with your directory's name)

Honestly, the first solution I present is the way I would have solved this problem myself. Very fast this way.

Regards,

--kevin
--
(There are also 228 babies named Unique during the 1990s alone, and 1 each of Uneek, Uneque, and Uneqqee.)
-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner
[but no Unix folks named their kids "uniq", apparently. --kevin]
Re: extract string from filename
On Thu, 2006-01-12 at 19:40 -0500, Zhao Peng wrote:
> For example:
> abc_st_nh_num.sas7bdat
> abc_st_vt_num.sas7bdat
> abc_st_ma_num.sas7bdat
> abcd_region_NewEngland_num.sas7bdat
> abcd_region_South_num.sas7bdat

You're not the only one learning here. I put these names into a file called str2-test-data

    $ cut -d _ -f 2 str2-test-data | sort | uniq
    region
    st

I think that you could use:

    ls | cut -d _ -f 2 | sort | uniq > str2-results.txt

--
Lloyd Kvam
Venix Corp
Re: extract string from filename
On 1/12/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> I'm back, with another "extract string" question. //grin

It sounds like you could use a tutorial on Unix text processing and command line tools, specifically, one which addresses pipes and redirection, as well as the standard text tools (grep, cut, sed, awk, etc.). While Paul's recommendation about the O'Reilly regular expressions book is valid, I suspect it might be a little too focused on regex's and not cover some of the *other* elements you seem to be needing.

It's been forever for me, but I seem to recall that _Unix Power Tools_, also published by O'Reilly, covers all of the above and much, much more. If others on this list second my suggestion, you might want to obtain a copy. Alternatively, maybe list members can suggest alternatives?

There are also a number of free guides at the Linux Documentation Project. See: http://www.tldp.org/guides.html Look for anything mentioning "bash" (the Bourne-again shell) or scripting. I can't speak as to how good they are, but you can't beat the price.

Anyway, on to your question...

> I tried to use "cut" commands, but can't even figure out how to use the
> filenames as input. Anyone care to offer me a hint?

You'll want to pipe the output of "ls" to cut. This should get you started:

    ls -1 | cut -d _ -f 2

The "-1" switch to ls(1) tells it to output a single column of file names. Some versions of "ls" do this automagically when using redirection, but it is best to be sure. The "-d _" switch to cut(1) tells cut to split fields on the underscore. The "-f 2" selects the second field.

See also: sort(1), uniq(1)

Hope this helps!

-- Ben "Unix plumber" Scott
extract string from filename
Hi all,

I'm back, with another "extract string" question. //grin

I have almost 1k small files within one folder. The only pattern of the file names is:

string1_string2_string3_string4.sas7bdat

Note:
1, string2 often repeats itself across file names. For example:
   abc_st_nh_num.sas7bdat
   abc_st_vt_num.sas7bdat
   abc_st_ma_num.sas7bdat
   abcd_region_NewEngland_num.sas7bdat
   abcd_region_South_num.sas7bdat
2, All 4 strings contain no underscores.
3, The 4 strings are separated by 3 underscores (as you can see)
4, The lengths of the 4 strings are not fixed.

My goal is to :
1, extract string2 from each file name
2, then sort them and keep only unique ones
3, then output them to a .txt file. (one unique string2 per line)

I tried to use "cut" commands, but can't even figure out how to use the filenames as input. Anyone care to offer me a hint?

I also downloaded an e-book called "Learning Perl" (O'Reilly, 4th Edition), and had a quick look through its Table of Contents, but did not find any chapter which looks likely to address any issue related to my question.

Thank you very much!

Zhao
Re: extract string
On 1/11/06, Ben Scott <[EMAIL PROTECTED]> wrote:
> > I felt really embarrassed with my stupid mistake. //blush
>
> You think you were embarrassed? There was a certain instance of
> someone accidentally hitting "Reply to All" to a list message which is
> still remembered to this day. I won't mention any names because I
> don't want to get any squeegees in trouble. ;-)

God damned I wish I had an archive of that thread. Anyone have one?

Thomas
Re: extract string
Zhao Peng <[EMAIL PROTECTED]> writes:

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think the 2nd one as an merely indication that the value for age
> column is missing, then the 3rd line will be be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

From my perspective, your file format makes it harder to parse. If at all possible, I would suggest that you modify this file's format.

Still, if this isn't possible, this works on your input:

    perl -lne 's/,,/,MISSING/g; @F = split /,/; if (index($F[-1], "univ") != -1) { ($u = $F[-1]) =~ y/"//d; print $u }'

Formatted more readably, this looks like this:

    perl -lne 's/,,/,MISSING/g;
               @F = split /,/;
               if (index($F[-1], "univ") != -1) {
                 ($u = $F[-1]) =~ y/"//d;
                 print $u
               }'

This seems to be a reasonable solution to your problem. I hope it helps.

Just another Perl hacker,

--kevin
--
GnuPG ID: B280F24E
Re: extract string
perl split on the char pair ," Take last element of returned array, either remove the " at the end or replace the one you ate with the split. Keep a running variable containing largest length encountered so far. Add 10 to be safe. ;-) Any regexp I have to think about for more than 30 seconds is unlikely to be used unless it greatly improves my execution speed...and then only if I have a LOT of data to process. :-) --Drew "Not showing you my crufty perl" VZ
Re: extract string
On 1/11/06, Bill McGonigle <[EMAIL PROTECTED]> wrote:
> On Jan 11, 2006, at 08:42, [EMAIL PROTECTED] wrote:
> > This poses an interesting problem. The "," is being used for two
> > purposes: a delimiter *AND* as a place holder.
>
> Now, for the Lazy, Perl regular expressions are a state machine of
> sorts. I suspect you might be able to do the right thing with
> greedy/non-greedy matches. Someone who lives and breathes regex might
> have a better handle on this. It would take me two hours to get this
> one figured out.

Hehe, it'd be one of those really, REALLY ugly Regular expressions that, when you stare at it long enough, looks like ASCII art in order to make it 100%. ;-)

Thomas
Re: extract string
On 1/11/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> Secondly I'm sorry for the big stir-up as to "homework problems" which
> flooded the list, since I'm origin of it.

*Trust me*, that wasn't a "big" stir-up. Search the list archives for "taxes" if you want to see big ones. The homework thread was even more remarkable for being a debate we *haven't* had before ad nauseam. :)

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign on the top of "6" key on the
> keyboard ...

I was wondering if it might be a transcription error. While shell syntax and regular expressions are very powerful, they tend to be very cryptic as well. That's why I spelled out what each character was.

The ^ character is called a "caret", by the way.

> I felt really embarrassed with my stupid mistake. //blush

You think you were embarrassed? There was a certain instance of someone accidentally hitting "Reply to All" to a list message which is still remembered to this day. I won't mention any names because I don't want to get any squeegees in trouble. ;-)

-- Ben
Re: extract string
On Jan 11, 2006, at 08:42, [EMAIL PROTECTED] wrote:
> This poses an interesting problem. The "," is being used for two
> purposes: a delimiter *AND* as a place holder.

I tried to prove to myself last night that this method would produce unresolvable ambiguities, but if you think like a state machine, character-by-character, it seems to work.

Now, for the Lazy, Perl regular expressions are a state machine of sorts. I suspect you might be able to do the right thing with greedy/non-greedy matches. Someone who lives and breathes regex might have a better handle on this. It would take me two hours to get this one figured out.

This format sure makes the parser harder though, so if there's another way to get the data that's going to be desirable. You can't use Text::CSV::Simple anymore, for instance, which gives you a 15-minute explicit reusable solution.

-Bill
-
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
[EMAIL PROTECTED]               Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf
Re: extract string
On 1/11/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> First I really cannot be more grateful for the answers to my question
> from all of you, I appreciate your help and time. I'm especially
> touched by the outpouring of response on this list., which I have never
> experienced before anywhere else.

I hope my little comment didn't seem mean, I was more poking fun at the fact that if someone posted a similar post, and called themselves a Systems Administrator on a Windows network, comments similar to mine would have come forth.. ;-)

> Secondly I'm sorry for the big stir-up as to "homework problems" which
> flooded the list, since I'm origin of it.

Nah, it wasn't a flood. Trust me, once you see a flood, you'll know it. Usually, it's because someone says something political in nature.

> Kenny, regarding missing column issue, let me try to explain it again.
> Below is quoted from my original post:
>
> Also, if one column is missing, and "," is used to indicate that missing
> column, like the following (2nd column of 3rd line is missing):
>
> "name","age","school"
> "jerry","21","univ of Vermont"
> "jesse",,,"Dartmouth college"
> "jack","18","univ of Penn"
> "john","20","univ of south Florida"
> ===
> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think the 2nd one as an merely indication that the value for age
> column is missing, then the 3rd line will be be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

This is unusual, as typically, a comma delimited set of values would simply have nothing between the commas, or a set of quotes with no data. Typically the line would look like this:

    "jesse",,"Dartmouth college"

Or

    "jesse","","Dartmouth college"

> Paul, as to your "simplest by what measurement" question. I was thinking
> of both "easiest to remember" and "easiest to understand" when I was
> posting my question. Now I desire for "most efficient" approach. I know
> that will be my homework.

If this is something that you will be doing repeatedly for different file types, I'd highly suggest getting familiar with regular expressions. You've seen a small snippet in Kenny's example 'sed s/\"//g'. The 's/\"//g' says to globally replace all quotes with nothing: s = substitute, /1/2/ says 'replace everything matching 1 with 2' (in this case, a quote, with nothing), and g means globally, aka, do it more than just once. Regular expressions are a powerful way to parse text files based on a given pattern, to get at the data you want.

> Part of my primary job responsibilities is to convert raw data into SAS
> data sets. My "extract string" question comes from processing a raw data
> file in .txt format, which doesn't have any documentation, except the
> variable list. By looking at the raw data, I know that each variable is
> separated by a comma. For one particular variable(column) called
> "school", the length of some of its value is quite long(like: Univ of
> Wisconsin at Madison, Health Sci Ctr), but I don't know the definite
> length. I need to know it, because if the length I specify it not
> enough, only partial values will be read. Many of its values contain
> "univ", so I just thought if I could extract all strings containing
> "univ" from that variable(column), I will have a better chance to figure
> out the length of "school". That's why I had this question.

Haven't even run it, but something perl like:

    my $maxlen = 0;
    while(<>) {
        next unless /^(.*),(.*),(.*)$/;   # skip lines that don't have three fields
        if(length($3) > $maxlen) {
            $maxlen = length($3);         # keep the length, not the string
        }
    }
    print "Longest String in third column is $maxlen\n";

This would read on STDIN till it couldn't read anymore. Each line, it would split based on the commas (if the third column contains commas, this won't work, because $2 or $1 would be greedy and gobble some of the data, FYI), and check the length of the third field against max length. If it's longer, record the new length. At the end, print it out. This regular expression isn't great, but it's the 20 second typing version.

Thomas
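The same "longest value" idea can be written with the standard tools. The sketch below is an assumption-laden variant, not Thomas's code: it takes the last comma-separated field rather than the third (so the odd "jesse",,, line still works), strips the quotes before measuring, and assumes no value contains an embedded comma. The function name is made up.

```shell
# Hypothetical helper: length of the longest value in the last column.
max_school_length() {
    awk -F, '{
        v = $NF                  # last field on the line
        gsub(/"/, "", v)         # drop the surrounding quotes
        if (length(v) > max) max = length(v)
    }
    END { print max + 0 }'
}

printf '%s\n' \
    '"name","age","school"' \
    '"jerry","21","univ of Vermont"' \
    '"jesse",,,"Dartmouth college"' \
    '"jack","18","univ of Penn"' \
    '"john","20","univ of south Florida"' | max_school_length
# 21
```

Note that the real data apparently does contain commas inside values ("Univ of Wisconsin at Madison, Health Sci Ctr"), so a proper CSV parser would be the safer choice there.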
Re: extract string -- TIMTOWTDI
William D Ricker <[EMAIL PROTECTED]> writes:

>> On 1/10/06, Paul Lussier <[EMAIL PROTECTED]> wrote:
>> > > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
>> > > abc.txt > def.txt
>> > Egads!

[outstanding explanation I didn't have time to write myself removed ]

> None of this is seriously obfuscatory golfing, but if someone wanted to
> say darn the cost of forking new processes off bash, 'awk/cut|grep|sed'
> is easier to read, well, I won't argue that it's easier for him/her
> to read, and they should do it that way -- unless they need to tune
> for performance.

I would, however, offer that if someone were to find 'awk/cut|grep|sed' easier to read, then that person a) wouldn't have asked this question ;) and b) would certainly benefit from learning perl for those times when "the cost of forking new processes off bash" can't be ignored for some reason :)

Additionally, perl offers the benefit of a debugger which can be immensely helpful for even simple "one liner" tasks.

--
Seeya,
Paul
Re: extract string
Zhao Peng writes:

> ... your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works.

It "works" but is it correct?

What happens if you pass it the following line of input?:

    "Aunivz","28","Cambridge Community College"

By your original problem description, you don't want to see "Cambridge Community College" but there it is.

I might have overlooked something, but I believe that I have only seen two people post correct solutions so far.

Just something to think about.

Regards,

--kevin
--
(There are also 228 babies named Unique during the 1990s alone, and 1 each of Uneek, Uneque, and Uneqqee.)
-- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner
[but no Unix folks named their kids "uniq", apparently. --kevin]
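The difference Kevin points at is easy to reproduce. In this sketch both function names are made up: whole_line_match is the pipeline quoted above reading stdin instead of abc.txt, and last_field_match is one possible way (an assumption, not necessarily either of the correct solutions he has in mind) to test only the school field.

```shell
# The quoted pipeline: grep looks at the whole line, not just the school.
whole_line_match() {
    grep univ | cut -f3 -d, | sed 's/"//g'
}

# Hypothetical alternative: test only the last field, then strip its quotes.
last_field_match() {
    awk -F, '$NF ~ /univ/ { v = $NF; gsub(/"/, "", v); print v }'
}

line='"Aunivz","28","Cambridge Community College"'

echo "$line" | whole_line_match   # prints: Cambridge Community College
echo "$line" | last_field_match   # prints nothing
```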
Re: extract string
Zhao Peng <[EMAIL PROTECTED]> writes:

> First I really cannot be more grateful for the answers to my question
> from all of you, I appreciate your help and time. I'm especially
> touched by the outpouring of response on this list, which I have
> never experienced before anywhere else.

Zhao, this is a pretty amazing list, as you and many others have discovered. It's seldom I find answers as good, or as complete, anywhere else. And most often, the ensuing discussion is more interesting, educational, and enlightening than the original question posed.

(It's often amusing to me when I google for an answer to a question and within the top 10 results from google is a reference to this list. More amusing is when it was *I* who answered the question for someone else which I am now asking :)

> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign on the top of "6" key on the
> keyboard (so when I typed that sign, I felt strange that it is much
> smaller than /\, but didn't realize that they just are not the same
> thing), instead of forward slash and back slash. I felt really
> embarrassed with my stupid mistake. //blush

Ah, this makes so much more sense now. So you in fact typed something like:

    grep univ abc.txt | cut -f3 -d, | sed s/^/\>/g

?

That still doesn't end up with a '>' in def.txt, but depending upon exactly what you typed, I can certainly see where the use of ^ instead of /\ could result in something like that.

For educational purposes, the use of ^ is to "anchor" the following pattern so that it matches only at the beginning of the line. Therefore:

    sed 's/foo/bar/g'

and

    sed 's/^foo/bar/g'

are very different, since the former results in all occurrences of 'foo' being replaced with 'bar', whereas the latter only changes foo to bar when foo is found at the beginning of the line. The use of '$' in a pattern does exactly the same thing, except that it anchors patterns at the *end* of a line.
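For anyone following along, the anchors are easy to try at the prompt. A small sketch using a made-up input string; the expected result is noted next to each command:

```shell
# Unanchored: every occurrence of foo is replaced.
echo 'foo and foo' | sed 's/foo/bar/g'     # bar and bar

# Anchored with ^: only a foo at the start of the line is replaced.
echo 'foo and foo' | sed 's/^foo/bar/g'    # bar and foo

# Anchored with $: only a foo at the end of the line is replaced.
echo 'foo and foo' | sed 's/foo$/bar/g'    # foo and bar
```

Note that with an anchor in the pattern the trailing g flag has nothing extra to do, since a line has only one beginning and one end.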
Btw, I highly recommend reading the O'Reilly book on Regular Expressions. If you're going to be doing a lot of this type of data mining, a solid understanding of regexps and mastery of perl will make your life significantly more fun. Also, you might want to play with writing perl/shell scripts that output data parseable by gnuplot, which allows you to auto-generate some rather interesting and complicated graphs of the data (I know SAS can do all this, but I bet it's nowhere near as interesting or fun as learning the UNIX way of doing it, and you don't need a SAS license either ;)

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think the 2nd one as merely an indication that the value for age
> column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

If you're going to be doing a lot of this type of thing, then perl will most definitely be your best friend :)

> Paul, as to your "simplest by what measurement" question. I was
> thinking of both "easiest to remember" and "easiest to understand"
> when I was posting my question. Now I desire for "most efficient"
> approach. I know that will be my homework.

Well, again, most efficient by what measurement? In the long run, I'm going to bet it's in your best interests to learn perl, since it's one tool which will allow you to write rather small and arbitrarily complex scripts which would mostly obviate the need to learn several different tools like cut, sed, awk, comm, etc. In fact, learning perl will likely lead you to learn about these other tools over time as the situation dictates, but make you vastly more productive in the short term. Since perl excels at textual manipulation, it's perfect for this type of data analysis.
And, since perl, combined with gnuplot, is simple to run from an Apache web server... Well, I'm sure your imagination will lead you to wherever you need to go :)

Good luck, and please feel free to post more interesting questions.
--
Seeya,
Paul
Re: extract string
-- Original message --
From: Zhao Peng <[EMAIL PROTECTED]>

> Hi All,
>
> Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
> works. I mis-read /\ as a similar sign on the top of "6" key on the
> keyboard (so when I typed that sign, I felt strange that it is much
> smaller than /\, but didn't realize that they just are not the same
> thing), instead of forward slash and back slash. I felt really
> embarrassed with my stupid mistake. //blush

It happens. Believe me, I have done much dumber things in my time :-)

> Kenny, regarding missing column issue, let me try to explain it again.
> Below is quoted from my original post:

[SNIP]

> You said that "there is an extra column in the 3rd line". I disagree
> with you from my perspective. As you can see, there are 3 commas in
> between "jesse" and "Dartmouth college". For these 3 commas, again, if
> we think the 2nd one as merely an indication that the value for age
> column is missing, then the 3rd line will be read as ["jesse",
> MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth
> college"] as you suggested.

This poses an interesting problem. The "," is being used for two purposes: as a delimiter *AND* as a place holder. Unfortunately, cut and the like will see it as a delimiter and only a delimiter. It's what they do. I think that you may need to use the awk line that I sent, or some of the perl one-liners, to get just the last column. Otherwise, you will end up with empty fields.

> For one particular variable(column) called
> "school", the length of some of its value is quite long (like: Univ of
> Wisconsin at Madison, Health Sci Ctr), but I don't know the definite
> length. I need to know it, because if the length I specify is not
> enough, only partial values will be read. Many of its values contain
> "univ", so I just thought if I could extract all strings containing
> "univ" from that variable(column), I will have a better chance to figure
> out the length of "school". That's why I had this question.

This is going to be another problem. Every "," that is used is going to be seen as a delimiter. If the school name has a "," in it, as there is between Madison and Health above, then taking just the last field will not work either. I think that the easiest thing to do in this case is to change the delimiter to something that is unlikely to be found in any of the columns, like a ":".

C-Ya,
Kenny
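One way around the embedded-comma problem, sketched here under some loud assumptions: the "school" value is always the last thing on the line, it is always wrapped in double quotes, and it never contains a double quote itself. The function names last_quoted and max_len are invented for this example. Instead of splitting on commas at all, it takes whatever sits between the final pair of quotes and then reports the longest length, which is what the original question was really after:

```shell
# last_quoted (hypothetical): print the contents of the last double-quoted
# field on each input line; lines without a trailing quoted field are dropped.
last_quoted() {
    sed -n 's/.*"\([^"]*\)"[[:space:]]*$/\1/p'
}

# max_len (hypothetical): print the length of the longest input line.
max_len() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

# Made-up rows in the style of the thread; the second has a comma in the name.
printf '%s\n' \
    '"jesse",,,"Dartmouth college"' \
    '"amy","30","Univ of Wisconsin at Madison, Health Sci Ctr"' |
    last_quoted | grep -i univ | max_len     # prints 44
```

If the quoting in the real file is inconsistent, this will not be enough, and a proper CSV parser is the safer route.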
Re: extract string
Zhao,

I am really busy right now, so I have not read all of the responses to your problem completely, but I did notice this:

[EMAIL PROTECTED] said:
> You said that "there is an extra column in the 3rd line". I disagree with
> you from my perspective. As you can see, there are 3 commas in between
> "jesse" and "Dartmouth college". For these 3 commas, again, if we think the
> 2nd one as merely an indication that the value for age column is missing,
> then the 3rd line will be read as ["jesse", MISSING, "Dartmouth
> college"], not ["jesse",empty,empty, "Dartmouth college"] as you suggested.

A lot of these textual commands depend on the concept of a "field delimiter". In your first example, it seemed clear that a possible "field delimiter" was the comma (","), and so if you saw two commas together, it represented an "empty" field. Not a "missing" field, because the field was technically still there; it just had NO data in it.

When you included the line:

    "jesse",,,"Dartmouth college"

and claimed that the middle comma represented a missing age, to a text-scanning program that has been told that the comma is a field separator, that means there are now four fields in the line, not just three.

If, from the beginning, you had shown that you meant for the comma to be used both as a delimiter and as a piece of data, then a lot of the answers would have been completely different (and probably considerably more complex).

md
--
Jon "maddog" Hall
Executive Director           Linux International(R)
email: [EMAIL PROTECTED]     80 Amherst St.
Voice: +1.603.672.4557       Amherst, N.H. 03031-3032 U.S.A.
WWW: http://www.li.org

Board Member: Uniforum Association, USENIX Association

(R)Linux is a registered trademark of Linus Torvalds in several countries.
(R)Linux International is a registered trademark in the USA used pursuant to a license from Linux Mark Institute, authorized licensor of Linus Torvalds, owner of the Linux trademark on a worldwide basis (R)UNIX is a registered trademark of The Open Group in the USA and other countries. ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
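The field-counting argument above can be checked directly. A small sketch, assuming plain comma splitting the way cut and awk -F, do it; count_fields is a made-up helper name:

```shell
# count_fields (hypothetical): print how many comma-separated fields
# awk sees on each input line.
count_fields() {
    awk -F, '{ print NF }'
}

printf '%s\n' \
    '"jack","18","univ of Penn"' \
    '"jesse",,,"Dartmouth college"' |
    count_fields
# prints 3 for the first row and 4 for the second
```

Three commas always mean four fields to such a tool, two of which are simply empty.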
Re: extract string
Hi All,

First I really cannot be more grateful for the answers to my question from all of you, I appreciate your help and time. I'm especially touched by the outpouring of response on this list, which I have never experienced before anywhere else.

Secondly I'm sorry for the big stir-up as to "homework problems" which flooded the list, since I'm the origin of it.

Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt" works. I mis-read /\ as a similar sign on the top of "6" key on the keyboard (so when I typed that sign, I felt strange that it is much smaller than /\, but didn't realize that they just are not the same thing), instead of forward slash and back slash. I felt really embarrassed with my stupid mistake. //blush

Kenny, regarding missing column issue, let me try to explain it again. Below is quoted from my original post:

Also, if one column is missing, and "," is used to indicate that missing column, like the following (2nd column of 3rd line is missing):

"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse",,,"Dartmouth college"
"jack","18","univ of Penn"
"john","20","univ of south Florida"
===

You said that "there is an extra column in the 3rd line". I disagree with you from my perspective. As you can see, there are 3 commas in between "jesse" and "Dartmouth college". For these 3 commas, again, if we think the 2nd one as merely an indication that the value for age column is missing, then the 3rd line will be read as ["jesse", MISSING, "Dartmouth college"], not ["jesse",empty,empty, "Dartmouth college"] as you suggested.

Paul, as to your "simplest by what measurement" question. I was thinking of both "easiest to remember" and "easiest to understand" when I was posting my question. Now I desire for "most efficient" approach. I know that will be my homework.

BTW, a bit about me: I'm a junior SAS programmer at Dartmouth Medical school.
(FYI: core strength of SAS lies in statistical analysis, I think, so you could say it's a statistical software, check www.sas.com). We run SAS on a RedHat server, but I basically knew nothing about linux before I started working in this position (July, 2005). Fortunately, SAS programming doesn't require much linux knowledge. However, as you can imagine, at least I need to know some basic linux commands since I work on linux platform.

Part of my primary job responsibilities is to convert raw data into SAS data sets. My "extract string" question comes from processing a raw data file in .txt format, which doesn't have any documentation, except the variable list. By looking at the raw data, I know that each variable is separated by a comma. For one particular variable(column) called "school", the length of some of its value is quite long (like: Univ of Wisconsin at Madison, Health Sci Ctr), but I don't know the definite length. I need to know it, because if the length I specify is not enough, only partial values will be read. Many of its values contain "univ", so I just thought if I could extract all strings containing "univ" from that variable(column), I will have a better chance to figure out the length of "school". That's why I had this question.

Thank you all again!
Zhao
Re: extract string -- TIMTOWTDI
> On 1/10/06, Paul Lussier <[EMAIL PROTECTED]> wrote:
> > > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' <
> > > abc.txt > def.txt
> > Egads!

That's a literal start at a Perl "bring the grep sed and cut-or-awk into one process", but it's not maximally Perl-ish. It is also inefficient: it does the side-effecting "" removal before discarding non-univs. It is a literal translation of a `cut | sed | grep` pipe, not of the `cut|grep|sed` pipe shown earlier. Of course, if the requirements allowed it, a `grep|cut|sed` pipe would be the best shell implementation -- but that relies upon knowing all 'univ' are in the desired column, which we haven't been granted.

If it weren't for the desire to drop the quotes, Perl couldn't beat Cut's golf-score (key stroke count) on this one anyway, but we can try to optimize expressivity while saving two of three process forks.

The Perl Motto is TIMTOWTDI: There Is More Than One Way To Do It. (We've already seen that this is often true for BASH too.) This is usually a good thing, as often some are better for some requirements than for others.

For generalness in real code, I'd like to explicitly ignore the header line on this sort of CSV file:

    $ perl -F, -lane 'next if $.==1 or $F[-1] !~/univ/; print $F[-1]=~m/"(.*)"/;'
    univ of Vermont
    univ of Penn
    univ of south Florida
    $

As with one prior posting, the '-naF,' args cause Perl to auto-split on ',' into @F on each line. I normally use '-F, -lane' on one-liners, since it's memorable.

The '$.==1 or' is not strictly required since the top line of the sample file had "school" not "university" for the column head, but if it had "university or school" or "school/univ" on line 1, it would be required.

$F[-1] is Perl's equivalent to AWK's $NF, referring to the last column, instead of by number. (By number is notoriously error prone with 0-based field counting.) $F[-2] means last-but-one, etc, too, and you can slice with them as @F[-6..-2].
Rather than remove the "" with s/"//g, I've captured what's between them and printed that.

We can make more use of -F ... we'll split on all the punctuation.

    $ perl -F'/^"|","|"$/' -lane 'next if $.==1 or $F[-1] !~/univ/i; print $F[-1]' schools.txt
    univ of Vermont
    univ of Penn
    univ of south Florida
    $

Of course, in some CSV files the ""s are optional. In which case we can do

    $ perl -F'/^"|"?\s*,"?|"$/' -lane 'next if $.==1 or $F[-1] !~/univ/i; print $F[-1]' schools.txt
    univ of Vermont
    univ of Penn
    univ of south Florida
    $

Alternatively, to print **any** quoted phrase containing univ, whether in last column or not, using the commas ..

    $ perl -F, -lane 'for (@F){s/"//g; print if /univ/i}' schools.txt
    univ of Vermont
    univ of Penn
    univ of south Florida
    $

or, ignoring the commas, just use the quotes to capture between quotes, but only if there's a univ between. I started sneaking in a /i flag to be case insensitive above, and I'll continue here ...

    $ perl -lane 'print for m{ " ( [^"]*? univ [^"]* ) " }xig' schools.txt
    univ of Vermont
    univ of Penn
    univ of south Florida
    $

[the ? isn't required but it should help efficiency.]

or

    $ perl -lane 'print for grep {/univ/i} m{"([^"]*)"}g' schools.txt
    univ of Vermont
    univ of Penn
    univ of south Florida

There's also a CPAN module or two for processing CSV files that handles the commas and quotes in CSV files ...

    http://search.cpan.org/search?query=Text%3A%3ACSV&mode=all

Your Linux distro should have Text::CSV_XS as an apt/yum/rpm/... module option, or grab it from CPAN and build. (It has an XS => .c module, so is ripping fast, but has to be make'd.)

None of this is seriously obfuscatory golfing, but if someone wanted to say darn the cost of forking new processes off bash, 'awk/cut|grep|sed' is easier to read, well, I won't argue that it's easier for him/her to read, and they should do it that way -- unless they need to tune for performance.
-- /"\ Bill Ricker N1VUX [EMAIL PROTECTED] \ / http://world.std.com/~wdr/ X Member of the ASCII Ribbon Campaign Against HTML Mail / \ ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
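For comparison with the Perl one-liners above, here is a rough awk analogue of the "skip the header, test the last column" idea. It is only a sketch: it assumes no commas inside values, and last_col_univ is a name invented here. NR > 1 plays the role of '$.==1' (inverted), $NF plays the role of $F[-1], and tolower() stands in for the /i flag:

```shell
# last_col_univ (hypothetical): skip line 1, then print the last
# comma-separated column, without quotes, when it contains "univ"
# in any letter case.
last_col_univ() {
    awk -F, 'NR > 1 && tolower($NF) ~ /univ/ { gsub(/"/, "", $NF); print $NF }'
}

# The header deliberately contains "univ" to show why line 1 is skipped.
printf '%s\n' \
    '"name","age","school/univ"' \
    '"jerry","21","univ of Vermont"' \
    '"jesse",,,"Dartmouth college"' \
    '"jack","18","Univ of Penn"' |
    last_col_univ
```

This should print the Vermont and Penn rows only. It still forks a single process, so the performance argument for Perl over a three-stage pipe does not apply to it in the same way.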
Re: Homework problems (was: extract string)
On Tuesday 10 January 2006 06:05 pm, Travis Roy wrote: > Just let it go, if you think it's somebody "cheating" then don't > answer, or give them a vague answer or point them to places where > they can learn about it rather then copy it off of. That is my technique too. I get to answer a lot of questions about IR optics and detector physics on my web site. If it appears I am doing someone's term project, I start outlining how to get the answers to the problem rather than giving the answers. I must confess, though, that I have a hard time keeping my mouth (keyboard) shut if I know the answer. There have always been cheaters, but the American culture has changed dramatically in the last 40 years. "Claiming the work of others as your own" has become completely accepted. Jim Kuzdrall ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Homework problems (was: extract string)
On Tue, Jan 10, 2006 at 08:16:47PM -0500, Christopher Schmidt wrote:
> On Tue, Jan 10, 2006 at 07:56:46PM -0500, Thomas Charron wrote:
> > A programmer that doesn't know how to grep and split text strings..
> >
> > Well.. Isn't..
>
> I know of several ways to do it, but none of them would have worked as
> well as the cut solution presented here. I've been working on Linux as
> my primary platform for 2.5 years, I've been coding in various languages
> for 5.
>
> I'm relatively intelligent, know how to use awk, grep, and sed.
>
> Considering the huge number of programmers who are doomed to forever
> live and work in a GUI-only MSVC++ (or whatever it's called) without the
> tools such as sed, grep and awk, I'd say I'm in the top 50% as far as
> knowledge goes for programmers -- and I think I'm probably being
> relatively modest.
>
> The lack of knowledge of a simple command line tool to do what you want
> it to does not indicate whether someone is a programmer or not. It
> simply indicates one thing -- their level of experience with core *nix
> tools. Lack of that is not an indication of deficiencies in their
> ability to program.

Easily fixed. All we need is the appropriate man page:

    http://ars.userfriendly.org/cartoons/?id=19990216

;-) I have that one on the cover of my "Intro to Linux" slides.

--
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is madness. -- Eugene Ionesco
Re: Homework problems (was: extract string)
On Jan 10, 2006, at 20:16, Christopher Schmidt wrote:

> The lack of knowledge of a simple command line tool to do what you want
> it to does not indicate whether someone is a programmer or not. It
> simply indicates one thing -- their level of experience with core *nix
> tools. Lack of that is not an indication of deficiencies in their
> ability to program.

Right - Zhao is pretty new to unix and linux. For those following along, notice he didn't say, 'is there any way to do this' but 'what's the most efficient way to do this'? (paraphrasing).

What matters is not whether one has achieved enlightenment but rather that one is on the path to enlightenment.

-Bill
-
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
[EMAIL PROTECTED]               Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf
Re: Homework problems (was: extract string)
On Tue, Jan 10, 2006 at 07:56:46PM -0500, Thomas Charron wrote: > A programmer that doesn't know how to grep and split text strings.. > > Well.. Isn't.. I know of several ways to do it, but none of them would have worked as well as the cut solution presented here. I've been working on Linux as my primary platform for 2.5 years, I've been coding in various languages for 5. I'm relatively intelligent, know how to use awk, grep, and sed. Considering the huge number of programmers who are doomed to forever live and work in a GUI-only MSVC++ (or whatever it's called) without the tools such as sed, grep and awk, I'd say I'm in the top 50% as far as knowledge goes for programmers -- and I think I'm probably being relatively modest. The lack of knowledge of a simple command line tool to do what you want it to does not indicate whether someone is a programmer or not. It simply indicates one thing -- their level of experience with core *nix tools. Lack of that is not an indication of deficiencies in their ability to program. I'm assuming that your post was made with tongue in cheek, but I think it's a ridiculous statement and decided to do what all good people on the internet do: blow it out of proportion in a rant on a mailing list that few will ever care about. (I think I'm supposed to call you Hitler now or something. Godwin told me that once.) -- Christopher Schmidt Web Developer signature.asc Description: Digital signature
Re: Homework problems (was: extract string)
On Tue, Jan 10, 2006 at 07:56:46PM -0500, Thomas Charron wrote: > On 1/10/06, Bill McGonigle <[EMAIL PROTECTED]> wrote: > > > > On Jan 10, 2006, at 18:05, Travis Roy wrote: > > > How do we, as a list, tell what's a homework problem and what's a > > > legit question. > > I think there's little substitute for knowing the membership. Zhao is > > a programmer for Dartmouth Medical School. > > > > For, or attending? ;-) Hopefully things haven't gotten so bad that programmers are now attending medical school for their next career move. ;-) -- Jeff Kinz, Emergent Research, Hudson, MA. speech recognition software may have been used to create this e-mail "The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." - Brandeis To think contrary to one's era is heroism. But to speak against it is madness. -- Eugene Ionesco ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Homework problems (was: extract string)
On 1/10/06, Thomas Charron <[EMAIL PROTECTED]> wrote:
> A programmer that doesn't know how to grep and split text strings..

Believe it or not, there are environments *other* than nix, and a great many well-qualified professionals have never touched nix. I don't just mean doze, either. Classic Mac, VMS, the various IBM mainframe and mini systems, and other, less well known worlds have syntax and tools all their own.

I, for one, think we should be welcoming to newcomers to the nix world -- not scold them for being new.

-- Ben "Used to be a DOS weenie" Scott
Re: Homework problems (was: extract string)
On 1/10/06, Bill McGonigle <[EMAIL PROTECTED]> wrote:
> On Jan 10, 2006, at 18:05, Travis Roy wrote:
> > How do we, as a list, tell what's a homework problem and what's a
> > legit question.
>
> I think there's little substitute for knowing the membership. Zhao is
> a programmer for Dartmouth Medical School.

For, or attending? ;-)

A programmer that doesn't know how to grep and split text strings.. Well.. Isn't..

Thomas
Re: extract string
Ben Scott <[EMAIL PROTECTED]> writes: > On 1/10/06, Jon maddog Hall <[EMAIL PROTECTED]> wrote: >> I was the senior systems administrator for Bell Labs in North Andover, MA. I >> got the job without ever having seen a UNIX system. > > Well, really. How many people *had* seen a UNIX system, back then? ;-) Two; *Kernighan* and *Ritchie* ;) This *was* Bell Labs, right ;) -- Seeya, Paul ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Homework problems (was: extract string)
On Tue, Jan 10, 2006 at 04:01:05PM -0500, Ben Scott wrote:
> On 1/10/06, Jeff Kinz <[EMAIL PROTECTED]> wrote:
> > Now your Lug can achieve its financial funding goals simply by charging
> > 25 cents for each shell scripting homework problem answered and 50 cents
> > for extended explanations such as rendered below. :-)
>
> I was wondering if I should raise the "Ya know, this looks an awful
> lot like a homework problem to me..." question. But I also considered
> the following:
>

My "homework business model" was simply a tongue in cheek comment 'cause I was leaving and didn't have time to add anything substantive to the thread.

> Assume it is a homework problem. Does it make a real difference
> whether the student learns the material from the text book, this list,
> or some random web page found via Google?
>

I've always found all of the above to be useful tools for learning. Well, not always - Google hit a dry spell from 1973 to 1997 or thereabouts :-) umm - wait a minute... (Googles for Google founding date.. ) 1998.

> And if the student hands in a Perl one-liner in a basic class on
> shell scripting, the resulting student/instructor discussion will
> doubtless be very educational.

heh heh, very!

--
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is madness. -- Eugene Ionesco
RE: Homework problems (was: extract string)
> -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of > Jim Kuzdrall > Sent: Tuesday, January 10, 2006 5:45 PM > To: gnhlug-discuss@mail.gnhlug.org > Subject: Re: Homework problems (was: extract string) > > > All of us will care when the country has to depend on the > products of today's education system. Get ready for it. The > standards are so incredibly low that these graduates will not > even know the buzz words of technology. > Our country, as a whole, has never really "depended" on the cheaters and slackers (we'll skip the shooting-fish-in-a-barrel political jokes here). Those are the people that are sweeping our hallways, hanging bumpers on the assembly lines, etc. Our country participates in the World Economy, we trend toward the solutions that provide the best cost/performance ratio, often looking globally for the answer. Our justice system was founded on the idea that "It is better to let 1000 guilty men go free than to convict 1 innocent man". On a Linux/FOSS oriented list, newsgroup, etc, I would rather answer 1000 homework questions, than risk alienating 1 potential comrade. And, as we've come to find out, the OP was simply asking a legitimate question that merely appeared "homeworky". I don't have enough time to be that judgmental and concerned about the impact on the freedom of our country because someone is too lazy to do their own homework. Ask a question of me, if I can I'll answer it, if I can't, I'll try to point you in the right direction. If you get Extra Credit for my answer, then you owe me a dollar, you can paypal it to: [EMAIL PROTECTED] :) ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: Homework problems (was: extract string)
On Jan 10, 2006, at 18:05, Travis Roy wrote:

> How do we, as a list, tell what's a homework problem and what's a
> legit question.

I think there's little substitute for knowing the membership. Zhao is a programmer for Dartmouth Medical School.

-Bill
-
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
[EMAIL PROTECTED]               Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf
Re: Homework problems (was: extract string)
-- Original message --
From: Jim Kuzdrall <[EMAIL PROTECTED]>

> On Tuesday 10 January 2006 04:13 pm, Brian wrote:
> > Answer C: Who cares?
>
> All of us will care when the country has to depend on the products
> of today's education system. Get ready for it. The standards are so
> incredibly low that these graduates will not even know the buzz words
> of technology.
>
> The professors look the other way on all manner of things that were
> "cheating" when I got my education. Copying from other students,
> lifting solutions from the Internet, putting all the test answers on
> one's programmable calculator, and other guarantees of ignorance are
> completely acceptable.

While I agree with the sentiment, I have to disagree in this case. I don't see this so much as cheating as augmenting the education. There have been several examples given to the original poster, who will now have to learn which is the best way for them to proceed. What they are learning is where to look for the answers to problems, which I find to be of greater value than just learning a quick one-line shell command (or string of commands as the case may be).

> The cruel world will reveal their error when it is too late for them
> or us to recover from it.
>
> It is not just in tech schools. Can you imagine a cum laude English
> graduate who cannot spell or write a grammatical sentence? Well, our
> family has just paid for one.

In an English class, a Math class, I can agree with you. However, in the technology sector I disagree. I mean, seriously, isn't OSS all about code re-use and working as a group to further a project?

Just my $0.02,
Kenny
Re: Homework problems (was: extract string)
Jim Kuzdrall wrote:
> On Tuesday 10 January 2006 04:13 pm, Brian wrote:
> > Answer C: Who cares?
>
> All of us will care when the country has to depend on the products
> of today's education system. Get ready for it. The standards are so
> incredibly low that these graduates will not even know the buzz words
> of technology.

All education system debate aside... How do we, as a list, tell what's a homework problem and what's a legit question? And if we start blocking "homework questions", a cheater will just work around that and word their question into something that seems like a personal or work related problem rather than a homework problem.

Just let it go. If you think it's somebody "cheating" then don't answer, or give them a vague answer, or point them to places where they can learn about it rather than copy it off of.
Re: Homework problems (was: extract string)
On Tuesday 10 January 2006 04:13 pm, Brian wrote: > Answer C: Who cares? All of us will care when the country has to depend on the products of today's education system. Get ready for it. The standards are so incredibly low that these graduates will not even know the buzz words of technology. The professors look the other way on all manner of things that were "cheating" when I got my education. Copying from other students, lifting solutions from the Internet, putting all the test answers on one's programmable calculator, and other guarantees of ignorance are completely acceptable. The cruel world will reveal their error when it is too late for them or us to recover from it. It is not just in tech schools. Can you imagine a cum laude English graduate who cannot spell or write a grammatical sentence? Well, our family has just paid for one. Jim Kuzdrall ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string
Ben Scott writes: > Is there a tool that quickly and easily extracts one or more columns > of text (separated by whitespace) from an output stream? I'm familiar > with the > > awk '{ print $3 }' > > mechanism, but I've always felt that was clumsy. I've tried to get > cut(1) to do it in the past, but the field separator semantics appear > to assume one and only one separator, not "whitespace" (one or more > space or tab characters). > > I get the feeling there is some command or switch I'm not aware of > that I should be using. This hyptherical command might work something > like this: > > ls -l | foo 3 > > to extract just the third column (username) from the ls(1) output. Thoughts: 0: I've always found awk and cut to be very convenient for these operations. For complex things, I recommend Perl. In particular, awk and Perl allow for pattern separators, as you desire. 1: You might find awk's -F option to be useful. 2: Something like this is always fun: perl -F: -ane 'print join " ", @F[0,5,6]' /etc/passwd 3: If you really want foo, how about this: foo() { if [ $# -eq 0 ] ; then foo 0 else awk "{ print `echo "[EMAIL PROTECTED]" | sed 's/\([0-9]*\)/\\$\1/g'` }" fi } I leave it to the reader to improve this if so desired. (-: Hope this helps, --kevin -- (There are also also 228 babies named Unique during the 1990s alone, and 1 each of Uneekm, Uneque, and Uneqqee.) -- _Freakonomics_, Steven D. Levitt and Stephen J. Dubner ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
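A minimal sketch of the kind of column-extracting helper discussed above, built on awk so that runs of spaces and tabs count as one separator. The function name `col` and its argument checking are assumptions made for illustration, not something posted in the thread; with no arguments it prints the whole line, mirroring the `foo 0` fallback.

```shell
# col: print the given whitespace-separated columns of standard input.
# Hypothetical helper; usage: ls -l | col 3    or    ls -l | col 3 9
col() {
    fields=""
    for n in "$@"; do
        # Accept only plain non-negative integers as column numbers.
        case $n in
            ''|*[!0-9]*) echo "col: not a column number: $n" >&2; return 1 ;;
        esac
        # Build an awk field list such as "$3, $9".
        fields="${fields:+$fields, }\$$n"
    done
    # With no arguments, fall back to $0 (the whole line).
    awk "{ print ${fields:-\$0} }"
}

printf 'alice   staff\t42\n' | col 1 3
```

Because the field list is comma-separated inside awk's `print`, the selected columns come out joined by a single space.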
Re: Homework problems (was: extract string)
I agree. Does it matter? We're here to help, discuss, and answer questions. If you feel that your answer will cause more problems in the long run, then don't answer. Not having a degree, some of my best information has come from places like this and other sources on the internet. In fact, during my interview for my current job I stated that knowing who to ask and where to get information is far more important than knowing the information. It's impossible to know everything. If we give this person the wrong answer, and they fail, well.. that's what you get for trusting the list. If they get it right, good for them, they still got the job done. Answer C: Who cares? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ben Scott Sent: Tuesday, January 10, 2006 4:01 PM To: gnhlug-discuss@mail.gnhlug.org Subject: Homework problems (was: extract string) Assume it is a homework problem. Does it make a real difference whether the student learns the material from the text book, this list, or some random web page found via Google?
Re: extract string
-- Original message -- From: Zhao Peng <[EMAIL PROTECTED]> > Kenny, > > Thank you for your suggestion. > > The following line works: > grep univ abc.txt | cut -f3 -d, >> dev.txt. > > > While the following line intended to remove quotes does NOT work: > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt > It resulted in a line starts with ">" prompt, and not output dev.txt > > Could you please double-check or modify it? I have checked it, and it works exactly as it should. > Also, if one column is missing, and "," is used to indicate that missing > column, like the following (2nd column of 3rd line is missing): > > "name","age","school" > "jerry" ,"21","univ of Vermont" > "jesse",,,"Dartmouth college" > "jack","18","univ of Penn" > "john","20","univ of south Florida" > > Does the "cut" approach still apply? If not, what command would you > suggest to address this missing issue? > A column is not missing, it is just empty. It is still delimited by a ",", so it is still a valid column. However, in the example above, there is an extra column in the 3rd line. All of the other lines have "name,age,school". Line 3 has "name,empty,empty,school". Now, if you know that the school is always going to be the last field, you may not want to use cut at all. You might want to use something like : grep univ abc.txt | awk -F, '{print $NF}'| sed 's/\"//g' awk takes the place of `cut` in this case by looking at the last field delimited by a ",". FYI, Kenny ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
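The difference between counting fields from the left with `cut` and taking the last field with awk's `$NF` can be seen on the sample row that has two empty fields. This is a small sketch using the data from the thread; the temporary file and the variable names are assumptions made for illustration.

```shell
# Sample data from the thread, including the row with two empty fields.
data=$(mktemp)
cat > "$data" <<'EOF'
"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse",,,"Dartmouth college"
"jack","18","univ of Penn"
EOF

# cut counts every delimiter, so field 3 of the "jesse" row is empty.
third=$(grep jesse "$data" | cut -d, -f3)

# awk's $NF is the last field, however many commas come before it.
last=$(grep jesse "$data" | awk -F, '{ print $NF }' | sed 's/"//g')

printf 'cut -f3: [%s]\nawk $NF: [%s]\n' "$third" "$last"
rm -f "$data"
```

Neither approach understands commas inside a quoted field; that would need a real CSV parser.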
Re: extract string
On 1/10/06, Jon maddog Hall <[EMAIL PROTECTED]> wrote: > I was the senior systems administrator for Bell Labs in North Andover, MA. I > got the job without ever having seen a UNIX system. Well, really. How many people *had* seen a UNIX system, back then? ;-) (Sorry, couldn't resist.) > It was that experience that led me to page through section (1) of the manual > every six months, just to remind myself of the gold that was hidden in those > pages. I learned the plurality of what I know about the shell by reading the bash(1) man "page" (and most of that while sitting at VT-320's in UNH computer clusters, no less). Never underestimate the power of RTFM. :-) -- Ben ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string
[EMAIL PROTECTED] said: >> While the following line intended to remove quotes does NOT work: >> grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt >> It resulted in a line starts with ">" prompt, and not output dev.txt > I can't see any reason why what you state should be happening. As a matter of > fact, I tried that exact command line on my system and it worked exactly as > (specified|advertised|expected). Might this not be affected by different command interpreters? sh vs csh vs ksh vs bash? >Simplest by what measurement? > - fewest processes spawned > - most efficient > - least amount of typing > - easiest to remember > - easiest to understand > - ability to debug > - extensibility Most portable? [EMAIL PROTECTED] said: > While it does seem like a few man page pointers would be better (more > instructive in the long run), I have to admit I wasn't familiar with cut, so > I've learned something from this one. I still remember the time that I first was learning UNIX (all capital letters)... I was the senior systems administrator for Bell Labs in North Andover, MA. I got the job without ever having seen a UNIX system. Of course I had programmed on dozens of different OS systems... but there I was, late at night, trying to solve much the same type of problem that was solved here. After thinking about it, and wondering if I would have to write a program to do it, I thought to myself... "I do not KNOW that UNIX has a command that could do this, but I am willing to BET it does." And I started paging through section 1 of the manual... the shell commands. Sure enough, I came to "cut(1)" and it was exactly what I needed. (Later on I was glad the command was not at the back of the section, being named something like "Yet Another Cut Command"... hmmm... maybe in a way it was) :-} It was that experience that led me to page through section (1) of the manual every six months, just to remind myself of the gold that was hidden in those pages.
Warmest regards, maddog -- Jon "maddog" Hall Executive Director Linux International(R) email: [EMAIL PROTECTED] 80 Amherst St. Voice: +1.603.672.4557 Amherst, N.H. 03031-3032 U.S.A. WWW: http://www.li.org Board Member: Uniforum Association, USENIX Association (R)Linux is a registered trademark of Linus Torvalds in several countries. (R)Linux International is a registered trademark in the USA used pursuant to a license from Linux Mark Institute, authorized licensor of Linus Torvalds, owner of the Linux trademark on a worldwide basis (R)UNIX is a registered trademark of The Open Group in the USA and other countries.
RE: Homework problems (was: extract string)
Answer C: Who cares? > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Ben Scott > Sent: Tuesday, January 10, 2006 4:01 PM > To: gnhlug-discuss@mail.gnhlug.org > Subject: Homework problems (was: extract string) > > Assume it is a homework problem. Does it make a real > difference whether the student learns the material from the > text book, this list, or some random web page found via Google? > > ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string
On 1/10/06, Drew Van Zandt <[EMAIL PROTECTED]> wrote: > While it does seem like a few man page pointers would be better (more > instructive in the long run), I have to admit I wasn't familiar with cut, so > I've learned something from this one. Since we're on the subject... Is there a tool that quickly and easily extracts one or more columns of text (separated by whitespace) from an output stream? I'm familiar with the awk '{ print $3 }' mechanism, but I've always felt that was clumsy. I've tried to get cut(1) to do it in the past, but the field separator semantics appear to assume one and only one separator, not "whitespace" (one or more space or tab characters). I get the feeling there is some command or switch I'm not aware of that I should be using. This hypothetical command might work something like this: ls -l | foo 3 to extract just the third column (username) from the ls(1) output. -- Ben
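One way to make cut(1) cope with runs of whitespace is to normalize the input first: turn tabs into spaces, squeeze repeated spaces with `tr -s`, and only then split on a single space. The sketch below assumes a hypothetical wrapper name, `wscut`, and takes the field list in the same form as `cut -f`.

```shell
# wscut: cut fields from whitespace-separated input (hypothetical helper).
# Tabs become spaces, runs of spaces are squeezed to one, then cut splits on it.
wscut() {
    tr '\t' ' ' | tr -s ' ' | cut -d ' ' -f "$1"
}

printf 'alice   staff\t42\nbob  wheel  7\n' | wscut 2
```

A line that starts with whitespace still yields an empty first field here, which is one reason the awk form remains the more forgiving choice.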
Re: extract string
-- Original message -- From: Paul Lussier <[EMAIL PROTECTED]> > [EMAIL PROTECTED] writes: > > > Actually, if you are looking for only lines that contain the string "univ", > then you would want to grep for it: > > > > grep univ abc.txt | cut -f3 -d, >> dev.txt. > > Why are you appending to dev.txt? (or def.txt even). Are you assuming > the file already exists and don't want to over-write the contents? > That is exactly what I was thinking. Even if it isn't being appended to, the result is essentially the same. Unless, of course, you want to over-write the file. Then that would not work out too well. It's better to be safe than sorry :-) Besides, I've been doing a lot of this sort of thing in the last few days, and the >> just sort of rolled off my fingertips. > > Paul's example would give you the third field of each line, even if > > they don't have "univ" in them. Now, if you wanted to remove the > > quotes, then you would need something like: > > > > > > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt > > yep, that should work, but no need for the >> when a simple > will do. What? Two redirects are better than one, right :-) C-Ya, Kenny
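The practical difference between the two redirections is easy to check: `>` truncates the target before writing, while `>>` keeps whatever is already there. A small sketch, using a temporary file and variable names chosen only for this illustration:

```shell
out=$(mktemp)

echo first  >  "$out"    # > truncates the file, then writes
echo second >> "$out"    # >> keeps the existing contents and appends
after_append=$(wc -l < "$out")

echo third  >  "$out"    # truncates again, so only "third" remains
final=$(cat "$out")

echo "lines after append: $after_append"
echo "final contents: $final"
rm -f "$out"
```

So `>>` into a file that does not yet exist gives the same result as `>`, but rerunning the command doubles the output instead of replacing it.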
Re: extract string
On 1/10/06, Paul Lussier <[EMAIL PROTECTED]> wrote: > > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' < abc.txt > def.txt > > Egads! Egads? -- Ben "As I was saying about explanation..." Scott
Homework problems (was: extract string)
On 1/10/06, Jeff Kinz <[EMAIL PROTECTED]> wrote: > Now your Lug can achieve its financial funding goals simply by charging > 25 cents for each shell scripting homework problem answered and 50 cents > for extended explanations such as rendered below. :-) I was wondering if I should raise the "Ya know, this looks an awful lot like a homework problem to me..." question. But I also considered the following: Assume it is a homework problem. Does it make a real difference whether the student learns the material from the text book, this list, or some random web page found via Google? That assumes the student learns the material, of course. But students copy answers from each other, text books, and other materials all the time. If someone is so dumb as to copy material without learning it, well, life will eventually teach them the folly of that, too. Life is a persistent teacher. There's also the fact that several answers have been posted, each with varying degrees of applicability to the questions posed. The student would have to sort through those answers and choose the one they thought best suited to the task at hand. That's a useful skill, too. And if the student hands in a Perl one-liner in a basic class on shell scripting, the resulting student/instructor discussion will doubtless be very educational. This is not to say I'm going to go down a list of homework questions and provide concise, unexplained answers. I generally try to avoid that regardless of the audience. I find the explanation and reasoning behind the solution to a problem is generally far more educational than the solution alone. -- Ben
Re: extract string
While it does seem like a few man page pointers would be better (more instructive in the long run), I have to admit I wasn't familiar with cut, so I've learned something from this one. --Drew
Re: extract string
Zhao Peng <[EMAIL PROTECTED]> writes: > While the following line intended to remove quotes does NOT work: > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt > It resulted in a line starts with ">" prompt, and not output dev.txt I can't see any reason why what you state should be happening. As a matter of fact, I tried that exact command line on my system and it worked exactly as (specified|advertised|expected). > Could you please double-check or modify it? Well, I think you've received lots of good help. Perhaps you should spend some time reading the relevant man pages and trying to understand exactly what has been offered so you can double-check and/or modify it? > Also, if one column is missing, and "," is used to indicate that > missing column, like the following (2nd column of 3rd line is > missing): [...] > Does the "cut" approach still apply? If not, what command would you > suggest to address this missing issue? man cut will answer this question. -- Seeya, Paul
Re: extract string
> Ooo, look! - a new business model for Lugs! I happen to like these threads and far from regarding them as a burden I think they're a pleasant diversion and extremely useful as learning opportunities. But I've been asking for a long time when our IPO will be happening; we've got more talent and longevity than most of the scams^H^H^H^H^Hventures that got millions during the dotcom era, and when the juices are really flowing most VC firms appear(ed) to regard those pesky business plans as optional, anyway...;-> ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string
[EMAIL PROTECTED] writes: > Actually, if you are looking for only lines that contain the string "univ", > then you would want to grep for it: > > grep univ abc.txt | cut -f3 -d, >> dev.txt. Why are you appending to dev.txt? (or def.txt even). Are you assuming the file already exists and don't want to over-write the contents? > Paul's example would give you the third field of each line, even if > they don't have "univ" in them. Now, if you wanted to remove the > quotes, then you would need something like: > > > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt yep, that should work, but no need for the >> when a simple > will do. -- Seeya, Paul ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string
Ben Scott <[EMAIL PROTECTED]> writes: > Here's one way, as a Perl one-liner: > > perl -ne 'split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' < abc.txt > def.txt Egads! -- Seeya, Paul
Re: extract string
Zhao Peng <[EMAIL PROTECTED]> writes: > Hi > > Suppose that I have a file called abc.txt, which contains the > following 5 lines (columns are delimited by ",") > > "name","age","school" > "jerry" ,"21","univ of Vermont" > "jesse","28","Dartmouth college" > "jack","18","univ of Penn" > "john","20","univ of south Florida" > > My OS is RedHat Enterprise, how could I extract the string which > contains "univ" and create an output file called def.txt, which only > has 3 following lines: > > univ of Vermont > univ of Penn > univ of south Florida >

Here are 3, pick your poison:

  awk -F, '/univ/ && gsub(/\"/,"") {print $3}' abc.txt > def.txt

  perl -F, -ane 'if (/univ/) { $F[2] =~ s/\"//g; print $F[2]};' abc.txt \
    > def.txt

  grep univ abc.txt | cut -f3 -d, | sed 's/\"//g' > def.txt

> Please suggest the simplest command line approach.

Simplest by what measurement?

  - fewest processes spawned
  - most efficient
  - least amount of typing
  - easiest to remember
  - easiest to understand
  - ability to debug
  - extensibility

Simplest is a rather subjective approach...

-- Seeya, Paul
Re: extract string
Ooo, look! - a new business model for Lugs! Achieve Lug financial independence today! Now your Lug can achieve its financial funding goals simply by charging 25 cents for each shell scripting homework problem answered and 50 cents for extended explanations such as rendered below. :-) All we need now is a PayPal account. :-) (rendered tongue at least halfway in cheek, all proceeds to go to GNHLUGS tab at Martha's) On Tue, Jan 10, 2006 at 01:23:14PM -0500, Ben Scott wrote: > On 1/10/06, Zhao Peng <[EMAIL PROTECTED]> wrote: > > While the following line intended to remove quotes does NOT work: > > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt > > It resulted in a line starts with ">" prompt, and not output dev.txt > > The ">" prompt indicates the shell thinks you are still in the > middle of some shell construct, and is prompting you to finish it. It > usually manifests due to an unclosed quote. Most likely, something is > eating the backslash that appears before the double-quote in the sed > command. It should be > > sed s/\"//g > > where the second word contains the characters letter s, a forward > slash (/), a backslash (\), a double-quote, two forward slashes (//), > and the letter g. The backslash tells the shell that the following > character (in this case, a quote) is not to be interpreted as shell > syntax, but instead passed to the specified command "as is". This is > called an "escape character" or a "shell escape". > > If you're putting this shell command inside some other program or > shell, you may find *that* program also interprets the backslash this > way. So you need to escape it *twice*: > > sed s/\\"//g > > The characters \\ get interpreted by the first program as "literal > backslash here". The shell then receives a single backslash, which it > applies to the double-quote. > > Shell escapes can get very, very messy. 
> > -- Ben -- Jeff Kinz, Emergent Research, Hudson, MA. speech recognition software may have been used to create this e-mail "The greatest dangers to liberty lurk in insidious encroachment by men of zeal, well-meaning but without understanding." - Brandeis To think contrary to one's era is heroism. But to speak against it is madness. -- Eugene Ionesco
Re: extract string
On 1/10/06, Zhao Peng <[EMAIL PROTECTED]> wrote: > While the following line intended to remove quotes does NOT work: > grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt > It resulted in a line starts with ">" prompt, and not output dev.txt The ">" prompt indicates the shell thinks you are still in the middle of some shell construct, and is prompting you to finish it. It usually manifests due to an unclosed quote. Most likely, something is eating the backslash that appears before the double-quote in the sed command. It should be sed s/\"//g where the second word contains the characters letter s, a forward slash (/), a backslash (\), a double-quote, two forward slashes (//), and the letter g. The backslash tells the shell that the following character (in this case, a quote) is not to be interpreted as shell syntax, but instead passed to the specified command "as is". This is called an "escape character" or a "shell escape". If you're putting this shell command inside some other program or shell, you may find *that* program also interprets the backslash this way. So you need to escape it *twice*: sed s/\\"//g The characters \\ get interpreted by the first program as "literal backslash here". The shell then receives a single backslash, which it applies to the double-quote. Shell escapes can get very, very messy. -- Ben ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
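The escaping described above can be checked directly. The sketch below shows three equivalent ways of handing sed the program `s/"//g` from a POSIX-style shell; the sample line is taken from the thread, and the variable names are assumptions made for illustration.

```shell
line='"jack","18","univ of Penn"'

# Three equivalent ways to pass sed the program  s/"//g
a=$(printf '%s\n' "$line" | sed s/\"//g)     # backslash escapes the quote
b=$(printf '%s\n' "$line" | sed 's/"//g')    # inside single quotes nothing is special
c=$(printf '%s\n' "$line" | sed "s/\"//g")   # inside double quotes, \" is a literal "

printf '%s\n' "$a" "$b" "$c"
```

Typing `sed s/"//g` with no escaping at all leaves a double quote open, which is exactly when an interactive shell falls back to its continuation prompt (`> ` by default). The single-quoted form is usually the least error-prone of the three.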
Re: extract string
On 1/10/06, Whelan, Paul <[EMAIL PROTECTED]> wrote: > Like so: cat abc.txt | cut -d, -f3 1. Randal Schwartz likes to call that UUOC (Useless Use Of cat). :-) You can just do this instead: cut -d, -f3 < abc.txt If you like the input file at the start of the command line, that's legal, too: < abc.txt cut -d, -f3 You can read more about UUOC at: http://sial.org/code/shell/tips/useless-cat/ 2. The above simply returns the third field. OP appeared to want only lines containing "univ". So: cut -d, -f3 < abc.txt | grep univ 3. I'll leave the quote removal as an exercise to the reader. ;-) -- Ben "Pedantic" Scott ;-) ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
Re: extract string
On 1/10/06, Zhao Peng <[EMAIL PROTECTED]> wrote:
> how could I extract the string which
> contains "univ" and create an output file called def.txt, which only has
> 3 following lines:

Here's one way, as a Perl one-liner:

  perl -ne '@_ = split ","; $_ = $_[2]; s/(^")|("$)//g; print if m/univ/;' < abc.txt > def.txt

That trims out the quotes, as it appears you want. The search for "univ" is case-sensitive.

Broken down into a script with comments:

  #!/usr/bin/perl -n
  @_ = split ",";     # split input fields into @_ (split at commas)
  $_ = $_[2];         # grab the third field, put into default workspace ($_)
  s/(^")|("$)//g;     # delete double-quote (") at start and/or end
  print if m/univ/;   # print if contains "univ"

HTH,

-- Ben
Re: extract string
Kenny, Thank you for your suggestion. The following line works: grep univ abc.txt | cut -f3 -d, >> dev.txt. While the following line intended to remove quotes does NOT work: grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt It resulted in a line starts with ">" prompt, and not output dev.txt Could you please double-check or modify it? Also, if one column is missing, and "," is used to indicate that missing column, like the following (2nd column of 3rd line is missing): "name","age","school" "jerry" ,"21","univ of Vermont" "jesse",,,"Dartmouth college" "jack","18","univ of Penn" "john","20","univ of south Florida" Does the "cut" approach still apply? If not, what command would you suggest to address this missing issue? Thank you again. Zhao [EMAIL PROTECTED] wrote: Actually, if you are looking for only lines that contain the string "univ", then you would want to grep for it: grep univ abc.txt | cut -f3 -d, >> dev.txt. Paul's example would give you the third field of each line, even if they don't have "univ" in them. Now, if you wanted to remove the quotes, then you would need something like: grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt FYI, Kenny -- Original message -- From: "Whelan, Paul" <[EMAIL PROTECTED]> Like so: cat abc.txt | cut -d, -f3 Thanks. -Original Message- From: Zhao Peng [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 10, 2006 11:51 AM To: gnhlug-discuss@mail.gnhlug.org Subject: extract string Hi Suppose that I have a file called abc.txt, which contains the following 5 lines (columns are delimited by ",") "name","age","school" "jerry" ,"21","univ of Vermont" "jesse","28","Dartmouth college" "jack","18","univ of Penn" "john","20","univ of south Florida" My OS is RedHat Enterprise, how could I extract the string which contains "univ" and create an output file called def.txt, which only has 3 following lines: univ of Vermont univ of Penn univ of south Florida Please suggest the simplest command line approach. Thank you. 
Zhao
RE: extract string
Actually, if you are looking for only lines that contain the string "univ", then you would want to grep for it: grep univ abc.txt | cut -f3 -d, >> dev.txt. Paul's example would give you the third field of each line, even if they don't have "univ" in them. Now, if you wanted to remove the quotes, then you would need something like: grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt FYI, Kenny -- Original message -- From: "Whelan, Paul" <[EMAIL PROTECTED]> > Like so: cat abc.txt | cut -d, -f3 > > Thanks. > > -Original Message- > From: Zhao Peng [mailto:[EMAIL PROTECTED] > Sent: Tuesday, January 10, 2006 11:51 AM > To: gnhlug-discuss@mail.gnhlug.org > Subject: extract string > > Hi > > Suppose that I have a file called abc.txt, which contains the following > 5 lines (columns are delimited by ",") > > "name","age","school" > "jerry" ,"21","univ of Vermont" > "jesse","28","Dartmouth college" > "jack","18","univ of Penn" > "john","20","univ of south Florida" > > My OS is RedHat Enterprise, how could I extract the string which > contains "univ" and create an output file called def.txt, which only has > > 3 following lines: > > univ of Vermont > univ of Penn > univ of south Florida > > Please suggest the simplest command line approach. > > Thank you. > Zhao > ___ > gnhlug-discuss mailing list > gnhlug-discuss@mail.gnhlug.org > http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss > > ___ > gnhlug-discuss mailing list > gnhlug-discuss@mail.gnhlug.org > http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
RE: extract string
Like so: cat abc.txt | cut -d, -f3 Thanks. -Original Message- From: Zhao Peng [mailto:[EMAIL PROTECTED] Sent: Tuesday, January 10, 2006 11:51 AM To: gnhlug-discuss@mail.gnhlug.org Subject: extract string Hi Suppose that I have a file called abc.txt, which contains the following 5 lines (columns are delimited by ",") "name","age","school" "jerry" ,"21","univ of Vermont" "jesse","28","Dartmouth college" "jack","18","univ of Penn" "john","20","univ of south Florida" My OS is RedHat Enterprise, how could I extract the string which contains "univ" and create an output file called def.txt, which only has 3 following lines: univ of Vermont univ of Penn univ of south Florida Please suggest the simplest command line approach. Thank you. Zhao ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss
extract string
Hi Suppose that I have a file called abc.txt, which contains the following 5 lines (columns are delimited by ",") "name","age","school" "jerry" ,"21","univ of Vermont" "jesse","28","Dartmouth college" "jack","18","univ of Penn" "john","20","univ of south Florida" My OS is RedHat Enterprise, how could I extract the string which contains "univ" and create an output file called def.txt, which only has 3 following lines: univ of Vermont univ of Penn univ of south Florida Please suggest the simplest command line approach. Thank you. Zhao ___ gnhlug-discuss mailing list gnhlug-discuss@mail.gnhlug.org http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss