On Tue, Feb 24, 2009 at 2:56 AM, Etaoin Shrdlu <shr...@unlimitedmail.org> wrote:
<SNIP>
>
> So, in my understanding this is what we want to accomplish so far:
>
> given an input of the form
>
> D1,T1,a1,b1,c1,d1,...,R1
> D2,T2,a2,b2,c2,d2,...,R2
> D3,T3,a3,b3,c3,d3,...,R3
> D4,T4,a4,b4,c4,d4,...,R4
> D5,T5,a5,b5,c5,d5,...,R5
>
> (the ... mean that an arbitrary number of columns can follow)
>
> You want to group lines by n at a time, keeping the D and T column from
> the first line of each group, and keeping the R column from the last
> line of the group, so for example with n=3 we would have:
>
> D1,T1,a1,b1,c1,d1,...a2,b2,c2,d2,...a3,b3,c3,d3,...R3
> D2,T2,a2,b2,c2,d2,...a3,b3,c3,d3,...a4,b4,c4,d4,...R4
> D3,T3,a3,b3,c3,d3,...a4,b4,c4,d4,...a5,b5,c5,d5,...R5
>
> (and you're right, that produces an output that is roughly n times the
> size of the original file)
>
> Now, in addition to that, you also want to drop an arbitrary number of
> columns in the a,b,c... group. So for example, you want to drop columns
> 2 and 3 (b and c in the example), so you'd end up with something like
>
> D1,T1,a1,d1,...a2,d2,...a3,d3,...R3
> D2,T2,a2,d2,...a3,d3,...a4,d4,...R4
> D3,T3,a3,d3,...a4,d4,...a5,d5,...R5
>
> Please confirm that my understanding is correct, so I can come up with
> some code to do that.

Perfectly correct for all the data rows.

For the header I now see that we have a slightly harder job. What we'd
need to do is read the first line of the file, duplicate it N times, and
then drop the same columns as we drop in the rows. The problem is that
I would then have the same header value for N columns, which won't make
sense to the tool that uses this data. It would help if we could read
the header and then automatically append the number N to each duplicated
name (or some string like _N).

Maybe better would be a separate small program to do the header part,
and then this program could read that header and make it the first line
of the output file. My worry is that when this data file becomes very
large, say 1GB or more of data, I probably cannot open the file with vi
to edit the header. It would be better if I could put the header in its
own file. That file would be 1 line long. I could check it for the name
edits, make sure it's right, and then the program you are so kindly
building would just read it, cut out columns, and put it at the start of
the new large file.

Does that make sense?

>
>> I found a web site to study awk so I'm starting to see more or less
>> how your example works when I have the code in front of me. Creating
>> the code out of thin air might be a bit of a stretch for me at this
>> point though.
>
> I suggest you start from
>
> http://www.gnu.org/software/gawk/manual/gawk.html
>
> really complete, but gradual so you can have an easy start and move on to
> the complexities later.
>

Yes, very complete. A good reference. Thanks!

Cheers,
Mark
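
For illustration only, here is a rough sketch of the transformation
discussed above. It is written in Python rather than awk and is not the
code from this thread. The function names (drop_columns, make_header,
window_rows) and the sample header names are made up for the example.
It assumes the first two columns are D and T, the last column is R, the
columns to drop are numbered from 1 within the a,b,c,... group, and the
D and T values come from the first line of each group, as the prose
description says. The _1 .. _n suffix follows the "_N" idea above.

```python
from collections import deque


def drop_columns(fields, drop):
    """Return fields minus the 1-based positions listed in drop."""
    return [f for pos, f in enumerate(fields, start=1) if pos not in drop]


def make_header(header, n, drop):
    """Build the output header: D, T, then the kept names repeated n times
    with _1 .. _n appended, then R."""
    kept = drop_columns(header[2:-1], drop)
    out = header[:2]
    for k in range(1, n + 1):
        out += [f"{name}_{k}" for name in kept]
    out.append(header[-1])
    return out


def window_rows(rows, n, drop):
    """Yield one output row per sliding window of n consecutive input rows."""
    window = deque(maxlen=n)  # holds only the last n rows, so memory stays small
    for row in rows:
        window.append(row)
        if len(window) < n:
            continue
        out = window[0][:2]          # D and T from the first row of the window
        for r in window:
            out += drop_columns(r[2:-1], drop)
        out.append(window[-1][-1])   # R from the last row of the window
        yield out


if __name__ == "__main__":
    # Made-up sample data in the shape used in the thread (columns a..d only).
    header = "Date,Time,a,b,c,d,Result".split(",")
    data = [f"D{i},T{i},a{i},b{i},c{i},d{i},R{i}".split(",") for i in range(1, 6)]
    n, drop = 3, {2, 3}              # group 3 rows at a time, drop b and c
    print(",".join(make_header(header, n, drop)))
    for out in window_rows(data, n, drop):
        print(",".join(out))
```

For a real file, the rows could come from iterating over the open file
and splitting each line on commas, so the data is read one line at a
time and a 1GB file never has to fit in memory or be opened in an
editor. The header list could just as well be read from the separate
one-line header file.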