On Sunday 22 February 2009, 20:06, Mark Knecht wrote:
> Hi,
>    Very off topic other than I'd do this on my Gentoo box prior to
> using R on my Gentoo box. Please ignore if not of interest.
>
>    I've got a really big data file in essentially a *.csv format.
> (comma delimited) I need to scan this file and create a new output
> file. I'm wondering if there is a reasonably easy command line way of
> doing this using something like sed or awk which I know nothing about.
> Thanks in advance.
>
>    The basic idea goes something like this:
>
> 1) The input file might look like the following where some of it is
> attributes (shown as letters) and other parts are results. (shown as
> numbers)
>
> A,B,C,D,1
> E,F,G,H,2
> I,J,K,L,3
> M,N,O,P,4
> Q,R,S,T,5
> U,V,W,X,6

Are the results always in the last field, and is the result always a single field? Is the total number of fields per line fixed?

> 2) From the above data input file I want to take the attributes from a
> few preceding lines (say 3 in this example) and write them to the
> output file along with the result on the last of the 3 lines. The
> output file might look like this:
>
> A,B,C,D,E,F,G,H,I,J,K,L,3
> E,F,G,H,I,J,K,L,M,N,O,P,4
> I,J,K,L,M,N,O,P,Q,R,S,T,5
> M,N,O,P,Q,R,S,T,U,V,W,X,6

Is the number of lines you pick for the operation always 3, or can it vary? And, once you choose a number n of lines, should the whole file be processed by concatenating n lines at a time, with each resulting line ending in the result of the last line of its group? In other words, does the following hold for the output format:

<concatenation of attributes of lines 1..n> <result of line n>
<concatenation of attributes of lines 2..n+1> <result of line n+1>
<concatenation of attributes of lines 3..n+2> <result of line n+2>
<concatenation of attributes of lines 4..n+3> <result of line n+3>
...

With answers to the above questions, it's probably possible to hack together a solution.
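
For illustration only, here is a minimal sketch of that sliding-window idea in Python (rather than sed or awk). It assumes the answers to the questions above are all "yes": the result is a single field at the end of each line, the window advances one line at a time, and no field contains an embedded or quoted comma. The function name window_rows and its parameter n are hypothetical, chosen just for this example.

```python
from collections import deque


def window_rows(lines, n=3):
    """Yield one output row per window of n consecutive input rows.

    Assumption: the last comma-separated field of each line is the
    result; everything before it is an attribute.
    """
    window = deque(maxlen=n)  # keeps only the attributes of the last n lines
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        *attrs, result = line.split(",")
        window.append(attrs)
        if len(window) == n:
            # Flatten the attributes of the n buffered lines, then
            # append the result of the newest line.
            merged = [a for row in window for a in row]
            yield ",".join(merged + [result])


sample = [
    "A,B,C,D,1",
    "E,F,G,H,2",
    "I,J,K,L,3",
    "M,N,O,P,4",
    "Q,R,S,T,5",
    "U,V,W,X,6",
]

for row in window_rows(sample, n=3):
    print(row)
```

Since only n lines are buffered at any time, the same function should work on a very large file by passing it an open file object instead of a list. If fields can contain quoted commas, the plain split(",") would need to be replaced with a proper CSV parser.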