Hi Josh,

Sorry for the insufficient introduction. This might work, but I'm not sure. The file I have contains up to 100 documents (Document 1, Document 2, Document 3, ..., Document 100), with the newspaper name four lines below each Document number. I'm using readLines() to get the text file into R and then trying to use grep() to get the newspaper name for each record. But your idea of indexing the text object read into R by the line number where the newspaper name is found is a good one. I'll just have to come up with a loop to tell R to get the 4th, 8th, 12th, 16th line, etc. I'll see if I can get that to work.

Simon

On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
> Dear Simon,
>
> Maybe I don't understand properly... if you are doing this in R, can't
> you just pick the line you want?
>
> Josh
>
> ## print your data to clipboard
> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file = "clipboard")
> ## read data in, and only select the 4th line to pass to grep()
> grep("pattern", x = readLines("clipboard")[4])
>
>
> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
>> Dear colleagues,
>> I have a series of newspaper articles in a text file, downloaded from a text
>> file. They look as follows:
>>
>> Document 1 of 100
>> \n
>> \n
>> \n
>> Newspaper Name
>> \n
>> \n
>> Day Date
>>
>> I have a series of grep scripts that can extract the date and convert it to
>> a date object, but I can't figure out how to grep the newspaper name. There
>> is no field ID attached to those lines. The best I can come up with would be
>> to have the program grep the four lines following a match of the pattern
>> "Document [0-9]". There is an argument to grep in Unix that can do this
>> ... grep -A4 'pattern' infile > outfile, but I don't know if there is an
>> equivalent argument in R.
>>
>> Any thoughts?
>> Yours, Simon Kiss
>> *********************************
>> Simon J. Kiss, PhD
>> Assistant Professor, Wilfrid Laurier University
>> 73 George Street
>> Brantford, Ontario, Canada
>> N3T 2C9
>> Cell: +1 905 746 7606
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> https://joshuawiley.com/
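For what it's worth, base R's grep() has no argument equivalent to -A4, but it returns the indices of the matching lines, so adding a fixed offset to those indices gives a similar result without a loop. Below is a minimal sketch of that idea, not a tested solution for the real file. The sample text, the helper name newspaper_after(), and the default offset of 4 are all assumptions made for illustration; the offset would need adjusting if the name sits a different number of lines below the "Document" header.

```r
# Hypothetical sample standing in for readLines("yourfile.txt").
# Assumed layout: the newspaper name sits 4 lines below each header line.
txt <- c(
  "Document 1 of 100", "", "", "", " Newspaper A ", "", "", "Day Date 1",
  "Document 2 of 100", "", "", "", " Newspaper B ", "", "", "Day Date 2"
)

# Illustrative helper (not part of base R): return the line found `offset`
# lines after every line that matches `pattern`.
newspaper_after <- function(lines, pattern = "^Document [0-9]+", offset = 4) {
  hits <- grep(pattern, lines)       # indices of the header lines
  idx <- hits + offset               # indices of the lines we want
  idx <- idx[idx <= length(lines)]   # drop targets past the end of the input
  trimws(lines[idx])                 # strip stray leading/trailing spaces
}

papers <- newspaper_after(txt)
print(papers)
```

Because the offset is applied to wherever each header actually occurs, this should keep working when records differ in length, which a fixed 4th/8th/12th step would not.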