Try this (untested as I'm on my iPhone now):

index <- grep("Document+.", yourfile, value = FALSE)
index <- c(index + 2, index + 4)

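A minimal sketch of how the two lines above might be used end to end. The contents of yourfile here are made up for illustration (in practice it would be the character vector returned by readLines()), and the pattern is the "Document [0-9]" one from Simon's original question, anchored to the start of the line. The offsets 2 and 4 only fit this toy layout, so adjust them to the real file.

```r
## Toy stand-in for the result of readLines(); the contents are invented
yourfile <- c("Document 1 of 2", "", "Paper A", "", "Monday 1 Jan",
              "Document 2 of 2", "", "Paper B", "", "Tuesday 2 Jan")

## Line numbers where each record starts
index <- grep("^Document [0-9]", yourfile)

## Positions 2 and 4 lines after every start line
index <- c(index + 2, index + 4)

## Pull out the wanted lines: all offset-2 lines first, then all offset-4 lines
wanted <- yourfile[index]
wanted
```

Because c() concatenates the two shifted vectors, the result lists both newspaper names first and both dates second, rather than grouping them by document.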
You just need to make sure you avoid recycling, e.g.,

1:10 + c(2, 4) # not what you want: 2 and 4 are added to alternating elements

If you want so many offsets that writing out each index + n by hand becomes cumbersome, you could use something like:

as.vector(sapply(c(2, 4), "+", e2 = index))

HTH,

Josh

On Jul 11, 2011, at 11:09, Simon Kiss <sjk...@gmail.com> wrote:

> Josh, that's amazing. Is there any way to have it grab two different lines
> after the grep, say the second and the fourth line? There's some other
> information in the text file I'd like to grab. I could do two separate
> commands, but I'd like to know if this could be done in one command...
> Simon Kiss
> On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:
>
>> If you know you can find the start of the document (say that line
>> always starts with Document...), then:
>>
>> grep("Document+.", yourfile, value = FALSE) + 4
>>
>> should give you 4 lines after each line where Document occurred. No
>> loop needed :)
>>
>> On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>> Hi Josh,
>>> Sorry for the insufficient introduction. This might work, but I'm not sure.
>>> The file that I have includes up to 100 documents (Document 1, Document 2,
>>> Document 3....Document 100) with the newspaper name following 4 lines below
>>> each Document number.
>>> I'm using readLines to get the text file into R and then trying to use grep
>>> to get the newspaper name for each record. But your idea of indexing the
>>> text object read into R with the line number where the newspaper name is
>>> found is a good one. I'll just have to come up with a loop to tell R to
>>> get the 4th, 8th, 12th, 16th line, etc.
>>> I'll see if I can get that to work.
>>> Simon
>>> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>>>
>>>> Dear Simon,
>>>>
>>>> Maybe I don't understand properly....if you are doing this in R, can't
>>>> you just pick the line you want?
>>>>
>>>> Josh
>>>>
>>>> ## print your data to clipboard
>>>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
>>>> "clipboard")
>>>> ## read data in, and only select the 4th line to pass to grep()
>>>> grep("pattern", x = readLines("clipboard")[4])
>>>>
>>>>
>>>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>>> Dear colleagues,
>>>>> I have a series of newspaper articles in a text file, downloaded from a
>>>>> text file. They look as follows:
>>>>>
>>>>> Document 1 of 100
>>>>> \n
>>>>> \n
>>>>> \n
>>>>> Newspaper Name
>>>>> \n
>>>>> \n
>>>>> Day Date
>>>>>
>>>>> I have a series of grep scripts that can extract the date and convert it
>>>>> to a date object, but I can't figure out how to grep the newspaper name.
>>>>> There is no field ID attached to those lines. The best I can come up with
>>>>> would be to have the program grep the four lines following matching the
>>>>> pattern "Document [0-9]". There is an argument to grep in unix that
>>>>> can do this ...grep -A4 'pattern' infile>outfile, but I don't know if
>>>>> there is an equivalent argument in R.
>>>>>
>>>>> Any thoughts?
>>>>> Yours, Simon Kiss
>>>>> *********************************
>>>>> Simon J. Kiss, PhD
>>>>> Assistant Professor, Wilfrid Laurier University
>>>>> 73 George Street
>>>>> Brantford, Ontario, Canada
>>>>> N3T 2C9
>>>>> Cell: +1 905 746 7606
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> --
>>>> Joshua Wiley
>>>> Ph.D. Student, Health Psychology
>>>> University of California, Los Angeles
>>>> https://joshuawiley.com/
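
For anyone checking the recycling caveat from the top of this message on concrete numbers, here is a small sketch. The start lines in index are invented, and outer() is offered only as one possible alternative to the sapply() call, not something suggested in the thread.

```r
index   <- c(1, 6)   # hypothetical start lines of two records
offsets <- c(2, 4)   # lines wanted after each start

## Element-wise addition pairs index[i] with offsets[i]
index + offsets      # 3 10, only two of the four positions

## outer() adds every offset to every start line: one row per record,
## one column per offset
by_record <- outer(index, offsets, "+")

## Column-major order: all offset-2 lines, then all offset-4 lines
as.vector(by_record)       # 3 8 5 10

## Row-major order keeps each record's lines together
as.vector(t(by_record))    # 3 5 8 10
```

The column-major vector is in the same order as c(index + 2, index + 4), while the transposed version may be handier when the extracted lines should stay grouped by document.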