Re: [R] grep lines before or after pattern matched?

Joshua Wiley Mon, 11 Jul 2011 11:20:35 -0700

Try this (untested as I'm on my iPhone now):

index <- grep("Document [0-9]+", yourfile, value = FALSE)
index <- c(index + 2, index + 4)
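
To make that concrete, here is a small sketch that builds a toy version of the text in memory and pulls out the lines at both offsets. The sample lines and their layout are made up for illustration, and yourfile is assumed to be the character vector returned by readLines(); adjust the offsets to match your real file.

```r
# Toy stand-in for readLines("yourfile.txt"); the contents are invented
yourfile <- c("Document 1 of 2", "", "Newspaper A", "", "Monday July 11",
              "Document 2 of 2", "", "Newspaper B", "", "Tuesday July 12")

# Line numbers of the "Document" header lines
index <- grep("Document [0-9]+", yourfile)

# Line numbers 2 and 4 lines after each header
index <- c(index + 2, index + 4)

# Use the line numbers to pull the text itself
picked <- yourfile[index]
print(picked)
```

Note that the result is grouped by offset (all the +2 lines, then all the +4 lines); wrap the index in sort() if you would rather have the lines in file order.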


You just need to make sure you avoid recycling, e.g.,

1:10 + c(2, 4) # not what you want
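
A quick sketch of why, using a made-up index of 1:10: R recycles the shorter vector element by element, so the two offsets alternate along the index instead of each one being applied to every element.

```r
index <- 1:10  # hypothetical line numbers

# Recycling pairs 2 with the 1st, 3rd, ... elements and 4 with the 2nd, 4th, ...
recycled <- index + c(2, 4)
print(recycled)

# Applying each offset to the whole vector gives twice as many values
wanted <- c(index + 2, index + 4)
print(length(recycled))
print(length(wanted))
```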

If you want enough different offsets that writing out each index + n by hand 
becomes cumbersome, you could use something like:

as.vector(sapply(c(2, 4), "+", e2 = index))
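
As a rough illustration of the same idea, the sketch below spells the offsets out with an explicit function and compares the result with outer(), which is another base R way to add every offset to every index. The values of index and offsets are invented for the example.

```r
index <- c(1, 6)    # example header line numbers (made up)
offsets <- c(2, 4)  # how many lines after each header to grab

# One column per offset; as.vector() flattens the matrix column by column
by_sapply <- as.vector(sapply(offsets, function(off) index + off))

# outer() builds the same index-by-offset table of sums
by_outer <- as.vector(outer(index, offsets, "+"))

print(by_sapply)
print(by_outer)

# Sort if the line numbers should follow file order
in_file_order <- sort(by_outer)
print(in_file_order)
```

Either form scales to any number of offsets by changing only the offsets vector.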

HTH,

Josh

On Jul 11, 2011, at 11:09, Simon Kiss <sjk...@gmail.com> wrote:

> Josh, that's amazing. Is there any way to have it grab two different lines 
> after the grep, say the second and the fourth line? There's some other 
> information in the text file I'd like to grab.  I could do two separate 
> commands, but I'd like to know if this could be done in one command...
> Simon Kiss
> On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:
> 
>> If you know you can find the start of the document (say that line
>> always starts with Document...), then:
>> 
>> grep("Document+.", yourfile, value = FALSE) + 4
>> 
>> should give you the number of the line 4 lines after each line where
>> Document occurred.  No loop needed :)
>> 
>> On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>> Hi Josh,
>>> Sorry for the insufficient introduction. This might work, but I'm not sure.
>>> The file that I have includes up to 100 documents (Document 1, Document 2, 
>>> Document 3....Document 100) with the newspaper name following 4 lines below 
>>> each Document number.
>>> I'm using readLines to get the text file into R and then trying to use grep 
>>> to get the newspaper name for each record. But your idea of indexing the 
>>> text object read into R with the line number where the newspaper name is 
>>> found is a good one.  I'll just have to come up with a loop to tell R to 
>>> get the 4th, 8th, 12th, 16th line, etc.
>>> I'll see if I can get that to work.
>>> Simon
>>> On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
>>> 
>>>> Dear Simon,
>>>> 
>>>> Maybe I don't understand properly....if you are doing this in R, can't
>>>> you just pick the line you want?
>>>> 
>>>> Josh
>>>> 
>>>> ## print your data to clipboard
>>>> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
>>>> "clipboard")
>>>> ## read data in, and only select the 4th line to pass to grep()
>>>> grep("pattern", x = readLines("clipboard")[4])
>>>> 
>>>> 
>>>> On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
>>>>> Dear colleagues,
>>>>> I have a series of newspaper articles in a text file, downloaded from a 
>>>>> text file.  They look as follows:
>>>>> 
>>>>> Document 1 of 100
>>>>> \n
>>>>> \n
>>>>> \n
>>>>> Newspaper Name
>>>>> \n
>>>>> \n
>>>>> Day Date
>>>>> 
>>>>> I have a series of grep scripts that can extract the date and convert it 
>>>>> to a date object, but I can't figure out how to grep the newspaper name.  
>>>>> There is no field ID attached to those lines. The best I can come up with 
>>>>> would be to have the program grep the four lines following each match of 
>>>>> the pattern "Document [0-9]".  There is an argument to grep in unix that 
>>>>> can do this (grep -A4 'pattern' infile > outfile), but I don't know if 
>>>>> there is an equivalent argument in R.
>>>>> 
>>>>> Any thoughts?
>>>>> Yours, Simon Kiss
>>>>> *********************************
>>>>> Simon J. Kiss, PhD
>>>>> Assistant Professor, Wilfrid Laurier University
>>>>> 73 George Street
>>>>> Brantford, Ontario, Canada
>>>>> N3T 2C9
>>>>> Cell: +1 905 746 7606
>>>>> 
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide 
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Joshua Wiley
>>>> Ph.D. Student, Health Psychology
>>>> University of California, Los Angeles
>>>> https://joshuawiley.com/
>>> 
>> 
> 
