On Jul 11, 2011, at 3:33 PM, Joshua Wiley wrote:

On Jul 11, 2011, at 12:00, Bert Gunter <gunter.ber...@gene.com> wrote:

Simon:

Basic, basic stuff (not grep -- the stuff thereafter). Please read the
docs, especially the tutorial, An Introduction to R.

... and Josh's solution can be shortened to (as he knows):

index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)


Really? Won't the 2 and 4 get recycled so that every other element returned from grep will have 2 or 4 added instead of 2 *and* 4?

My understanding is that Simon has a single file with, for example, Document 1 on line 1, Document 2 on line 301, etc. And he wants both the 2nd and 4th lines after each document header, so lines 3, 5, 303, 305, but just doing + c(2,4) would only give 3, 305.

So:

rep(index, each=2) + c(2,4)
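
For example, with made-up line numbers standing in for the grep() result (a quick sketch, not Simon's actual data):

index <- c(1, 301)                 # hypothetical match positions
index + c(2, 4)                    # recycling: gives only 3, 305
rep(index, each = 2) + c(2, 4)     # gives 3, 5, 303, 305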

--
David.



Josh

-- Bert

On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley <jwiley.ps...@gmail.com > wrote:
Try this (untested as I'm on my iPhone now):

index <- grep("Document+.", yourfile, value = FALSE)
index <- c(index + 2, index + 4)

You just need to make sure you avoid recycling, e.g.,

1:10 + c(2, 4) # not what you want

If you want enough different offsets that manually writing index + each one becomes cumbersome, you could use something like:

as.vector(sapply(c(2, 4), "+", e2 = index))
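
For instance, with a made-up index of c(1, 301) (hypothetical values, untested from my end), the result comes back column by column:

index <- c(1, 301)                                   # hypothetical match positions
as.vector(sapply(c(2, 4), "+", e2 = index))          # 3, 303, 5, 305
sort(as.vector(sapply(c(2, 4), "+", e2 = index)))    # 3, 5, 303, 305, if order matters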

HTH,

Josh

On Jul 11, 2011, at 11:09, Simon Kiss <sjk...@gmail.com> wrote:

Josh, that's amazing. Is there any way to have it grab two different lines after the grep, say the second and the fourth line? There's some other information in the text file I'd like to grab. I could do two separate commands, but I'd like to know if this could be done in one command...
Simon Kiss
On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:

If you know you can find the start of the document (say that line
always starts with Document...), then:

grep("Document+.", yourfile, value = FALSE) + 4

should give you the line numbers 4 lines below each line where Document
occurred. No loop needed :)
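
A tiny self-contained illustration of that idea (the file contents below are invented, following Simon's description):

## stand-in for readLines("yourfile.txt") -- contents made up for illustration
yourfile <- c("Document 1 of 100", "", "", "", "Newspaper Name", "", "Day Date")
idx <- grep("Document+.", yourfile, value = FALSE)   # line numbers of matches
yourfile[idx + 4]                                    # "Newspaper Name"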

On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com> wrote:
Hi Josh,
Sorry for the insufficient introduction. This might work, but I'm not sure. The file that I have includes up to 100 documents (Document 1, Document 2, Document 3 ... Document 100), with the newspaper name following 4 lines below each Document number. I'm using readLines to get the text file into R and then trying to use grep to get the newspaper name for each record. But your idea of indexing the text object read into R by the line number where the newspaper name is found is a good one. I'll just have to come up with a loop to tell R to get the 4th, 8th, 12th, 16th line, etc.
I'll see if I can get that to work.
Simon
On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:

Dear Simon,

Maybe I don't understand properly ... if you are doing this in R, can't
you just pick the line you want?

Josh

## print your data to clipboard
cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
"clipboard")
## read data in, and only select the 4th line to pass to grep()
grep("pattern", x = readLines("clipboard")[4])


On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com> wrote:
Dear colleagues,
I have a series of newspaper articles downloaded as a text file. They look as follows:

Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date

I have a series of grep scripts that can extract the date and convert it to a date object, but I can't figure out how to grep the newspaper name. There is no field ID attached to those lines. The best I can come up with would be to have the program grab the four lines following each match of the pattern "Document [0-9]". There is an argument to grep in Unix that can do this (grep -A4 'pattern' infile > outfile), but I don't know if there is an equivalent argument in R.
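
(For reference, one rough R sketch of what grep -A4 does here, assuming the file is read with readLines and the name always sits exactly 4 lines below the header; the file name is hypothetical:)

lines  <- readLines("infile.txt")           # hypothetical file name
hits   <- grep("Document [0-9]", lines)     # line numbers of the document headers
papers <- lines[hits + 4]                   # the newspaper name, 4 lines below each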




David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
