On Jul 11, 2011, at 3:33 PM, Joshua Wiley wrote:
On Jul 11, 2011, at 12:00, Bert Gunter <gunter.ber...@gene.com> wrote:
Simon:
Basic basic stuff (not grep -- the stuff thereafter). Please read the
docs, especially the tutorial, An Intro to R.
... and Josh's solution can be shortened to (as he knows):
index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
Really? Won't the 2 and 4 get recycled so that every other element
returned from grep will have 2 or 4 added instead of 2 *and* 4?
My understanding is that Simon has a single file with, for example,
Document 1 on line 1, Document 2 on line 301, etc. He wants both
the 2nd and 4th lines after each document, so lines 3, 5, 303, 305,
but just doing + c(2,4) would only give 3, 305.
So:
rep(index, each=2) + c(2,4)
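
A small check of the recycling behaviour discussed above, as a sketch only. The values in `index` are illustrative (matches assumed on lines 1 and 301, as in the example), not taken from Simon's file.

```r
# Illustrative match positions: "Document" assumed on lines 1 and 301
index <- c(1, 301)

# Element-wise addition recycles c(2, 4) along index:
# 1 + 2 and 301 + 4, so two of the wanted lines are missed
recycled <- index + c(2, 4)

# Repeating each index first pairs every match with both offsets
wanted <- rep(index, each = 2) + c(2, 4)

print(recycled)  # 3 305
print(wanted)    # 3 5 303 305
```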
--
David.
Josh
-- Bert
On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:
Try this (untested as I'm on my iPhone now):
index <- grep("Document+.", yourfile, value = FALSE)
index <- c(index + 2, index + 4)
You just need to make sure you avoid recycling, e.g.,
1:10 + c(2, 4) # not what you want
If you want a sufficient number of lines that manually writing
index + becomes cumbersome, you could use something like:
as.vector(sapply(c(2, 4), "+", e2 = index))
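
For what it's worth, one more way to build every index/offset combination is `outer()`. This is a sketch, not something proposed in the thread; `index` and `offsets` below are made-up values. Note that both forms return the results grouped by offset, so `sort()` may be wanted to get them back into file order.

```r
index <- c(1, 301)   # illustrative match positions
offsets <- c(2, 4)   # lines wanted after each match

# Written out by hand, as in the message above
by_hand <- c(index + 2, index + 4)

# outer() builds the full index-by-offset table of sums;
# as.vector() flattens it column by column
by_outer <- as.vector(outer(index, offsets, "+"))

print(by_outer)        # 3 303 5 305 (grouped by offset)
in_file_order <- sort(by_outer)
print(in_file_order)   # 3 5 303 305
```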
HTH,
Josh
On Jul 11, 2011, at 11:09, Simon Kiss <sjk...@gmail.com> wrote:
Josh, that's amazing. Is there any way to have it grab two
different lines after the grep, say the second and the fourth
line? There's some other information in the text file I'd like to
grab. I could do two separate commands, but I'd like to know if
this could be done in one command...
Simon Kiss
On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:
If you know you can find the start of the document (say that line
always starts with Document...), then:
grep("Document+.", yourfile, value = FALSE) + 4
should give you the line number 4 lines after each line where
Document occurred. No loop needed :)
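
To make that concrete, here is a minimal sketch on made-up data. The contents of `yourfile` are invented for illustration (standing in for what `readLines()` would return), and the layout assumes the name sits exactly 4 lines below each "Document" line. The pattern is also tightened to `^Document [0-9]+`, which is an assumption about how the header lines look.

```r
# Made-up stand-in for the result of readLines() on the real file
yourfile <- c(
  "Document 1 of 100", "", "", "",
  "The Daily Example", "", "", "Monday July 11, 2011",
  "Document 2 of 100", "", "", "",
  "The Evening Sample", "", "", "Tuesday July 12, 2011"
)

# Line numbers where each record starts
starts <- grep("^Document [0-9]+", yourfile)

# Use the shifted line numbers to index back into the text
newspapers <- yourfile[starts + 4]
print(newspapers)
```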
On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss <sjk...@gmail.com>
wrote:
Hi Josh,
Sorry for the insufficient introduction. This might work, but
I'm not sure.
The file that I have includes up to 100 documents (Document 1,
Document 2, Document 3....Document 100) with the newspaper name
following 4 lines below each Document number.
I'm using readLines to get the text file into R and then trying
to use grep to get the newspaper name for each record. But your
idea of indexing the text object read into R with the line
number where the newspaper name is found is a good one. I'll
just have to come up with a loop to tell R to get the 4th, 8th,
12th, 16th, line, etc.
I'll see if I can get that to work.
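
A loop is probably not needed for that: `seq()` can generate the positions directly. The sketch below uses invented data and only holds if every record is exactly 4 lines long, which may not be true of the real file.

```r
# Made-up input where every record is exactly 4 lines long
x <- c("Document 1", "", "", "Paper A",
       "Document 2", "", "", "Paper B",
       "Document 3", "", "", "Paper C")

# Positions 4, 8, 12, ... up to the length of x
every_fourth <- seq(4, length(x), by = 4)

papers <- x[every_fourth]
print(papers)
```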
Simon
On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:
Dear Simon,
Maybe I don't understand properly....if you are doing this in R,
can't you just pick the line you want?
Josh
## print your data to clipboard
cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file = "clipboard")
## read data in, and only select the 4th line to pass to grep()
grep("pattern", x = readLines("clipboard")[4])
On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss <sjk...@gmail.com>
wrote:
Dear colleagues,
I have a series of newspaper articles in a text file,
downloaded from a text file. They look as follows:
Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date
I have a series of grep scripts that can extract the date and
convert it to a date object, but I can't figure out how to
grep the newspaper name. There is no field ID attached to
those lines. The best I can come up with would be to have the
program grep the four lines following a match of the pattern
"Document [0-9]". There is an argument to grep in unix
that can do this ...grep -A4 'pattern' infile>outfile, but I
don't know if there is an equivalent argument in R.
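
As far as I know, R's `grep()` has no direct counterpart to `-A`, but the behaviour can be approximated with index arithmetic. The sketch below is one possible way; `grep_after` is a hypothetical helper name, not a function from base R, and the input vector is invented.

```r
# Hypothetical helper: return each matching line plus the n lines after it,
# roughly like `grep -A n` (without the "--" group separators)
grep_after <- function(pattern, x, n) {
  hits <- grep(pattern, x)
  # Pair every hit with the offsets 0..n
  idx <- rep(hits, each = n + 1) + 0:n
  # Drop positions past the end and overlaps between neighbouring hits
  idx <- unique(idx[idx <= length(x)])
  x[idx]
}

# Invented example input
txt <- c("Document 1 of 100", "a", "b", "c", "Newspaper Name", "d",
         "Document 2 of 100", "e")

result <- grep_after("^Document [0-9]+", txt, 4)
print(result)
```

Here the second match is only one line from the end of the input, so the helper returns what is available rather than failing.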
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.