Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Bert Gunter
Josh:

(assuming you have interpreted correctly) You are *absolutely* right
-- I did not read carefully enough.

Does

 index <- matrix(rep(grep("Document+.", yourfile, value = FALSE), each = 3)
                 + c(0, 2, 4), ncol = 3, byrow = TRUE)

do it for you?

Sheepishly,

Bert
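
For reference, a minimal sketch of what this matrix looks like. It assumes a toy `yourfile` vector (invented here for illustration) in which each "Document" header is followed by a title two lines down and a newspaper name four lines down. The argument names are spelled out in full.

```r
# Toy input: two records; the "Document" header lines are at positions 1 and 7
yourfile <- c("Document 1 of 2", "", "Title A", "", "Paper A", "",
              "Document 2 of 2", "", "Title B", "", "Paper B", "")
hits <- grep("Document+.", yourfile, value = FALSE)

# each = 3 repeats every hit three times, so the recycled offsets line up per hit
index <- matrix(rep(hits, each = 3) + c(0, 2, 4), ncol = 3, byrow = TRUE)

# Look up the text at each position, keeping one row per document
fields <- matrix(yourfile[index], nrow = nrow(index))
print(index)
print(fields)
```

Each row of `index` belongs to one document, so `fields[i, ]` holds the header, the +2 line, and the +4 line for document i.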




On Mon, Jul 11, 2011 at 12:33 PM, Joshua Wiley  wrote:
> On Jul 11, 2011, at 12:00, Bert Gunter  wrote:
>
>> Simon:
>>
>> Basic basic stuff (not grep -- the stuff thereafter) . Please read the
>> docs, especially the tutorial,  An Intro to R.
>>
>> ... and Josh's solution can be shortened to (as he knows):
>>
>> index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
>>
>
> Really?  Won't the 2 and 4 get recycled so that every other element returned 
> from grep will have 2 or 4 added instead of 2 *and* 4?
>
> My understanding is that Simon has a single file with for example Document 1 
> on line 1 Document 2 on line 301 etc. And he wants both the 2nd and 4th lines 
> after each document, so lines 3, 5, 303, 305 but just doing + c(2,4) would 
> only give 3, 305.
>
> Josh
>
>> -- Bert
>>
>> On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley  
>> wrote:
>>> Try this (untested as I'm on my iPhone now):
>>>
>>> index <- grep("Document+.", yourfile, value = FALSE)
>>> index <- c(index + 2, index + 4)
>>>
>>> You just need to make sure you avoid recycling, e.g.,
>>>
>>> 1:10 + c(2, 4) # not what you want
>>>
>>> If you want a sufficient number of lines that manually writing index + 
>>> becomes cumbersome, you could use something like:
>>>
>>> as.vector(sapply(c(2, 4), "+", e2 = index))
>>>
>>> HTH,
>>>
>>> Josh

Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread David Winsemius


On Jul 11, 2011, at 3:33 PM, Joshua Wiley wrote:

> On Jul 11, 2011, at 12:00, Bert Gunter wrote:
>
>> Simon:
>>
>> Basic basic stuff (not grep -- the stuff thereafter) . Please read the
>> docs, especially the tutorial,  An Intro to R.
>>
>> ... and Josh's solution can be shortened to (as he knows):
>>
>> index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
>
> Really?  Won't the 2 and 4 get recycled so that every other element
> returned from grep will have 2 or 4 added instead of 2 *and* 4?
>
> My understanding is that Simon has a single file with for example
> Document 1 on line 1 Document 2 on line 301 etc. And he wants both
> the 2nd and 4th lines after each document, so lines 3, 5, 303, 305
> but just doing + c(2,4) would only give 3, 305.


So:

rep(index, each=2) + c(2,4)

--
David.
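
A quick check of this on the line numbers from Josh's example (headers on lines 1 and 301), written so the offsets can be changed in one place. The `offsets` and `wanted` names are only illustrative.

```r
index <- c(1, 301)     # positions of the "Document" lines, as in Josh's example
offsets <- c(2, 4)     # lines wanted after each header

# Repeat each header position once per offset, so recycling pairs them up correctly
wanted <- rep(index, each = length(offsets)) + offsets
print(wanted)          # 3 5 303 305
```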




David Winsemius, MD
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley
On Jul 11, 2011, at 12:00, Bert Gunter  wrote:

> Simon:
> 
> Basic basic stuff (not grep -- the stuff thereafter) . Please read the
> docs, especially the tutorial,  An Intro to R.
> 
> ... and Josh's solution can be shortened to (as he knows):
> 
> index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)
> 

Really?  Won't the 2 and 4 get recycled so that every other element returned 
from grep will have 2 or 4 added instead of 2 *and* 4?

My understanding is that Simon has a single file with for example Document 1 on 
line 1 Document 2 on line 301 etc. And he wants both the 2nd and 4th lines 
after each document, so lines 3, 5, 303, 305 but just doing + c(2,4) would only 
give 3, 305.

Josh
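
The effect can be seen directly on made-up header positions. This is only a sketch; the four-element vector is an assumed extension of the example above.

```r
index <- c(1, 301)

# Element-wise addition pairs 1 with 2 and 301 with 4
paired <- index + c(2, 4)
print(paired)                         # 3 305

# With more headers the offsets are recycled in alternation
recycled <- c(1, 301, 601, 901) + c(2, 4)
print(recycled)                       # 3 305 603 905

# Adding each offset to the whole vector gives every wanted position
both <- c(index + 2, index + 4)
print(both)                           # 3 303 5 305
```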

> -- Bert
> 

Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Bert Gunter
Simon:

Basic basic stuff (not grep -- the stuff thereafter) . Please read the
docs, especially the tutorial,  An Intro to R.

... and Josh's solution can be shortened to (as he knows):

index <- grep("Document+.", yourfile, value = FALSE) + c(2,4)

-- Bert

On Mon, Jul 11, 2011 at 11:19 AM, Joshua Wiley  wrote:
> Try this (untested as I'm on my iPhone now):
>
> index <- grep("Document+.", yourfile, value = FALSE)
> index <- c(index + 2, index + 4)
>
> You just need to make sure you avoid recycling, e.g.,
>
> 1:10 + c(2, 4) # not what you want
>
> If you want a sufficient number of lines that manually writing index + 
> becomes cumbersome, you could use something like:
>
> as.vector(sapply(c(2, 4), "+", e2 = index))
>
> HTH,
>
> Josh
>

Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley
Try this (untested as I'm on my iPhone now):

index <- grep("Document+.", yourfile, value = FALSE)
index <- c(index + 2, index + 4)

You just need to make sure you avoid recycling, e.g.,

1:10 + c(2, 4) # not what you want

If you want a sufficient number of lines that manually writing index + becomes 
cumbersome, you could use something like:

as.vector(sapply(c(2, 4), "+", e2 = index))

HTH,

Josh
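
One thing to be aware of with this approach is ordering: the result lists every +2 line first and every +4 line second. A small sketch, assuming header positions 1 and 301 and using `outer()` as one possible alternative to writing each offset by hand:

```r
index <- c(1, 301)

by_offset <- c(index + 2, index + 4)
print(by_offset)                      # 3 303 5 305

# outer() builds a headers-by-offsets table of line numbers
tab <- outer(index, c(2, 4), "+")
by_column <- as.vector(tab)           # 3 303 5 305 (offset by offset)
by_document <- as.vector(t(tab))      # 3 5 303 305 (document by document)
print(by_document)
```

If document order is all that is needed, `sort(by_offset)` gives the same result as `by_document` here.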

On Jul 11, 2011, at 11:09, Simon Kiss  wrote:

> Josh, that's amazing. Is there any way to have it grab two different lines 
> after the grep, say the second and the fourth line? There's some other 
> information in the text file I'd like to grab.  I could do two separate 
> commands, but I'd like to know if this could be done in one command...
> Simon Kiss


Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss
Josh, that's amazing. Is there any way to have it grab two different lines 
after the grep, say the second and the fourth line? There's some other 
information in the text file I'd like to grab.  I could do two separate 
commands, but I'd like to know if this could be done in one command...
Simon Kiss
On 2011-07-11, at 1:31 PM, Joshua Wiley wrote:

> If you know you can find the start of the document (say that line
> always starts with Document...), then:
> 
> grep("Document+.", yourfile, value = FALSE) + 4
> 
> should give you 4 lines after each line where Document occurred.  No
> loop needed :)
> 

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606



Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley
If you know you can find the start of the document (say that line
always starts with Document...), then:

grep("Document+.", yourfile, value = FALSE) + 4

should give you 4 lines after each line where Document occurred.  No
loop needed :)
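
Putting that together on a small made-up file, as a sketch only. It assumes the newspaper name sits exactly four lines below each header, and it anchors the pattern as "^Document [0-9]+", which is stricter than the pattern above. Both the sample text and that pattern are assumptions.

```r
sample_text <- c("Document 1 of 2", "", "", "", "The Daily Example",
                 "", "", "Monday, July 11, 2011",
                 "Document 2 of 2", "", "", "", "The Evening Sample",
                 "", "", "Tuesday, July 12, 2011")

# Stand-in for reading the real file with readLines()
con <- textConnection(sample_text)
yourfile <- readLines(con)
close(con)

name_lines <- grep("^Document [0-9]+", yourfile) + 4
# Drop positions past the end of the file (e.g. a truncated last record)
name_lines <- name_lines[name_lines <= length(yourfile)]
newspapers <- yourfile[name_lines]
print(newspapers)
```

Indexing with the shifted positions returns all the names at once, which is why no loop is required.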

On Mon, Jul 11, 2011 at 10:25 AM, Simon Kiss  wrote:
> Hi Josh,
> Sorry for the insufficient introduction. This might work, but I'm not sure.
> The file that I have includes up to 100 documents (Document 1, Document 2, 
> Document 3 ... Document 100) with the newspaper name following 4 lines below 
> each Document number.
> I'm using readLines to get the text file into R and then trying to use grep 
> to get the newspaper name for each record. But your idea of indexing the text 
> object read into R with the line number where the newspaper name is found is 
> a good one.  I'll just have to come up with a loop to tell R to get the 4th, 
> 8th, 12th, 16th line, etc.
> I'll see if I can get that to work.
> Simon



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss
Hi Josh,
Sorry for the insufficient introduction. This might work, but I'm not sure.
The file that I have includes up to 100 documents (Document 1, Document 2, 
Document 3 ... Document 100) with the newspaper name following 4 lines below 
each Document number.
I'm using readLines to get the text file into R and then trying to use grep to 
get the newspaper name for each record. But your idea of indexing the text 
object read into R with the line number where the newspaper name is found is a 
good one.  I'll just have to come up with a loop to tell R to get the 4th, 8th, 
12th, 16th line, etc.
I'll see if I can get that to work.
Simon
On 2011-07-11, at 12:45 PM, Joshua Wiley wrote:

> Dear Simon,
> 
> Maybe I don't understand properly... if you are doing this in R, can't
> you just pick the line you want?
> 
> Josh
> 
> ## print your data to clipboard
> cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
> "clipboard")
> ## read data in, and only select the 4th line to pass to grep()
> grep("pattern", x = readLines("clipboard")[4])
> 
> 

*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606



Re: [R] grep lines before or after pattern matched?

2011-07-11 Thread Joshua Wiley
Dear Simon,

Maybe I don't understand properly... if you are doing this in R, can't
you just pick the line you want?

Josh

## print your data to clipboard
cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file =
"clipboard")
## read data in, and only select the 4th line to pass to grep()
grep("pattern", x = readLines("clipboard")[4])
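
The "clipboard" destination is platform-dependent, so the lines above may not run everywhere. A sketch of the same idea using a temporary file instead; the `tmp` name and the search for "Newspaper" are illustrative choices, not part of the original suggestion.

```r
tmp <- tempfile(fileext = ".txt")
cat("Document 1 of 100 \n \n \n Newspaper Name \n \n Day Date", file = tmp)

# warn = FALSE because the text has no trailing newline
lines <- readLines(tmp, warn = FALSE)
fourth <- trimws(lines[4])            # strip the padding spaces
print(fourth)
print(grepl("Newspaper", fourth))     # TRUE

unlink(tmp)
```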


On Mon, Jul 11, 2011 at 9:31 AM, Simon Kiss  wrote:
> Dear colleagues,
> I have a series of newspaper articles in a text file, downloaded from a text 
> file.  They look as follows:
>
> Document 1 of 100
> \n
> \n
> \n
> Newspaper Name
> \n
> \n
> Day Date
>
> I have a series of grep scripts that can extract the date and convert it to a 
> date object, but I can't figure out how to grep the newspaper name.  There is 
> no field ID attached to those lines. The best I can come up with would be to 
> have the program grep the four lines following matching the pattern "Document 
> [0-9]".  There is an argument to grep in unix that can do this ...grep -A4 
> 'pattern' infile>outfile, but I don't know if there is an equivalent argument 
> in R.
>
> Any thoughts?
> Yours, Simon Kiss



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/



[R] grep lines before or after pattern matched?

2011-07-11 Thread Simon Kiss
Dear colleagues,
I have a series of newspaper articles in a text file, downloaded from a text 
file.  They look as follows:

Document 1 of 100
\n
\n
\n
Newspaper Name
\n
\n
Day Date

I have a series of grep scripts that can extract the date and convert it to a 
date object, but I can't figure out how to grep the newspaper name.  There is 
no field ID attached to those lines. The best I can come up with would be to 
have the program grep the four lines following matching the pattern "Document 
[0-9]".  There is an argument to grep in unix that can do this ...grep -A4 
'pattern' infile>outfile, but I don't know if there is an equivalent argument 
in R.

Any thoughts?
Yours, Simon Kiss
*
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 905 746 7606
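
For readers looking for the -A behaviour specifically: base R's grep() takes no such argument, but the effect can be approximated by indexing. A rough sketch follows. `grep_after` is a hypothetical helper written for this note, not an existing R function, and unlike grep -A it does not merge overlapping context blocks. The sample vector is made up.

```r
# Hypothetical helper: each match plus up to `after` following lines
grep_after <- function(pattern, x, after = 4) {
  hits <- grep(pattern, x)
  lapply(hits, function(i) x[i:min(i + after, length(x))])
}

x <- c("Document 1 of 2", "", "", "", "Paper A", "", "", "Date A",
       "Document 2 of 2", "", "", "", "Paper B")
blocks <- grep_after("Document [0-9]", x, after = 4)

# When a block is complete, its last line is the one four below the header
last_lines <- sapply(blocks, function(b) b[length(b)])
print(last_lines)
```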
