Hi, I want to extract information from a number of text files in a folder. The files have names such as 82534.txt, 82555.txt, 8282787.txt, etc.
Below is a sample of the kind of information in each text file:

########
#(a lot of preceding text)
2008-10-01 06:30:12 2 of 3
page
#(some lines of text - varies from file to file)
sekvens 890
# lines of text
sNo start stop direction value
1 70 85 up 60.2
3 60 90 down 71.5
#########

In each of the files that I choose, I want to first go to the appropriate page number. This is given in the first line of the sample above, where the page number is 2 (from 2 of 3). The date and time preceding the page number vary from file to file, but the next line always has the word "page". After that, I am interested in the number following the word "sekvens", and also in the table underneath it.

Finally, I want to collect all the data in a data frame with the following structure:

fileno sekvens sNo start stop direction value
82534  890     1   70    85   up        60.2
82534  890     3   60    90   down      71.5
82555  ..      ..  ..    ..   ..        ..

There are a number of topics involved here with which I have almost no familiarity:

1. How do I use regular expressions to specify the files that I want from a folder?
2. How do I locate a particular section (or page) in the text file from the description above? Should the files be read in their entirety first, or is it possible to go directly to the section with the relevant text?
3. How do I extract the data in the form that I want?

I have identified the following commands that would be useful for me here: list.files(), readLines(), strsplit().

I would appreciate some help in getting started, and would certainly benefit from a few hints. I would also appreciate links to references with examples showing how similar problems are tackled.

Thanking you,
Ravi
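One possible way to fit these pieces together is sketched below. It is only a sketch under assumptions that the post does not confirm: every page starts with a line ending in "<n> of <total>" followed by a line containing "page", the table header starts with "sNo", and table rows have exactly five whitespace-separated fields (three integers, a word, a number). The function names parse_page and read_folder are made up for illustration. Each file is read in full with readLines() and then searched, and read.table() is used in place of strsplit() for the table rows.

```r
# Hypothetical helper: extract 'sekvens' and the table from one page of a file.
# 'lines' is the output of readLines(); 'fileno' is the file name without ".txt".
parse_page <- function(lines, fileno, page = 2) {
  # Assumed page header: a line ending in "<page> of <total>" ...
  hdr <- grep(paste0("(^|[[:space:]])", page, " of [0-9]+[[:space:]]*$"), lines)
  # ... whose following line contains the word "page"
  hdr <- hdr[grepl("page", lines[hdr + 1])]
  if (length(hdr) == 0) return(NULL)
  rest <- lines[seq(hdr[1] + 1, length(lines))]

  # Cut off at the next page header, if there is one
  nxt <- grep("[0-9]+ of [0-9]+[[:space:]]*$", rest)
  if (length(nxt) > 0) rest <- rest[seq_len(nxt[1] - 1)]

  # The number following the word "sekvens"
  sek_idx <- grep("^[[:space:]]*sekvens[[:space:]]+[0-9]+", rest)
  if (length(sek_idx) == 0) return(NULL)
  sekvens <- as.integer(sub("^[[:space:]]*sekvens[[:space:]]+([0-9]+).*$", "\\1",
                            rest[sek_idx[1]]))

  # First table header ("sNo ...") after the sekvens line
  tab_idx <- grep("^[[:space:]]*sNo[[:space:]]", rest)
  tab_idx <- tab_idx[tab_idx > sek_idx[1]]
  if (length(tab_idx) == 0) return(NULL)
  body <- rest[-seq_len(tab_idx[1])]

  # Assumed row shape: int int int word number; stop at the first non-matching line
  row_pat <- paste0("^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+",
                    "[[:space:]]+[[:alpha:]]+[[:space:]]+[-0-9.]+[[:space:]]*$")
  is_row <- grepl(row_pat, body)
  n_rows <- if (all(is_row)) length(is_row) else which(!is_row)[1] - 1
  if (n_rows == 0) return(NULL)

  tab <- read.table(text = body[seq_len(n_rows)],
                    col.names = c("sNo", "start", "stop", "direction", "value"),
                    stringsAsFactors = FALSE)
  data.frame(fileno = fileno, sekvens = sekvens, tab, stringsAsFactors = FALSE)
}

# Hypothetical driver: apply parse_page() to every all-digit .txt file in 'dir'.
read_folder <- function(dir, page = 2) {
  files <- list.files(dir, pattern = "^[0-9]+\\.txt$", full.names = TRUE)
  parts <- lapply(files, function(f) {
    parse_page(readLines(f), fileno = sub("\\.txt$", "", basename(f)), page = page)
  })
  parts <- Filter(Negate(is.null), parts)  # drop files without the requested page
  do.call(rbind, parts)
}

# Small demonstration on lines modelled on the sample in the post
demo_lines <- c("(a lot of preceding text)",
                "2008-10-01 06:30:12 2 of 3",
                "page",
                "(some lines of text)",
                "sekvens 890",
                "# lines of text",
                "sNo start stop direction value",
                "1 70 85 up 60.2",
                "3 60 90 down 71.5",
                "#########")
res <- parse_page(demo_lines, fileno = "82534", page = 2)
print(res)
```

For a real folder the call would be something like read_folder("path/to/folder"), where the path is a placeholder. If the actual files deviate from the assumed layout (for instance, more table columns or a different page header), the patterns above would need adjusting.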