Re: [R] Extract from a text file

Bert Gunter Tue, 31 May 2016 22:07:07 -0700

On Tue, May 31, 2016 at 7:05 PM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> You need to go back and study how I made my solution reproducible and make 
> your problem reproducible.
>
> You probably also ought to spend some time comparing the regex pattern to 
> your actual data... the point of this list is to learn how to construct these 
> solutions yourself.



Ah, if only that were the case.

(or is that just the grumbling of an old curmudgeon?)

Cheers,
Bert


> --
> Sent from my phone. Please excuse my brevity.
>
> On May 31, 2016 6:26:31 PM PDT, Val <valkr...@gmail.com> wrote:
>>Thank you so much Jeff. It worked for this example.
>>
>>When I read it from a file (c:\data\test.txt) it did not work
>>
>>KLEM="c:\data"
>>KR=paste(KLEM,"\test.txt",sep="")
>>indta <- readLines(KR, skip=46)  # (not interested in the first 46 lines)
>>
>>pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
>>firstlines <- grep( pattern, indta )
>># Replace the matched portion (entire string) with the first capture string
>>v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
>># Replace the matched portion (entire string) with the second capture string
>>v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
>># Convert the lines just after the first lines to numeric
>>v3 <- as.numeric( indta[ firstlines + 1 ] )
>># put it all into a data frame
>>result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>>
>>result
>>[1] Group Mean  SE
>><0 rows> (or 0-length row.names)
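
A note on the file-reading step quoted just above, offered as a minimal sketch rather than a diagnosis of the actual file. In an R string a backslash starts an escape sequence, so "c:\data" is rejected ('\d' is an unrecognized escape) and "\test.txt" begins with a tab character. readLines() also has no skip argument; as far as I can tell, skip= is picked up by partial argument matching as skipNul=, so no lines are dropped. The temporary file and its 46 filler lines below are made up so the example runs anywhere; the real path is assumed to be c:/data/test.txt.

```r
# Hypothetical stand-in for c:/data/test.txt: 46 filler lines, then two records.
tmp <- tempfile(fileext = ".txt")
writeLines(c(rep("header junk", 46),
             "Mean of weight  group 1, SE of mean  :  72.289037489555276",
             " 11.512956539215610",
             "Average weight of group 2, SE of Mean :  83.940053900595013",
             "  10.198495690144522"), tmp)

# On Windows the real path could be built as file.path("c:/data", "test.txt");
# forward slashes (or doubled backslashes) avoid the escape problem.
KR <- tmp

# readLines() has no skip argument, so read everything and drop lines 1..46.
indta <- readLines(KR)
indta <- tail(indta, -46)

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)

# Quick diagnostics: how many lines were read, and how many match the pattern?
length(indta)
length(firstlines)
# Look at the lines that did NOT match before blaming the regex.
indta[!grepl(pattern, indta)]
```

If length(firstlines) is 0 on the real file, printing a few elements of indta usually shows whether the text differs from the sample (for example "Group" with a capital G, or no colon).
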
>>
>>Thank you in advance
>>
>>
>>On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us> wrote:
>>> Please learn to post in plain text (the setting is in your email client...
>>> somewhere), as HTML is "What We See Is Not What You Saw" on this mailing
>>> list.  In conjunction with that, try reading some of the fine material
>>> mentioned in the Posting Guide about making reproducible examples like this
>>> one:
>>>
>>> # You could read in a file
>>> # indta <- readLines( "out.txt" )
>>> # but there is no "current directory" in an email
>>> # so here I have used the dput() function to make source code
>>> # that creates a self-contained R object
>>>
>>> indta <- c(
>>> "Mean of weight  group 1, SE of mean  :  72.289037489555276",
>>> " 11.512956539215610",
>>> "Average weight of group 2, SE of Mean :  83.940053900595013",
>>> "  10.198495690144522",
>>> "group 3 mean , SE of Mean     :                78.310441258245469",
>>> " 13.015876679555",
>>> "Mean of weight of group 4, SE of Mean               :  76.967516495101669",
>>> " 12.1254882985", "")
>>>
>>> # Regular expression patterns are discussed all over the internet
>>> # in many places OTHER than R
>>> # You can start with ?regex, but there are many fine tutorials also
>>>
>>> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
>>> # For this task the regex has to match the whole "first line" of each set
>>> #  ^ =match starting at the beginning of the string
>>> #  .* =any character, zero or more times
>>> #  "group " =match these characters
>>> #  ( =first capture string starts here
>>> #  \\d = any digit (first backslash for R, second backslash for regex)
>>> #  + =one or more of the preceding (any digit)
>>> #  ) =end of first capture string
>>> #  [^:] =any non-colon character
>>> #  * =zero or more of the preceding (non-colon character)
>>> #  : =match a colon exactly
>>> #  " *" =match zero or more spaces
>>> #  ( =second capture string starts here
>>> #  [ =start of a set of equally acceptable characters
>>> #  -+ =either of these characters are acceptable
>>> #  0-9 =any digit would be acceptable
>>> #  . =a period is acceptable (this is inside the [])
>>> #  eE =in case you get exponential notation input
>>> #  ] =end of the set of acceptable characters (number)
>>> #  * =number of acceptable characters can be zero or more
>>> #  ) =second capture string stops here
>>> #  .* =zero or more of any character (just in case)
>>> #  $ =at end of pattern, requires that the match reach the end
>>> #     of the string
>>>
>>> # identify indexes of strings that match the pattern
>>> firstlines <- grep( pattern, indta )
>>> # Replace the matched portion (entire string) with the first capture string
>>> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
>>> # Replace the matched portion (entire string) with the second capture string
>>> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
>>> # Convert the lines just after the first lines to numeric
>>> v3 <- as.numeric( indta[ firstlines + 1 ] )
>>> # put it all into a data frame
>>> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
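
To see what each capture group in the pattern actually grabs, it can help to try it on a single line first. The following is a minimal sketch using base R's regexec() and regmatches(), which return the whole match and every capture in one pass; this is an alternative way of inspecting the pattern, not the method used in the quoted code, and the variable names x, m, group and mean_value are made up for illustration.

```r
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
x <- "Average weight of group 2, SE of Mean :  83.940053900595013"

# regexec() returns the positions of the full match and of each capture group;
# regmatches() turns those positions back into strings.
m <- regmatches(x, regexec(pattern, x))[[1]]
m[2]  # first capture: the group number, as text
m[3]  # second capture: the mean, as text

# Convert the captured text to numbers once the captures look right.
group <- as.numeric(m[2])
mean_value <- as.numeric(m[3])
```

If the pattern does not match the line at all, m comes back as an empty character vector, which is a quick signal that the pattern and the data disagree.
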
>>>
>>> Figuring out how to deliver your result (output) is a separate question that
>>> depends where you want it to go.
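
As one possible way to deliver the output, here is a minimal sketch that prints the "Gr1 mean SE" layout requested in the original question. It assumes plain text is wanted, and it deliberately keeps the values as character strings, because converting a 17-digit value to numeric and printing it again may not reproduce every digit. The data frame name out and the two-space separator are choices made for this example.

```r
indta <- c(
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean :  83.940053900595013",
  "  10.198495690144522")
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)

# Keep the values as text so every digit from the input survives unchanged.
out <- data.frame(
  Group = paste0("Gr", sub(pattern, "\\1", indta[firstlines])),
  Mean  = sub(pattern, "\\2", indta[firstlines]),
  SE    = trimws(indta[firstlines + 1]),
  stringsAsFactors = FALSE)

# Print to the console; replace file = "" with a file name to write to disk.
write.table(out, file = "", quote = FALSE, row.names = FALSE,
            col.names = FALSE, sep = "  ")
```

If the numbers are needed for further computation, the numeric data frame from the quoted code is the better starting point; the text version is only for reproducing the file's digits verbatim.
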
>>>
>>>
>>> On Mon, 30 May 2016, Val wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a messy text file, and I want to extract some information from it.
>>>> Here is the text file (out.txt).  One record has two lines. The mean comes
>>>> in the first line and the SE of the mean is on the second line. Here is
>>>> a sample of the data.
>>>>
>>>> Mean of weight  group 1, SE of mean  :  72.289037489555276
>>>> 11.512956539215610
>>>> Average weight of group 2, SE of Mean :  83.940053900595013
>>>>  10.198495690144522
>>>> group 3 mean , SE of Mean     :                78.310441258245469
>>>> 13.015876679555
>>>> Mean of weight of group 4, SE of Mean               :  76.967516495101669
>>>> 12.1254882985
>>>>
>>>> I want to produce the following table. How do I read the file first and
>>>> then produce the table?
>>>>
>>>>
>>>> Gr1  72.289037489555276   11.512956539215610
>>>> Gr2  83.940053900595013   10.198495690144522
>>>> Gr3  78.310441258245469   13.015876679555
>>>> Gr4  76.967516495101669   12.1254882985
>>>>
>>>>
>>>> Thank you in advance
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------------
>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>                                       Live:   OO#.. Dead: OO#..  Playing
>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>> ---------------------------------------------------------------------------
>
