On Tue, May 31, 2016 at 7:05 PM, Jeff Newmiller
<jdnew...@dcn.davis.ca.us> wrote:
> You need to go back and study how I made my solution reproducible and make
> your problem reproducible.
>
> You probably also ought to spend some time comparing the regex pattern to
> your actual data... the point of this list is to learn how to construct these
> solutions yourself.

Ah, if only that were the case.
(or is that just the grumbling of an old curmudgeon?)

Cheers,
Bert

> --
> Sent from my phone. Please excuse my brevity.
>
> On May 31, 2016 6:26:31 PM PDT, Val <valkr...@gmail.com> wrote:
>>Thank you so much Jeff. It worked for this example.
>>
>>When I read it from a file (c:\data\test.txt) it did not work
>>
>>KLEM="c:\data"
>>KR=paste(KLEM,"\test.txt",sep="")
>>indta <- readLines(KR, skip=46) # not interested in the first 46 lines)
>>
>>pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
>>firstlines <- grep( pattern, indta )
>># Replace the matched portion (entire string) with the first capture
>># string
>>v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
>># Replace the matched portion (entire string) with the second capture
>># string
>>v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
>># Convert the lines just after the first lines to numeric
>>v3 <- as.numeric( indta[ firstlines + 1 ] )
>># put it all into a data frame
>>result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>>
>>result
>>[1] Group Mean SE
>><0 rows> (or 0-length row.names)
>>
>>Thank you in advance
>>
>>
>>On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller
>><jdnew...@dcn.davis.ca.us> wrote:
>>> Please learn to post in plain text (the setting is in your email client...
>>> somewhere), as HTML is "What We See Is Not What You Saw" on this mailing
>>> list.
>>> In conjunction with that, try reading some of the fine material
>>> mentioned in the Posting Guide about making reproducible examples like this
>>> one:
>>>
>>> # You could read in a file
>>> # indta <- readLines( "out.txt" )
>>> # but there is no "current directory" in an email
>>> # so here I have used the dput() function to make source code
>>> # that creates a self-contained R object
>>>
>>> indta <- c(
>>> "Mean of weight group 1, SE of mean : 72.289037489555276",
>>> " 11.512956539215610",
>>> "Average weight of group 2, SE of Mean : 83.940053900595013",
>>> " 10.198495690144522",
>>> "group 3 mean , SE of Mean : 78.310441258245469",
>>> " 13.015876679555",
>>> "Mean of weight of group 4, SE of Mean : 76.967516495101669",
>>> " 12.1254882985", "")
>>>
>>> # Regular expression patterns are discussed all over the internet
>>> # in many places OTHER than R
>>> # You can start with ?regex, but there are many fine tutorials also
>>>
>>> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
>>> # For this task the regex has to match the whole "first line" of each set
>>> # ^        =match starting at the beginning of the string
>>> # .*       =any character, zero or more times
>>> # "group " =match these characters
>>> # (        =first capture string starts here
>>> # \\d      =any digit (first backslash for R, second backslash for regex)
>>> # +        =one or more of the preceding (any digit)
>>> # )        =end of first capture string
>>> # [^:]     =any non-colon character
>>> # *        =zero or more of the preceding (non-colon character)
>>> # :        =match a colon exactly
>>> # " *"     =match zero or more spaces
>>> # (        =second capture string starts here
>>> # [        =start of a set of equally acceptable characters
>>> # -+       =either of these characters are acceptable
>>> # 0-9      =any digit would be acceptable
>>> # .        =a period is acceptable (this is inside the [])
>>> # eE       =in case you get exponential notation input
>>> # ]        =end of the set of acceptable characters (number)
>>> # *        =number of acceptable characters can be zero or more
>>> # )        =second capture string stops here
>>> # .*       =zero or more of any character (just in case)
>>> # $        =at end of pattern, requires that the match reach the end
>>> #           of the string
>>>
>>> # identify indexes of strings that match the pattern
>>> firstlines <- grep( pattern, indta )
>>> # Replace the matched portion (entire string) with the first capture
>>> # string
>>> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
>>> # Replace the matched portion (entire string) with the second capture
>>> # string
>>> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
>>> # Convert the lines just after the first lines to numeric
>>> v3 <- as.numeric( indta[ firstlines + 1 ] )
>>> # put it all into a data frame
>>> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>>>
>>> Figuring out how to deliver your result (output) is a separate question that
>>> depends where you want it to go.
>>>
>>>
>>> On Mon, 30 May 2016, Val wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have a messy text file and from this text file I want to extract some
>>>> information.
>>>> Here is the text file (out.txt). One record has two lines. The mean comes
>>>> in the first line and the SE of the mean is on the second line. Here is
>>>> the sample of the data.
>>>>
>>>> Mean of weight group 1, SE of mean : 72.289037489555276
>>>> 11.512956539215610
>>>> Average weight of group 2, SE of Mean : 83.940053900595013
>>>> 10.198495690144522
>>>> group 3 mean , SE of Mean : 78.310441258245469
>>>> 13.015876679555
>>>> Mean of weight of group 4, SE of Mean : 76.967516495101669
>>>> 12.1254882985
>>>>
>>>> I want to produce the following table.
>>>> How do I read it first and then
>>>> produce a
>>>>
>>>>
>>>> Gr1 72.289037489555276 11.512956539215610
>>>> Gr2 83.940053900595013 10.198495690144522
>>>> Gr3 78.310441258245469 13.015876679555
>>>> Gr4 76.967516495101669 12.1254882985
>>>>
>>>>
>>>> Thank you in advance
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> ---------------------------------------------------------------------------
>>> Jeff Newmiller                        The     .....       .....  Go Live...
>>> DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>>                                       Live:   OO#.. Dead: OO#..  Playing
>>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>>> ---------------------------------------------------------------------------
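
For readers following the thread, here is a minimal sketch of how the file-based attempt could be made reproducible and checked against the pattern. It is not a diagnosis of the actual file, which was never posted. The temporary file, its 46 "header line" entries, and the names n_skip, all_lines, and is_first are assumptions made up for illustration; the pattern and the sample records are copied from the messages above. Two things in the posted code are worth checking: single backslashes inside an R string are escape characters (file.path() or forward slashes avoid the issue), and readLines() has no argument that skips leading lines, so the sketch reads everything and drops the first n_skip elements.

```r
# Sample records from the thread (what the real file is assumed to contain
# after its first 46 lines).
sample_records <- c(
  "Mean of weight group 1, SE of mean : 72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean : 83.940053900595013",
  " 10.198495690144522",
  "group 3 mean , SE of Mean : 78.310441258245469",
  " 13.015876679555",
  "Mean of weight of group 4, SE of Mean : 76.967516495101669",
  " 12.1254882985")

# Hypothetical stand-in for c:\data\test.txt: a temporary file with 46
# made-up header lines followed by the sample records.
n_skip <- 46
KR <- file.path(tempdir(), "test.txt")  # file.path() joins with "/"
writeLines(c(paste("header line", seq_len(n_skip)), sample_records), KR)

# Read every line, then drop the first n_skip by negative indexing.
all_lines <- readLines(KR)
indta <- all_lines[-seq_len(n_skip)]

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"

# Compare the pattern to the data actually read: one row per line,
# showing whether it matched.
is_first <- grepl(pattern, indta)
print(data.frame(matched = is_first, line = indta))

# Same extraction steps as in the thread, applied to the lines read from disk.
firstlines <- which(is_first)
result <- data.frame(
  Group = as.numeric(sub(pattern, "\\1", indta[firstlines])),
  Mean  = as.numeric(sub(pattern, "\\2", indta[firstlines])),
  SE    = as.numeric(indta[firstlines + 1]))
print(result)
```

If the matched column is FALSE for lines that should be "first lines", the lines read from the file differ from the sample in some way the pattern does not allow for, and printing them (for example with dput(head(indta))) is a way to produce a reproducible example.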