[R] Tolerance Interval calculation for each point in a data set?
Hello,

I am looking for tolerance-interval methodologies and found the package "tolerance", which will, for example, nicely calculate the 95%/95% tolerance limits of a given regression. What I am looking for, however, is not only the tolerance limits this calculation defines: I would also like to know, for each data point in the set, which tolerance band passes through it. Does anyone know of such a methodology and, if so, of an R implementation?

Thank you for any pointers.

Sincerely, Joh

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] Construct plot combination using grid without plotting and retrieving an object?
Hi,

I'm currently combining multiple plots using something along the lines of the following pseudo-code:

  library(grid)
  grid.newpage()
  tmpLayout <- grid.layout(nrow=4, ncol=2)
  pushViewport(viewport(layout = tmpLayout))

and then proceeding with filling the viewports. This works fine, but for packaging of functions I would really prefer to assemble all of this in an object which in the end can be drawn by calling "print". I'm envisioning something along the lines of what I can do with ggplot2: return a plot as a ggplot object and plot it later, rather than drawing it as I assemble it. Is that possible with a complex grid figure?

Thanks for any pointers.

Joh
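One way to get a printable object out of grid is to collect the pieces in a gTree whose viewport carries the layout, and to draw that tree only when asked. The following is a minimal sketch, not a confirmed answer from the thread: the helper build_figure(), the class name "gridfigure", and the 2 x 2 placeholder panels are all assumptions made for illustration.

```r
library(grid)

# Hypothetical helper: builds the figure as a gTree instead of drawing it.
build_figure <- function() {
  panel <- function(row, col, label) {
    # Each child carries its own viewport addressing one cell of the layout.
    gTree(
      children = gList(rectGrob(), textGrob(label)),
      vp = viewport(layout.pos.row = row, layout.pos.col = col)
    )
  }
  gTree(
    children = gList(
      panel(1, 1, "A"), panel(1, 2, "B"),
      panel(2, 1, "C"), panel(2, 2, "D")
    ),
    # The parent viewport holds the layout the children refer to.
    vp = viewport(layout = grid.layout(nrow = 2, ncol = 2)),
    cl = "gridfigure"  # assumed class name, used only for the print method
  )
}

# S3 print method: drawing happens here, not at construction time.
print.gridfigure <- function(x, ...) {
  grid.newpage()
  grid.draw(x)
  invisible(x)
}

fig <- build_figure()  # nothing is drawn yet
```

Calling print(fig) (or typing fig at the prompt) then draws the whole figure on the current device, so a packaged function can simply return the object.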
[R] grep help (character omission)
Hello,

Banging my head against a wall here ... can anyone light the way to a pattern modification that would make the following TRUE?

  identical(
    grep("^Intensity\\s[^HL]",
         c("Intensity", "Intensity L", "Intensity H", "Intensity Rep1")),
    as.integer(c(1,4)))

Thank you for your time.

Sincerely, Joh
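The posted pattern requires a whitespace character after "Intensity", so the bare string "Intensity" can never match. A possible modification, offered as a sketch rather than as the answer given on the list, is to allow either "whitespace followed by a character other than H or L" or the end of the string:

```r
x <- c("Intensity", "Intensity L", "Intensity H", "Intensity Rep1")

# Either a space plus a non-H/L character follows, or the string ends here.
pattern <- "^Intensity(\\s[^HL]|$)"

hits <- grep(pattern, x)
identical(hits, as.integer(c(1, 4)))
```

Note that this still excludes any suffix that merely starts with H or L (for example "Intensity Low"), which may or may not be intended.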
Re: [R] Peak detector help!?
Johannes Graumann wrote:
> Grrr ... new trial with code here: http://pastebin.com/RjHNNG9J
> Maybe the amount of inline-code prevented posting?
>
> Hello,
>
> I am writing a simple peak detector and it works quite well ... however
> there's one special case below, that I can't get my head wrapped around
> ... the problem is in the "Deal with not fully qualified peaks at the
> sequence extremes" section, but I cannot seem to come up with a condition
> that would be met in the special case below and base a fix on it ... mind
> completely poisoned with trial solutions that didn't work ...
>
> A fresh hint anyone?
>
> Sincerely, Joh

Sleep brought some insight. The code here http://pastebin.com/UXzbzqp8 works for now - I already have quite a number of test cases that gave me problems and now don't ... there are probably more corner cases somewhere, but they will be dealt with as they show up.

Cheers, Joh
[R] Peak detector help!?
Grrr ... new trial with code here: http://pastebin.com/RjHNNG9J
Maybe the amount of inline-code prevented posting?

Hello,

I am writing a simple peak detector and it works quite well ... however there's one special case below, that I can't get my head wrapped around ... the problem is in the "Deal with not fully qualified peaks at the sequence extremes" section, but I cannot seem to come up with a condition that would be met in the special case below and base a fix on it ... mind completely poisoned with trial solutions that didn't work ...

A fresh hint anyone?

Sincerely, Joh
Re: [R] R CMD check: Error in get("ptime", pos = "CheckExEnv") ...
Prof Brian Ripley wrote:
> On 30/01/2013 06:02, Johannes Graumann wrote:
>> Hi,
>>
>> Does anybody have insight into what this error terminating "R CMD check"
>> on an in-house package may imply?
>
> You have re-defined cat(), so I guess you re-defined get() too.

Aha! "cat" not, but one of my functions does indeed contain

  #' @method get rcfpdsuperclass
  setGeneric("get", function(object, slot) standardGeneric("get"))
  setMethod(
    "get",
    signature = signature(object = "rcfpdsuperclass", slot = "character"),
    definition = function(object, slot){
      result <- slot(object, slot)
      if(inherits(x = result, what = "RcfpdStoredObject")){
        return(evalObject(result))
      } else {
        return(result)
      }
    })

How does one prevent interference like the one I see? Is the only way to rename "get"?

Thanks for any hints.

Sincerely, Joh

>>> ###
>>> cat("Time elapsed: ", proc.time() - get("ptime", pos = 'CheckExEnv'),"\n")
>> Error in get("ptime", pos = "CheckExEnv") :
>>   unused argument(s) (pos = "CheckExEnv")
>> Calls: cat -> cat.default -> -> get
>> Execution halted
>>
>> It happens in the "* checking examples ... ERROR" section, yet the
>> example code in the file cited as the most likely source works just fine
>> when executed manually ...
>>
>> thanks for any hint.
>>
>> Sincerely, Joh
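The error is consistent with the new generic masking base::get(): the check code calls get("ptime", pos = ...) and reaches a function whose only arguments are object and slot. One way to avoid the clash is to give the accessor a name that does not collide with a base function. The sketch below assumes a hypothetical generic name slotValue and a stripped-down stand-in class; the evalObject()/RcfpdStoredObject handling from the post is left out.

```r
library(methods)

# Stand-in for the in-house class; the real one has other slots.
setClass("rcfpdsuperclass", representation(version = "character"))

# Hypothetical accessor name that does not mask base::get().
setGeneric("slotValue", function(object, name) standardGeneric("slotValue"))

setMethod(
  "slotValue",
  signature = signature(object = "rcfpdsuperclass", name = "character"),
  definition = function(object, name) slot(object, name)
)

obj <- new("rcfpdsuperclass", version = "0.1")
stored <- slotValue(obj, "version")

# base::get() keeps its usual arguments, including pos.
found <- get("pi", pos = "package:base")
```

With the accessor renamed, calls such as get("ptime", pos = "CheckExEnv") resolve to the base function again.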
[R] R CMD check: Error in get("ptime", pos = "CheckExEnv") ...
Hi,

Does anybody have insight into what this error terminating "R CMD check" on an in-house package may imply?

  > ###
  > cat("Time elapsed: ", proc.time() - get("ptime", pos = 'CheckExEnv'),"\n")
  Error in get("ptime", pos = "CheckExEnv") :
    unused argument(s) (pos = "CheckExEnv")
  Calls: cat -> cat.default -> -> get
  Execution halted

It happens in the "* checking examples ... ERROR" section, yet the example code in the file cited as the most likely source works just fine when executed manually ...

Thanks for any hint.

Sincerely, Joh
[R] parse/eval and character encoded expressions: How to deal with non-encoding strings?
Hi,

I intend to save a path-describing character object in a slot of a class I'm working on. In order to have the option of using "system.file" etc. in such string-saved path definitions, I wrote this:

  ExpressionEvaluator <- function(x){
    x <- tryCatch(
      expr=base::parse(text=x),
      error = function(e){return(as.expression(x))},
      finally=TRUE)
    return(x)
  }

This produces

  > ExpressionEvaluator("system.file(\"INDEX\")")
  expression(system.file("INDEX"))
  > eval(ExpressionEvaluator("system.file(\"INDEX\")"))
  [1] "/usr/lib/R/library/base/INDEX"

which is what I want. However,

  > eval(ExpressionEvaluator("Test"))
  Error in eval(expr, envir, enclos) : object 'Test' not found

prevents general usage (that is, also in cases where "x" does NOT encode an expression). I don't understand why

  > base::parse(text="Test")

returns

  [1] expression(Test)

while

  > as.expression("Test")

produces

  [1] expression("Test")

which would work with the eval call.

Can anyone point out how to solve this generally? How can I feed the function a character object and get back an eval-able expression, independent of whether there was an expression "encoded" in the input or not?

Thank you for any hints.

Sincerely, Joh
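The difference arises because parse(text = "Test") treats the string as R source, so Test becomes a symbol to be looked up, whereas as.expression("Test") wraps the character constant itself. Since "Test" is syntactically valid code, parsing never fails and the tryCatch() fallback is never reached. One possible approach, sketched here under the assumption that a string which fails to parse or to evaluate should be returned unchanged, is to move the evaluation inside the tryCatch(). The function name EvaluateOrReturn is hypothetical.

```r
# Hypothetical helper: evaluate x as R code if possible, else return x as is.
EvaluateOrReturn <- function(x, envir = parent.frame()) {
  tryCatch(
    eval(parse(text = x), envir = envir),
    # Reached on a parse error or when evaluation fails,
    # e.g. because a symbol such as Test cannot be found.
    error = function(e) x
  )
}

plain    <- EvaluateOrReturn("NoSuchObjectXyz")
computed <- EvaluateOrReturn("1 + 1")
indexed  <- EvaluateOrReturn("system.file(\"INDEX\")")
```

A caveat of this design: a plain string that happens to name an existing object (for instance "c") is evaluated rather than returned, so it only suits inputs where that ambiguity is acceptable.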
[R] Class definition and "contains": No definition was found for superclass
Hi,

What goes wrong when the following error shows up:

> Error in reconcilePropertiesAndPrototype(name, slots, prototype,
> superClasses, :
> No definition was found for superclass “sequencesuperclass” in the
> specification of class “sequences”

Does this have something to do with recursive class inheritance? "sequences" contains "sequencessuperclass" contains "rcfpdsuperclass" ...

Any hint is highly appreciated.

Sincerely, Joh
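Chained inheritance is not a problem in itself for S4; the message appears whenever the name given in contains is not defined at the moment setClass() runs, either because the definitions are evaluated in the wrong order or because the name is spelled differently (the error above names "sequencesuperclass" while the text mentions "sequencessuperclass"). The sketch below uses made-up class names to show both the failure and a working order; it does not claim to reproduce the poster's package.

```r
library(methods)

# Defining the child first fails: its superclass does not exist yet.
failed <- tryCatch({
  setClass("demoChild", contains = "demoParent")
  FALSE
}, error = function(e) TRUE)

# Defining the classes from the root downwards works.
setClass("demoRoot", representation(version = "character"))
setClass("demoParent", contains = "demoRoot")
setClass("demoChild", contains = "demoParent",
         representation(test = "character"))

obj <- new("demoChild", version = "0.1", test = "a")
```

In a package, the equivalent of "root first" is making sure the files are collated so that superclass definitions are sourced before their subclasses.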
Re: [R] ggplot2: "varwidth"-equivalent for geom_boxplot?
Hello,

I'd like to resurrect this issue: can the "varwidth" equivalent (boxplot box-width scaling according to the number of data points) be emulated in the 0.9.* versions of ggplot2? Width still doesn't seem capable of accepting a vector with length > 1 ...

Thank you for your input.

Sincerely, Joh

On Thursday, March 11, 2010 10:07:17 PM UTC+3, hadley wickham wrote:
> No this currently isn't possible - it would require changes to
> stat_boxplot to work.
>
> Hadley
>
> On Wed, Mar 10, 2010 at 9:12 AM, Johannes Graumann wrote:
> > Apologies.
> >
> > from the "boxplot" documentation:
> > "... if varwidth is TRUE, the boxes are drawn with widths proportional
> > to the square-roots of the number of observations in the groups."
> >
> > I find this option often very useful.
> >
> > Thanks for any insight into how to achieve this with geom_boxplot.
> >
> > Joh
> >
> > On Wednesday 10 March 2010 16:12:49 hadley wickham wrote:
> >> What is varwidth?
> >>
> >> Hadley
> >>
> >> On Wed, Mar 10, 2010 at 1:55 PM, Johannes Graumann wrote:
> >> > Hi,
> >> >
> >> > Is there such a thing? If no: is it easily simulated?
> >> >
> >> > thanks, Joh
>
> --
> Assistant Professor / Dobelman Family Junior Chair
> Department of Statistics / Rice University
> http://had.co.nz/
[R] devtools - document() weirdness
Hello,

Please try this:

  > library(devtools)
  > create("mdaa")
  > setwd("mdaa")
  > dev_mode()
  d> install()

This produces

  ...
  * DONE (mdaa)
  Reloading installed mdaa

But when I then try to build documentation

  d> document()

devtools/roxygen just hangs with a "?" like so:

  Updating mdaa documentation
  Loading mdaa
  ?

I know this scenario is strange, as there's no roxygenizable stuff in the package, but I am trying to track down an identical error in one of my nascent packages and am wondering 1) where this behavior originates and 2) why document() does not provide more informative feedback.

Thanks, Joh
[R] How to document Reference Classes using Roxygen? Will Roxygen3 work for those?
Hi,

Please see the subject line ;) Google only led me to people asking the same question, but no answers ... Am I out of luck with trying to document Reference Classes in-line?

Thank you for your input.

Sincerely, Joh
Re: [R] Dabbling with R5 setRefClass - Inheritance problems
Ouch - and to think how much time I wasted on this ... Thanks!

Joh

Jose Iparraguirre wrote:
> Hi Johannes,
>
> Just a typo.
>
> You've written
>
> ...
> contains="rcfpdsuperclass")
>
> When, in fact, you've defined the object rcfdpsuperclass
>
> To highlight the mistake, I'll use capital letters: rcfPD... and rcfDP...
>
> Regards,
>
> José
>
> José Iparraguirre
> Chief Economist
> Age UK
>
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> On Behalf Of Johannes Graumann
> Sent: 08 November 2012 10:03
> To: r-h...@stat.math.ethz.ch
> Subject: [R] Dabbling with R5 setRefClass - Inheritance problems
>
> Hello,
>
> I wrote a class like so:
>
>> rcfdpsuperclass <- setRefClass(
>>   Class="rcfdpsuperclass",
>>   fields = list(
>>     RcfpdVersion = "character"),
>>   methods = list(
>>     initialize = function(){
>>       'Populates fields with defaults and lock as appropriate'
>>       initFields(
>>         RcfpdVersion = as.character(packageVersion("RCFPD")))
>>       lockBinding(sym="RcfpdVersion",env=.self)
>>     }))
>
> And a second one like this:
>
>> sequencesuperclass <- setRefClass(
>>   Class="sequencesuperclass",
>>   fields = list(
>>     test="character"),
>>   contains="rcfpdsuperclass")
>
> Executing the latter I get:
>> Error in getClass(what, where = where) :
>>   "rcfpdsuperclass" is not a defined class
>
> Does someone have an idea what I am doing wrong?
>
> Thank you for your consideration.
>
> Sincerely, Joh
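Transposed letters in a class name are easy to miss because the name is typed twice, once in Class and once in contains. A small guard, sketched here with made-up demo class names, is to keep the parent's name in a single variable and to confirm with isClass() that it is defined before deriving from it:

```r
library(methods)

# Keep the name in one place so Class and contains cannot drift apart.
parent_name <- "rcfdpDemo"  # hypothetical demo name

parentGen <- setRefClass(
  Class = parent_name,
  fields = list(RcfpdVersion = "character")
)

defined    <- isClass(parent_name)  # the name that was actually registered
transposed <- isClass("rcfpdDemo")  # letters swapped: not a defined class

childGen <- setRefClass(
  Class = "sequenceDemo",
  fields = list(test = "character"),
  contains = parent_name
)

obj <- childGen$new(test = "x", RcfpdVersion = "0.0.1")
```

The child object then carries both its own field and the inherited one.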
[R] Dabbling with R5 setRefClass - Inheritance problems
Hello,

I wrote a class like so:

> rcfdpsuperclass <- setRefClass(
>   Class="rcfdpsuperclass",
>   fields = list(
>     RcfpdVersion = "character"),
>   methods = list(
>     initialize = function(){
>       'Populates fields with defaults and lock as appropriate'
>       initFields(
>         RcfpdVersion = as.character(packageVersion("RCFPD")))
>       lockBinding(sym="RcfpdVersion",env=.self)
>     }))

And a second one like this:

> sequencesuperclass <- setRefClass(
>   Class="sequencesuperclass",
>   fields = list(
>     test="character"),
>   contains="rcfpdsuperclass")

Executing the latter I get:

> Error in getClass(what, where = where) :
>   "rcfpdsuperclass" is not a defined class

Does someone have an idea what I am doing wrong?

Thank you for your consideration.

Sincerely, Joh
Re: [R] R5: Lock a class field from within a method?
David Winsemius wrote:
>
> On Oct 24, 2012, at 2:14 AM, Johannes Graumann wrote:
>
>> Hello,
>>
>> testclass <- setRefClass(
>>   "testclass",
>>   fields = list(testfield = "logical"),
>>   methods = list(validate=function(){testfield<<-TRUE}))
>>
>>> test <- testclass$new()
>>> test$testfield
>> logical(0)
>>> test$validate()
>>> test$testfield
>> [1] TRUE
>>
>> Works just fine for me.
>>
>> I would love to be able to do something like
>>
>> testclass <- setRefClass(
>>   "testclass",
>>   fields = list(testfield = "logical"),
>>   methods = list(validate=function(){
>>     testfield<<-TRUE
>>     .self$lock(testfield)
>>   }))
>>
>> but am unable to achieve that. Can anyone point out how to go about
>> rendering a field immutable after execution of a specific method?
>>
>
> The fact that you used only "lock" in your code and I am unable to
> find such a function makes me wonder whether that was an implicit
> pseudo-code effort and that you do not know about:
>
> ?lockBinding

lockBinding does in fact do what I want, but your statement about pseudo-code is not entirely correct ... try the following ("?setRefClass" will light the way).

  > testclass <- setRefClass("testclass", fields = list(testfield = "logical"))
  > testclass$lock("testfield")
  > test <- testclass$new()
  > test$testfield <- TRUE
  > test$testfield <- FALSE

I was assuming that I could access the "lock" method from a class instance as well (rather than just in the class definition) ... Is that indeed impossible?

Cheers, Joh
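The lockBinding() route mentioned above can be put inside the method itself, so that the field becomes read-only for that one instance once validate() has run. This is a minimal sketch with a hypothetical class name (LockDemo); using as.environment(.self) to reach the object's environment is an assumption about how best to pass the instance to lockBinding(), not something stated in the thread.

```r
library(methods)

# Hypothetical class; only the field name and lockBinding() come from the thread.
LockDemo <- setRefClass(
  "LockDemo",
  fields = list(testfield = "logical"),
  methods = list(
    validate = function() {
      testfield <<- TRUE
      # The instance is backed by an environment, so its binding can be locked.
      lockBinding("testfield", as.environment(.self))
      invisible(.self)
    }
  )
)

obj <- LockDemo$new()
obj$validate()

# A later assignment should fail; capture the outcome instead of stopping.
outcome <- tryCatch({
  obj$testfield <- FALSE
  "changed"
}, error = function(e) "locked")
```

Unlike the generator's $lock(), which applies to every object of the class, this locks only the instance on which validate() was called, and only from that point on.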
[R] R5: Lock a class field from within a method?
Hello,

  testclass <- setRefClass(
    "testclass",
    fields = list(testfield = "logical"),
    methods = list(validate=function(){testfield<<-TRUE}))

  > test <- testclass$new()
  > test$testfield
  logical(0)
  > test$validate()
  > test$testfield
  [1] TRUE

Works just fine for me.

I would love to be able to do something like

  testclass <- setRefClass(
    "testclass",
    fields = list(testfield = "logical"),
    methods = list(validate=function(){
      testfield<<-TRUE
      .self$lock(testfield)
    }))

but am unable to achieve that. Can anyone point out how to go about rendering a field immutable after execution of a specific method?

Sincerely, Joh
[R] R-implementation of Local Outlier Probabilities (LoOP)?
Dear all,

Is anyone aware of an R implementation of LoOP (H.-P. Kriegel, P. Kröger, E. Schubert, A. Zimek; LoOP: Local Outlier Probabilities; In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM), Hong Kong, China: 1649–1652, 2009.)? I found http://cran.r-project.org/web/packages/Rlof/index.html, but would prefer the p-value'ish measure provided by LoOP. Alternatives implemented in R would also be valuable ...

Thank you for your consideration.

Sincerely, Joh
Re: [R] Vector-subsetting with ZERO - Is behavior changeable?
Thank you very much. Learned something again!

Joh

William Dunlap wrote:
> You can use [1] on the output of FUN to ensure that
> exactly one value (perhaps NA from numeric(0)[1]) is
> returned. E.g.
>
> > index <- 1
> > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)][1]})
> [1] 2 1 NA
>
> I'll also put in a plug for vapply, which throws an
> error if FUN does not return what you expect it to:
>
> > vapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]},
> +   FUN.VALUE=numeric(1))
> Error in vapply(list(c(1, 2, 3), c(1, 2), c(1)), function(x) { :
>   values must be length 1,
>  but FUN(X[[3]]) result is length 0
> > vapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)][1]},
> +   FUN.VALUE=numeric(1))
> [1] 2 1 NA
>
> For long input vectors vapply can save a fair bit of
> memory and time over sapply.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>> -Original Message-
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
>> On Behalf Of Johannes Graumann
>> Sent: Wednesday, October 05, 2011 4:29 AM
>> To: r-h...@stat.math.ethz.ch
>> Subject: [R] Vector-subsetting with ZERO - Is behavior changeable?
>>
>> Dear All,
>>
>> I have trouble generalizing some code.
>>
>> > index <- 0
>> > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})
>> Will yield a wished for vector like so:
>> [1] 3 2 1
>>
>> But in this case (trying to select the second to last element in each
>> vector of the list)
>> > index <- 1
>> > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})
>> I end up with
>> [[1]]
>> [1] 2
>>
>> [[2]]
>> [1] 1
>>
>> [[3]]
>> numeric(0)
>>
>> I would (massively) prefer something like
>> [1] 2 1 NA
>>
>> My current implementation looks like
>> > index <- 1
>> > unlist(
>> >   sapply(
>> >     list(c(1,2,3),c(1,2),c(1)),
>> >     function(x){
>> >       value <- x[max(length(x)-index,0)]
>> >       if(identical(value,numeric(0))){return(NA)} else {return(value)}
>> >     }
>> >   )
>> > )
>> [1] 2 1 NA
>>
>> Quite the inelegant eyesore.
>>
>> Any hints on how to do this better?
>>
>> Thanks, Joh
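The [1] trick and the vapply() recommendation combine naturally into a small named helper. The sketch below wraps them in a function called nth_from_last, a name introduced here purely for illustration:

```r
# Hypothetical helper: element `index` positions before the last one, or NA.
nth_from_last <- function(x, index) {
  # x[0] yields a zero-length vector; the trailing [1] turns that into NA.
  x[max(length(x) - index, 0)][1]
}

input <- list(c(1, 2, 3), c(1, 2), c(1))

# vapply() enforces that every result is a single numeric value.
second_to_last <- vapply(input, nth_from_last, FUN.VALUE = numeric(1), index = 1)
last           <- vapply(input, nth_from_last, FUN.VALUE = numeric(1), index = 0)
```

Here second_to_last is c(2, 1, NA) and last is c(3, 2, 1), matching the outputs shown in the thread.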
[R] Vector-subsetting with ZERO - Is behavior changeable?
Dear All,

I have trouble generalizing some code.

  > index <- 0
  > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})

will yield a wished-for vector like so:

  [1] 3 2 1

But in this case (trying to select the second to last element in each vector of the list)

  > index <- 1
  > sapply(list(c(1,2,3),c(1,2),c(1)),function(x){x[max(length(x)-index,0)]})

I end up with

  [[1]]
  [1] 2

  [[2]]
  [1] 1

  [[3]]
  numeric(0)

I would (massively) prefer something like

  [1] 2 1 NA

My current implementation looks like

  > index <- 1
  > unlist(
  >   sapply(
  >     list(c(1,2,3),c(1,2),c(1)),
  >     function(x){
  >       value <- x[max(length(x)-index,0)]
  >       if(identical(value,numeric(0))){return(NA)} else {return(value)}
  >     }
  >   )
  > )
  [1] 2 1 NA

Quite the inelegant eyesore.

Any hints on how to do this better?

Thanks, Joh
Re: [R] Prevent 'R CMD check' from reporting "NA"/"NA_character_" mismatch?
Prof Brian Ripley wrote:
> On Mon, 4 Jul 2011, Johannes Graumann wrote:
>
>> Hello,
>>
>> I'm writing a package and am running 'R CMD check' on it.
>>
>> Is there any way to make 'R CMD check' not warn about a mismatch between
>> 'NA_character_' (in the function definition) and 'NA' (in the
>> documentation)?
>
> Be consistent. Why do you want incorrect documentation of your
> package? (It is not clear of the circumstances here: normally 1 vs 1L
> and similar are not reported if they are the only errors.)
>
> And please do note the posting guide
>
> - this is not really the correct list
> - you were asked to give an actual example with output.

Taken to R-devel. Thanks.

Joh
[R] Prevent 'R CMD check' from reporting "NA"/"NA_character_" mismatch?
Hello,

I'm writing a package and am running 'R CMD check' on it.

Is there any way to make 'R CMD check' not warn about a mismatch between 'NA_character_' (in the function definition) and 'NA' (in the documentation)?

Thanks for any help.

Sincerely, Joh
Re: [R] read.table mystery
"count.fields" is a very nice hint for a clean solution - thank you!

Joh

On Sunday 06 March 2011 21:48:32 David Winsemius wrote:
> On Mar 6, 2011, at 12:47 PM, Johannes Graumann wrote:
> > Thank you for pointing this out. This is really inconvenient as I do
> > not know a priori how many and where those darn cases containing an
> > additional (or more) ":" might be ...
>
> There is a count.fields function that might assist with this task.
>
> You seem to have a multiline (variable number of lines) format of:
>
> :>sp|header with "|" AND white space separators
> :VARIABLE_NUMBER_OF_CAP_LETTERS_60_CHAR_WIDE
> +60:VARIABLE_NUMBER_OF_CAP_LETTERS_60_CHAR_WIDEE
> +120:VARIABLE_NUMBER_OF_CAP_LETTERS_60_CHAR_WIDE
> +180:EXCEPT_LAST
>
> No way that read.table can work. You might create an index with the
> location of the high-count headers and then reprocess.
>
> log.idx <- count.fields("/tmp/testfile.txt") > 1
> corpus <- readLines("/tmp/testfile.txt")
>
> Then parse the headers and rejoin the broken multi-line content. There
> may be worked examples in the archive for variable number multi-line
> file formats.
>
> > This seems to work, but will fail if there's a "1:sdfjhlfkh:2:adlkjf"
> > somewhere (1 & 2 both integerable).
> >
> > na.exclude(as.integer(scan("/tmp/testfile.txt",sep=":",what="integer")))
> >
> > More robust pointers anyone?
> >
> > Joh
> >
> > Sarah Goslee wrote:
> >> Not so much a mystery. read.table() only looks at the first 5 lines
> >> when deciding how many columns your file has (as described in the
> >> Details section of the help).
> >>
> >> The easiest solution is to add a col.names argument to read.table()
> >> with the correct number of names.
> >>
> >> You may want to also include as.is=TRUE if you don't want your data
> >> to be imported as factors. If you expect character but have factor
> >> you may get unexpected results later.
> >>
> >> Sarah
> >>
> >> On Sun, Mar 6, 2011 at 5:04 AM, Johannes Graumann wrote:
> >>> Hello,
> >>>
> >>> Please have a look at the code below, which I use to read in the
> >>> attached file. As line 18 of the file reads
> >>> "1065:>sp|Q9V3T9|ADRO_DROME NADPH:adrenodoxin oxidoreductase,
> >>> mitochondrial OS=Drosophila melanogaster GN=dare PE=2 SV=1",
> >>> I expect the code below to produce a 3 column data frame with most
> >>> of the last column empty and line 18 to produce a data.frame row
> >>> like so:
> >>>
> >>> V1
> >>>   1065
> >>> V2
> >>>   >sp|Q9V3T9|ADRO_DROME NADPH
> >>> V3
> >>>   adrenodoxin oxidoreductase, mitochondrial OS=Drosophila
> >>>   melanogaster GN=dare PE=2 SV=1
> >>>
> >>> Why is that not so?
> >>>
> >>> Thanks for any hint.
> >>>
> >>> Sincerely, Joh
> >>>
> >>> read.table(
> >>>   "/tmp/testfile.txt",
> >>>   sep=":",
> >>>   header=FALSE,
> >>>   quote="",
> >>>   fill=TRUE
> >>> )[19,]
> >>
> >> ---
> >> Sarah Goslee
> >> http://www.functionaldiversity.org
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
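Combining the two hints in this thread, count.fields() can report the widest line in the file, and that number can size the col.names argument so that read.table() no longer guesses from the first five lines. The sketch below uses three made-up lines in place of /tmp/testfile.txt (only the header fragment echoes the post); colClasses = "character" is an added assumption to keep keys such as "+60" from being converted to numbers.

```r
# Made-up stand-in for the file contents; a real run would pass the file path.
lines <- c(
  "1064:AAAA",
  "1065:>sp|Q9V3T9|ADRO_DROME NADPH:adrenodoxin",
  "+60:BBBB"
)

# Count the ":"-separated fields on every line.
tc <- textConnection(lines)
n_fields <- count.fields(tc, sep = ":", quote = "", comment.char = "")
close(tc)

# Size the table by the widest line instead of the first five.
tab <- read.table(
  text = lines,
  sep = ":",
  header = FALSE,
  quote = "",
  comment.char = "",
  fill = TRUE,
  col.names = paste0("V", seq_len(max(n_fields))),
  colClasses = "character"
)
```

Rows with fewer fields are padded with empty strings, and the extra piece after the second ":" lands in V3.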
Re: [R] read.table mystery
Opted for a solution with 100 column names, a count the data is unlikely to ever reach ...

Thanks for your guidance.

Joh

On Sunday 06 March 2011 20:57:11 Sarah Goslee wrote:
> You could pre-process your data into a more sensible format.
> Or you could use scan to read each line of the file, count the number of
> colons, then use read.table with ncolons + 1 columns.
> Or you could use read.table with many more columns than are ever going to
> be in the data, then delete the empty ones.
> Or you could use read.table to read everything in as a single column, then
> use strsplit() to split it at the colons.
>
> There are generally lots of ways to do things, but they vary in efficiency
> both on the programming side and the execution side. For instance, the
> lots of columns solution is by far the easiest on the programmer, but is
> terribly inefficient and may fail completely for very large datasets.
>
> Sarah
>
> On Sun, Mar 6, 2011 at 12:47 PM, Johannes Graumann wrote:
> > Thank you for pointing this out. This is really inconvenient as I do not
> > know a priori how many and where those darn cases containing an
> > additional (or more) ":" might be ...
> >
> > This seems to work, but will fail if there's a "1:sdfjhlfkh:2:adlkjf"
> > somewhere (1 & 2 both integerable).
> >
> > na.exclude(as.integer(scan("/tmp/testfile.txt",sep=":",what="integer")))
> >
> > More robust pointers anyone?
> >
> > Joh
> >
> > Sarah Goslee wrote:
> >> Not so much a mystery. read.table() only looks at the first 5 lines when
> >> deciding how many columns your file has (as described in the Details
> >> section of the help).
> >>
> >> The easiest solution is to add a col.names argument to read.table() with
> >> the correct number of names.
> >>
> >> You may want to also include as.is=TRUE if you don't want your data to
> >> be imported as factors. If you expect character but have factor you may
> >> get unexpected results later.
> >>
> >> Sarah
> >>
> >> On Sun, Mar 6, 2011 at 5:04 AM, Johannes Graumann wrote:
> >>> Hello,
> >>>
> >>> Please have a look at the code below, which I use to read in the
> >>> attached file. As line 18 of the file reads
> >>> "1065:>sp|Q9V3T9|ADRO_DROME NADPH:adrenodoxin oxidoreductase,
> >>> mitochondrial OS=Drosophila
> >>> melanogaster GN=dare PE=2 SV=1", I expect the code below to produce a 3
> >>> column data frame with most of the last column empty and line 18 to
> >>> produce a data.frame row like so:
> >>>
> >>> V1
> >>>   1065
> >>> V2
> >>>   >sp|Q9V3T9|ADRO_DROME NADPH
> >>> V3
> >>>   adrenodoxin oxidoreductase, mitochondrial OS=Drosophila
> >>> melanogaster GN=dare PE=2 SV=1
> >>>
> >>> Why is that not so?
> >>>
> >>> Thanks for any hint.
> >>>
> >>> Sincerely, Joh
> >>>
> >>> read.table(
> >>> "/tmp/testfile.txt",
> >>> sep=":",
> >>> header=FALSE,
> >>> quote="",
> >>> fill=TRUE
> >>> )[19,]
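
Two of the options listed in this thread can be sketched as follows. This is only an illustration: the three input lines are a made-up stand-in for /tmp/testfile.txt, and the variable names (lines, nCols, wide, parsed) are hypothetical. The first variant counts fields over the whole input before calling read.table(); the second splits each line at the first colon only, so later colons stay inside the text.

```r
# Hypothetical stand-in for the file: an offset, a colon, then either a
# FASTA header (which may itself contain colons) or sequence text.
lines <- c(
  "0:>sp|Q7K2G1|ADRM1_DROME Proteasomal ubiquitin receptor",
  "116:MFGRQSGLGSSS",
  "1065:>sp|Q9V3T9|ADRO_DROME NADPH:adrenodoxin oxidoreductase"
)

# Variant 1: count the fields of every line, then give read.table()
# that many column names so it does not have to guess from the top lines.
tc <- textConnection(lines)
nCols <- max(count.fields(tc, sep = ":", quote = "", comment.char = ""))
close(tc)
wide <- read.table(
  text = lines, sep = ":", header = FALSE, quote = "",
  comment.char = "", fill = TRUE,
  col.names = paste0("V", seq_len(nCols)), stringsAsFactors = FALSE
)

# Variant 2: split at the FIRST colon only; everything after it is content.
firstColon <- regexpr(":", lines, fixed = TRUE)
parsed <- data.frame(
  offset  = as.integer(substr(lines, 1, firstColon - 1)),
  content = substring(lines, firstColon + 1),
  stringsAsFactors = FALSE
)
```

Variant 2 avoids having to know the number of colons at all, which may suit the case where only the leading offset matters.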
Re: [R] read.table mystery
Thank you for pointing this out. This is really inconvenient as I do not know a priori how many and where those darn cases containing an additional (or more) ":" might be ...

This seems to work, but will fail if there's a "1:sdfjhlfkh:2:adlkjf" somewhere (1 & 2 both integerable).

na.exclude(as.integer(scan("/tmp/testfile.txt",sep=":",what="integer")))

More robust pointers anyone?

Joh

Sarah Goslee wrote:
> Not so much a mystery. read.table() only looks at the first 5 lines when
> deciding how many columns your file has (as described in the Details
> section of the help).
>
> The easiest solution is to add a col.names argument to read.table() with
> the correct number of names.
>
> You may want to also include as.is=TRUE if you don't want your data to
> be imported as factors. If you expect character but have factor you may
> get unexpected results later.
>
> Sarah
>
> On Sun, Mar 6, 2011 at 5:04 AM, Johannes Graumann wrote:
>> Hello,
>>
>> Please have a look at the code below, which I use to read in the attached
>> file. As line 18 of the file reads "1065:>sp|Q9V3T9|ADRO_DROME
>> NADPH:adrenodoxin oxidoreductase, mitochondrial OS=Drosophila
>> melanogaster GN=dare PE=2 SV=1", I expect the code below to produce a 3
>> column data frame with most of the last column empty and line 18 to
>> produce a data.frame row like so:
>>
>> V1
>>   1065
>> V2
>>   >sp|Q9V3T9|ADRO_DROME NADPH
>> V3
>>   adrenodoxin oxidoreductase, mitochondrial OS=Drosophila
>> melanogaster GN=dare PE=2 SV=1
>>
>> Why is that not so?
>>
>> Thanks for any hint.
>>
>> Sincerely, Joh
>>
>> read.table(
>> "/tmp/testfile.txt",
>> sep=":",
>> header=FALSE,
>> quote="",
>> fill=TRUE
>> )[19,]
>
> ---
> Sarah Goslee
> http://www.functionaldiversity.org
[R] read.table mystery
Hello, Please have a look at the code below, which I use to read in the attached file. As line 18 of the file reads "1065:>sp|Q9V3T9|ADRO_DROME NADPH:adrenodoxin oxidoreductase, mitochondrial OS=Drosophila melanogaster GN=dare PE=2 SV=1", I expect the code below to produce a 3 column data frame with most of the last column empty and line 18 to produce a data.frame row like so: V1 1065 V2 >sp|Q9V3T9|ADRO_DROME NADPH V3 adrenodoxin oxidoreductase, mitochondrial OS=Drosophila melanogaster GN=dare PE=2 SV=1 Why is that not so? Thanks for any hint. Sincerely, Joh read.table( "/tmp/testfile.txt", sep=":", header=FALSE, quote="", fill=TRUE )[19,]0:>sp|Q7K2G1|ADRM1_DROME Proteasomal ubiquitin receptor ADRM1 homolog OS=Drosophila melanogaster GN=CG13349 PE=1 SV=1 116:MFGRQSGLGSSSNSSNLVEFRAGRMNMVGKMVHPDPRKGLVYMTQSDDGLMHFCWKDRTS 177:GKVEDDLIVFPDDFEYKRVDQCKTGRVYVLKFKSSTRRMFFWMQEPKTDKDDEQCRRINE 238:LLNNPPSAHQRSNDGDLQYMLNNMSQQQLMQLFGGVGQMGGLSSLLGQMNSRTPSS 299:RNTSSSASALQTPENVSVPRTPSAPSKSGSSRSSSNVNSQVGEGAGSSVDADAPGR 360:SLNIDLSTALPGADAINQIIADPEHVKTLIVHLPESEDVDDDRKQQIKDNITSPQFQQAL 421:AQFSSALQSAQLGPVIKQFELSNEAVAAAFSGNLEDFVRALEKSLPPGATMGGKPSASEK 482:KASDPETPTSVARDENTDPATEKQEEKQK 512:>sp|Q7K2G1-2|ADRM1_DROME Isoform 2 of Proteasomal ubiquitin receptor ADRM1 homolog OS=Drosophila melanogaster GN=CG13349 633:MFGRQSGLGSSSNSSNLVEFRAGRMNMVGKMVHPDPRKGLVYMTQSDDGLMHFCWKDRTS 694:GKVEDDLIVFPDDFEYKRVDQCKTGRVYVLKFKSSTRRMFFWMQEPKTDKDDEQCRRINE 755:LLNNPPSAHQRSNDGDLQYMLNNMSQQQLMQLFGGVGQMGGLSSLLGQMNSRTPSS 816:RNTSSSASALQTPENVSVPRTPSAPSKSGSSRSSSNVNSQVGEGAGSSVDADAPGK 877:NSTTSTTTASKSTGAYANPFQAYLSNLSPEHGAGRSLNIDLSTALPGADAINQIIADPEH 938:VKTLIVHLPESEDVDDDRKQQIKDNITSPQFQQALAQFSSALQSAQLGPVIKQFELSNEA 999:VAAAFSGNLEDFVRALEKSLPPGATMGGKPSASEKKASDPETPTSVARDENTDPATEKQE 1060:EKQK 1065:>sp|Q9V3T9|ADRO_DROME NADPH:adrenodoxin oxidoreductase, mitochondrial OS=Drosophila melanogaster GN=dare PE=2 SV=1 1180:MGINCLNIFRRGLHTSSARLQVIQSTTPTKRICIVGAGPAGFYAAQLILKQLDNCVVDVV 
1241:EKLPVPFGLVRFGVAPDHPEVKNVINTFTKTAEHPRLRYFGNISLGTDVSLRELRDRYHA 1302:VLLTYGADQDRQLELENEQLDNVISARKFVAWYNGLPGAENLAPDLSGRDVTIVGQGNVA 1363:VDVARMLLSPLDALKTTDTTEYALEALSCSQVERVHLVGRRGPLQAAFTIKELREMLKLP 1424:NVDTRWRTEDFSGIDMQLDKLQRPRKRLTELMLKSLKEQGRISGSKQFLPIFLRAPKAIA 1485:PGEMEFSVTELQQEAAVPTSSTERLPSHLILRSIGYKSSCVDTGINFDTRRGRVHNINGR 1546:ILKDDATGEVDPGLYVAGWLGTGPTGVIVTTMNGAFAVAKTICDDINTNALDTSSVKPGY 1607:DADGKRVVTWDGWQRINDFESAAGKAKGKPREKIVSIEEMLRVAGV 1654:>sp|Q26365|ADT_DROME ADP,ATP carrier protein OS=Drosophila melanogaster GN=sesB PE=2 SV=4 1744:MGNISASITSQSKMGKDFDAVGFVKDFAAGGISAAVSKTAVAPIERVKLLLQVQHISKQI 1805:SPDKQYKGMVDCFIRIPKEQGFSSFWRGNLANVIRYFPTQALNFAFKDKYKQVFLGGVDK 1866:NTQFWRYFAGNLASGGAAGATSLCFVYPLDFARTRLAADTGKGGQREFTGLGNCLTKIFK 1927:SDGIVGLYRGFGVSVQGIIIYRAAYFGFYDTARGMLPDPKNTPIYISWAIAQVVTTVAGI 1988:VSYPFDTVRRRMMMQSGRKATEVIYKNTLHCWATIAKQEGTGAFFKGAFSNILRGTGGAF 2049:VLVLYDEIKKVL 2062:>sp|Q26365-2|ADT_DROME Isoform A of ADP,ATP carrier protein OS=Drosophila melanogaster GN=sesB 2157:MGKDFDAVGFVKDFAAGGISAAVSKTAVAPIERVKLLLQVQHISKQISPDKQYKGMVDCF 2218:IRIPKEQGFSSFWRGNLANVIRYFPTQALNFAFKDKYKQVFLGGVDKNTQFWRYFAGNLA 2279:SGGAAGATSLCFVYPLDFARTRLAADTGKGGQREFTGLGNCLTKIFKSDGIVGLYRGFGV 2340:SVQGIIIYRAAYFGFYDTARGMLPDPKNTPIYISWAIAQVVTTVAGIVSYPFDTVRRRMM 2401:MQSGRKATEVIYKNTLHCWATIAKQEGTGAFFKGAFSNILRGTGGAFVLVLYDEIKKVL 2461:>sp|P37193|ADXH_DROME Adrenodoxin-like protein, mitochondrial OS=Drosophila melanogaster GN=Fdxh PE=2 SV=3 2568:MFCLLLRRSAVHNSCKLISKQIAKPAFYTPHNALHTTIPRRHGEFEWQDPKSTDEIVNIT 2629:YVDKDGKRTKVQGKVGDNVLYLAHRHGIEMEGACEASLACTTCHVYVQHDYLQKLKEAEE 2690:QEDDLLDMAPFLRENSRLGCQILLDKSMEGMELELPKATRNFYVDGHKPKPH 2743:>sp|P39413|AEF1_DROME Adult enhancer factor 1 OS=Drosophila melanogaster GN=Aef1 PE=1 SV=1 2834:MMHIKSLPHAHAAATAMSSNCDIVIVAAQPQTTIANETVTQATHPAHMAAVQ 2895:HHQQSSGPPSVTELPLPFQMHLSGISAEAHSAAQAAAMAAAQAA 2956:AAQEQQQTSHLTHLTTHSPTTIHSEHYLANGHSEHPGEGNAAVGVGGAVREP 3017:EKPFHCTVCDRRFRQLSTLTNHVKIHTGEKPYKCNVCDKTFRQSSTLTNHLKIHTGEKPY 
3078:NCNFCPKHFRQLSTLANHVKIHTGEKPFECVICKKQFRQSSTLNNHIKIHVMDKVYVPVK 3139:IKTEEDEG
Re: [R] Finding NAs in DF
Thank you very much, got something running now based on this.

Joh

jim holtman wrote:
> building on the previous responses, does this give you what you want:
>
>> x
>    A  B
> 1  1  1
> 2  2 NA
> 3 NA NA
> 4 NA  4
>> # determine where the NAs are
>> row.na <- apply(x, 1, is.na)
>> # now convert to list of columns with NAs
>> apply(row.na, 2, function(a) paste(colnames(x)[a], collapse = ','))
> [1] ""    "B"   "A,B" "A"
>
> On Mon, Jan 17, 2011 at 5:01 AM, Johannes Graumann wrote:
>> Hi,
>>
>> What is an efficient way to take this DF
>>
>> data.frame(A=c(1,2,NA,NA),B=c(1,NA,NA,4))
>>
>> and get
>> c(NA,"TWO","BOTH","ONE")
>>
>> as the result, where NA corresponds to a row without "NA"s, TWO indicates
>> NA in the second and ONE in the first column.
>>
>> Thanks for any pointers.
>>
>> Joh
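
A compact variant of the same idea, offered as a sketch rather than as what was finally used: is.na() applied to the whole data frame already returns a logical matrix, so a single apply() over its rows is enough. It produces the c(NA,"B","AB","A") form requested elsewhere in this thread; the names naMat and naCols are illustrative.

```r
x <- data.frame(A = c(1, 2, NA, NA), B = c(1, NA, NA, 4))

# Logical matrix with the same shape as x: TRUE where a value is missing
naMat <- is.na(x)

# Per row, glue together the names of the columns that are NA
naCols <- unname(apply(naMat, 1, function(isMissing) {
  paste(colnames(x)[isMissing], collapse = "")
}))

# Rows without any NA produced an empty string; turn those into NA
naCols[naCols == ""] <- NA_character_
naCols
```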
Re: [R] Finding NAs in DF
Neither version does what I am looking for, as they do not differentiate where the NA is if there is just one. The result I originally wished for therefore holds, but should probably be rewritten as

c(NA,"B","AB","A")

Joh

On Monday 17 January 2011 14:06:30 Patrick Burns wrote:
> Simpler would be:
>
> rowSums(is.na(df))
>
> On 17/01/2011 10:13, Ivan Calandra wrote:
> > Hi,
> >
> > I hope you made a mistake in c(NA,"TWO","BOTH","ONE") because if not, I
> > have no idea what you're looking for...
> >
> > But would that do?
> > df <- data.frame(A=c(1,2,NA,NA),B=c(1,NA,NA,4))
> > apply(df,1, FUN=function(x) length(x[is.na(x)]))
> > [1] 0 1 2 1
> >
> > There might be better ways to do it, but it works
> > HTH,
> > Ivan
> >
> > On 1/17/2011 11:01, Johannes Graumann wrote:
> >> Hi,
> >>
> >> What is an efficient way to take this DF
> >>
> >> data.frame(A=c(1,2,NA,NA),B=c(1,NA,NA,4))
> >>
> >> and get
> >> c(NA,"TWO","BOTH","ONE")
> >>
> >> as the result, where NA corresponds to a row without "NA"s, TWO
> >> indicates NA
> >> in the second and ONE in the first column.
> >>
> >> Thanks for any pointers.
> >>
> >> Joh
[R] Finding NAs in DF
Hi,

What is an efficient way to take this DF

data.frame(A=c(1,2,NA,NA),B=c(1,NA,NA,4))

and get

c(NA,"TWO","BOTH","ONE")

as the result, where NA corresponds to a row without "NA"s, TWO indicates NA in the second and ONE in the first column.

Thanks for any pointers.

Joh
Re: [R] [SOLVED] Re: Install Error
Johannes Graumann wrote:
> Johannes Graumann wrote:
>
>> Hi,
>>
>> I'm running into the error below when doing "R CMD INSTALL
>> MyPackage.tar.gz". This didn't use to be this way and I am at a loss as
>> to where this might be coming from. Any pointers where to look?
>>
>> Joh
>>
>> ** building package indices ...
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>> na.strings, :
>>   line 1 did not have 8 elements
>> ERROR: installing package indices failed
>
> I was working on new functionality and had a non-zipped data file in the
> "data" directory ... a chance find on Google put me on that track ...
> Works again.
>
> Joh

Wishlist item here: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14426

Joh
[R] [SOLVED] Re: Install Error
Johannes Graumann wrote:
> Hi,
>
> I'm running into the error below when doing "R CMD INSTALL
> MyPackage.tar.gz". This didn't use to be this way and I am at a loss as to
> where this might be coming from. Any pointers where to look?
>
> Joh
>
> ** building package indices ...
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
>   line 1 did not have 8 elements
> ERROR: installing package indices failed

I was working on new functionality and had a non-zipped data file in the "data" directory ... a chance find on Google put me on that track ... Works again.

Joh
Re: [R] Install Error
Duncan Murdoch wrote:
> On 28/10/2010 7:54 AM, Johannes Graumann wrote:
>> Hi,
>>
>> I'm running into the error below when doing "R CMD INSTALL
>> MyPackage.tar.gz". This didn't use to be this way and I am at a loss as
>> to where this might be coming from. Any pointers where to look?
>>
>> Joh
>>
>> ** building package indices ...
>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines,
>> na.strings, :
>>   line 1 did not have 8 elements
>> ERROR: installing package indices failed
>
> You may get more informative error information if you do the install
> from within R. Supposing you've used setwd() to go to the directory
> where your package lives, try
>
> install.packages("MyPackage.tar.gz", repos=NULL, type="source")
>
> If that fails, then traceback() will tell you where the failure happened.
>
> Duncan Murdoch

Thanks for the pointer, but there doesn't seem to be more information ...

> install.packages("MyPackage.tar.gz", repos=NULL, type="source")
Installing package(s) into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘MyPackage’ ...
** R
** data
** demo
** exec
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices ...
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 8 elements
ERROR: installing package indices failed
* removing ‘/usr/local/lib/R/site-library/MyPackage’
* restoring previous ‘/usr/local/lib/R/site-library/MyPackage’
Warning message:
In install.packages("MyPackage.tar.gz", repos = NULL, type = "source") :
  installation of package 'MyPackage.tar.gz' had non-zero exit status
> traceback()
No traceback available
[R] Install Error
Hi,

I'm running into the error below when doing "R CMD INSTALL MyPackage.tar.gz". This didn't use to be this way and I am at a loss as to where this might be coming from. Any pointers where to look?

Joh

** building package indices ...
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 8 elements
ERROR: installing package indices failed
[R] Function execution on package load?
Hi,

Can anyone give me a pointer on how to make a package execute a function when it is loaded? Following an older post (http://bit.ly/cS1Go4), I'd like to do something along the lines of

> .localstuff <- new.env()
> .localstuff$OftenUsedData <- read.csv(...)

upon loading the package ...

Thanks, Joh
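
One common way to do this is the .onLoad() hook, which R calls with the arguments libname and pkgname when a package namespace is loaded. The sketch below assumes a hypothetical file OftenUsedData.csv shipped under inst/extdata; the file name and its location are illustrative and not taken from this thread.

```r
# Would typically sit in a file such as R/zzz.R of the package
# (the file name is a convention, not a requirement).

# Package-private environment for data that many functions share
.localstuff <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
  # Hypothetical data file installed from inst/extdata/OftenUsedData.csv;
  # system.file() returns "" when the file is not there.
  dataFile <- system.file("extdata", "OftenUsedData.csv", package = pkgname)
  if (nzchar(dataFile)) {
    .localstuff$OftenUsedData <- utils::read.csv(dataFile)
  }
  invisible(NULL)
}
```

Package functions can then read .localstuff$OftenUsedData without re-reading the file on every call.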
Re: [R] Splitting a DF into rows according to a column
Stupid Joh wants to give you a big hug! Thanks!

Why "rank" works but "order" does not, I have still to figure out, though ...

Joh

On Monday 04 October 2010 17:30:32 peter dalgaard wrote:
> On Oct 4, 2010, at 16:57 , Johannes Graumann wrote:
> > Hi,
> >
> > I'm spinning my wheels on this and keep coming around to the same wrong
> > solution - please have a look and give a hand ...
> >
> > The premise is: a DF like so
> >
> >> loremIpsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit.
> > Quisque leo ipsum, ultricies scelerisque volutpat non, volutpat et nulla.
> > Curabitur consequat ullamcorper tellus id imperdiet. Duis semper
> > malesuada nulla, blandit lobortis diam fringilla at. Vestibulum nec
> > tellus orci, eu sollicitudin quam. Phasellus sit amet enim diam.
> > Phasellus mattis hendrerit varius. Curabitur ut tristique enim. Lorem
> > ipsum dolor sit amet, consectetur adipiscing elit. Sed convallis, tortor
> > id vehicula facilisis, nunc justo facilisis tellus, sed eleifend nisi
> > lacus id purus. Maecenas tempus sollicitudin libero, molestie laoreet
> > metus dapibus eu. Mauris justo ante, mattis et pulvinar a, varius
> > pretium eros. Curabitur fringilla dui ac dui rutrum pretium. Donec sed
> > magna adipiscing nisi accumsan congue sed ac est. Vivamus lorem urna,
> > tristique quis accumsan quis, ullamcorper aliquet velit."
> >
> >> tmpDF <- data.frame(Column1=rep(unlist(strsplit(loremIpsum,"
> > ")),length.out=510),Column2=runif(510,min=0,max=1e8))
> >
> > is to be split into DFs with 50 entries in an ordered manner according to
> > column2 (first DF is to contain the rows with the 50 largest numbers,
> > ...).
> >
> > Here is what I have been doing:
> >> binSize <- 50
> >> splitMembership <-
> > pmin(ceiling(order(tmpDF[["Column2"]],decreasing=TRUE)/binSize),floor(nrow(tmpDF)/binSize))
> >
> >> splitList <- split(tmpDF,splitMembership)
> >
> > Distribution seems to work ...
> >
> >> sapply(splitList,nrow)
> >
> > But this is NOT what I wanted ...
> >
> >> sapply(splitList,function(x){max(x[["Column2"]])})
> >
> > This was supposed to give me bins that are Column2-sorted: bin 1
> > should have a higher max than bin 2, bin 2 a higher max than bin 3 ...
> >
> > Can anyone point out where my (by now 3) reimplementations fail?
> >
> > Thanks, Stupid Joh
>
> Dear Stupid Joh,
>
> Have you considered something along the lines of
>
> o <- order(-x$Column2)
> xx <- x[o,]
> split(xx, (seq_len(NROW(x))-1) %/% 50)
>
> The above is a bit hard to follow, but it seems to work better with rank()
> instead of order():
>
> > splitMembership <-
> +
> pmin(ceiling(rank(-tmpDF[["Column2"]])/binSize),floor(nrow(tmpDF)/binSize))
>
> > splitList <- split(tmpDF,splitMembership)
> > sapply(splitList,nrow)
>
>  1  2  3  4  5  6  7  8  9 10
> 50 50 50 50 50 50 50 50 50 60
>
> > sapply(splitList,function(x){max(x[["Column2"]])})
>
>        1        2        3        4        5        6
> 99877498 90567877 81965382 69112280 59814266 52130373
>        7        8        9       10
> 41557660 32630212 21226996 11880032
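
On the open question of why rank() works where order() does not, a small sketch may help. order() answers "which element comes first, second, ...", so its i-th entry is an index into the data; rank() answers "what position does element i take", so its i-th entry belongs to row i and can be turned into a bin number for that row. The vector small and the helper binOf() are illustrative names, and the seed is arbitrary.

```r
small <- c(10, 30, 20)
order(-small)   # 2 3 1: the largest value sits at index 2, then 3, then 1
rank(-small)    # 3 1 2: element 1 is third largest, element 2 is largest, ...

set.seed(1)                                # arbitrary seed for reproducibility
values  <- runif(510, min = 0, max = 1e8)
binSize <- 50

# Map a position in the sorted sequence to a bin; leftovers join the last bin
binOf <- function(position) {
  pmin(ceiling(position / binSize), floor(length(values) / binSize))
}

byRank  <- split(values, binOf(rank(-values)))   # per-element positions
byOrder <- split(values, binOf(order(-values)))  # indices misused as positions

binMaxRank  <- sapply(byRank, max)    # strictly decreasing from bin to bin
binMaxOrder <- sapply(byOrder, max)   # no particular ordering
```

Without ties, rank(x) is the same as order(order(x)), which is one way to see that the two functions are inverse permutations of each other rather than interchangeable.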
[R] Splitting a DF into rows according to a column
Hi,

I'm spinning my wheels on this and keep coming around to the same wrong solution - please have a look and give a hand ...

The premise is: a DF like so

> loremIpsum <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque leo ipsum, ultricies scelerisque volutpat non, volutpat et nulla. Curabitur consequat ullamcorper tellus id imperdiet. Duis semper malesuada nulla, blandit lobortis diam fringilla at. Vestibulum nec tellus orci, eu sollicitudin quam. Phasellus sit amet enim diam. Phasellus mattis hendrerit varius. Curabitur ut tristique enim. Lorem ipsum dolor sit amet, consectetur adipiscing elit. Sed convallis, tortor id vehicula facilisis, nunc justo facilisis tellus, sed eleifend nisi lacus id purus. Maecenas tempus sollicitudin libero, molestie laoreet metus dapibus eu. Mauris justo ante, mattis et pulvinar a, varius pretium eros. Curabitur fringilla dui ac dui rutrum pretium. Donec sed magna adipiscing nisi accumsan congue sed ac est. Vivamus lorem urna, tristique quis accumsan quis, ullamcorper aliquet velit."
> tmpDF <- data.frame(Column1=rep(unlist(strsplit(loremIpsum," ")),length.out=510),Column2=runif(510,min=0,max=1e8))

is to be split into DFs with 50 entries in an ordered manner according to column2 (first DF is to contain the rows with the 50 largest numbers, ...).

Here is what I have been doing:

> binSize <- 50
> splitMembership <- pmin(ceiling(order(tmpDF[["Column2"]],decreasing=TRUE)/binSize),floor(nrow(tmpDF)/binSize))
> splitList <- split(tmpDF,splitMembership)

Distribution seems to work ...

> sapply(splitList,nrow)

But this is NOT what I wanted ...

> sapply(splitList,function(x){max(x[["Column2"]])})

This was supposed to give me bins that are Column2-sorted: bin 1 should have a higher max than bin 2, bin 2 a higher max than bin 3 ...

Can anyone point out where my (by now 3) reimplementations fail?

Thanks, Stupid Joh
Re: [R] choose.dir() gone?
OK. Just checked and "choose.files"/"choose.dir" exist in the Windows version - apparently not in the Linux one ... does anybody have a nice platform-agnostic solution for this?

Thanks, Joh

Johannes Graumann wrote:
> Hi,
>
> I fail to find "choose.dir()" in my current R install (see below)? Didn't
> that exist at some point? How to achieve "file.choose()" equivalent
> functionality for directories?
>
> Thanks for any hints, Joh
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
>  [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
> [11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rkward_0.5.3
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
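
One possible platform-agnostic wrapper, given as a sketch only: it uses utils::choose.dir() on Windows, falls back to tcltk::tk_choose.dir() where the tcltk package is usable, and otherwise asks on the console. The function name chooseDirectory and the console fallback are assumptions of this example, not something proposed in the thread.

```r
chooseDirectory <- function(caption = "Select directory") {
  if (.Platform$OS.type == "windows") {
    # utils::choose.dir() only exists in Windows builds of R
    return(utils::choose.dir(caption = caption))
  }
  if (requireNamespace("tcltk", quietly = TRUE)) {
    # Tcl/Tk dialog; needs an R build with Tcl/Tk support and a display
    return(tcltk::tk_choose.dir(caption = caption))
  }
  # Last resort without any GUI toolkit: type the path at the prompt
  readline(paste0(caption, ": "))
}
```

Calling chooseDirectory() in an interactive session then opens whichever dialog is available on the current platform.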
[R] choose.dir() gone?
Hi,

I fail to find "choose.dir()" in my current R install (see below)? Didn't that exist at some point? How to achieve "file.choose()" equivalent functionality for directories?

Thanks for any hints, Joh

> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rkward_0.5.3

loaded via a namespace (and not attached):
[1] tools_2.11.1
Re: [R] getNodeSet - what am I doing wrong?
Sorry about that - it got dropped during my attempts yesterday (see the first example below, which has "useInternalNodes=TRUE") ...

Thanks again, Joh

Duncan Temple Lang wrote:
> Johannes Graumann wrote:
>> Thanks!
>> but:
>> > library(XML)
>> > xmlDoc <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml")
>
> You need to xmlParse() or xmlTreeParse(url, useInternalNodes = TRUE)
> (which are equivalent) in order to be able to use getNodeSet().
>
> The error you are getting is because you are using xmlTreeParse()
> and the result is a tree represented in R rather than internal
> C-level data structures on which getNodeSet() can operate.
>
> xmlParse() is faster than xmlTreeParse()
> and one can use XPath to query it.
>
> D.
>
>> > getNodeSet(xmlDoc,"//x:modifications_row", "x")
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function "saveXML", for
>> signature
>> "XMLDocument"
>>
>> ?
>>
>> Thanks, Joh
>>
>>
>> Duncan Temple Lang wrote:
>>
>> >
>> > Hi Johannes
>> >
>> > This is a common issue. The document has a default XML namespace,
>> > e.g.
>> > the root node is defined as
>> >
>> > <... xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ...>
>> >
>> > So you need to specify which namespace to match in the XPath
>> > expression
>> > in getNodeSet(). The XML package provides a "convenient" facility for
>> > this. You need only specify the prefix such as "x" and that will
>> > be bound to the default namespace. You need to specify this in
>> > two places - where you use it in the XPath expression and
>> > in the namespaces argument of getNodeSet()
>> >
>> > So
>> > getNodeSet(test, "//x:modifications_row", "x")
>> >
>> > gives you probably what you want.
>> >
>> > D.
>> >
>> >
>> >
>> > On 8/30/10 8:02 AM, Johannes Graumann wrote:
>> >> library(XML)
>> >>> test <- xmlTreeParse(
>> >>> "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE)
>> >>> getNodeSet(test,"//modifications_row")
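
Both points from this thread (parse into an internal document, and bind a prefix to the default namespace) can be tried offline on a tiny stand-in document. This is a sketch: the namespace URI is the one quoted above, but the root element name and the record_id attributes are made up for illustration, and the prefix is bound through an explicitly named vector rather than the "x" shorthand.

```r
library(XML)

# Made-up miniature document that, like the real one, declares a
# default namespace on its root node.
xmlText <- '<tables xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1">
  <modifications_row record_id="1"/>
  <modifications_row record_id="2"/>
</tables>'

# xmlParse() yields the internal (C-level) document getNodeSet() needs
doc <- xmlParse(xmlText, asText = TRUE)

# Bind the prefix "x" to the default namespace and use it in the XPath
ns <- c(x = "http://www.unimod.org/xmlns/schema/unimod_tables_1")
withPrefix <- getNodeSet(doc, "//x:modifications_row", namespaces = ns)

# Pull one attribute from every matched node
recordIds <- sapply(withPrefix, xmlGetAttr, "record_id")
```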
Re: [R] getNodeSet - what am I doing wrong?
Thanks!
but:

> library(XML)
> xmlDoc <- xmlTreeParse("http://www.unimod.org/xml/unimod_tables.xml")
> getNodeSet(xmlDoc,"//x:modifications_row", "x")
Error in function (classes, fdef, mtable) :
  unable to find an inherited method for function "saveXML", for signature "XMLDocument"

?

Thanks, Joh

Duncan Temple Lang wrote:
>
> Hi Johannes
>
> This is a common issue. The document has a default XML namespace, e.g.
> the root node is defined as
>
> <... xmlns="http://www.unimod.org/xmlns/schema/unimod_tables_1" ...>
>
> So you need to specify which namespace to match in the XPath expression
> in getNodeSet(). The XML package provides a "convenient" facility for
> this. You need only specify the prefix such as "x" and that will
> be bound to the default namespace. You need to specify this in
> two places - where you use it in the XPath expression and
> in the namespaces argument of getNodeSet()
>
> So
> getNodeSet(test, "//x:modifications_row", "x")
>
> gives you probably what you want.
>
> D.
>
>
>
> On 8/30/10 8:02 AM, Johannes Graumann wrote:
>> library(XML)
>>> test <- xmlTreeParse(
>>> "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE)
>>> getNodeSet(test,"//modifications_row")
[R] getNodeSet - what am I doing wrong?
Hi,

Why is the following returning a node set of length 0:

> library(XML)
> test <- xmlTreeParse(
> "http://www.unimod.org/xml/unimod_tables.xml",useInternalNodes=TRUE)
> getNodeSet(test,"//modifications_row")

Thanks for any hint.

Joh
Re: [R] grid.table and expression in table body?
Hi - I can't get this figured out ... Thanks for any hint.

Joh

> load("/tmp/AbsoluteTable.Rdata")
> absoluteTable
> library(gridExtra)
> grid.table(absoluteTable) #Works
> grid.table(absoluteTable,parse=TRUE)
Error in parse(text = d[ii]) : unexpected symbol in "Survey Scans"
> sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] gridExtra_0.7

loaded via a namespace (and not attached):
[1] tools_2.11.1

On Monday 09 August 2010 12:02:03 baptiste auguie wrote:
> I just uploaded version 0.7 on googlecode. I had inadvertently messed
> up the previous attempt (uploaded an older version from another
> computer). Fingers crossed it should build on r-forge in the next few
> days.
>
> baptiste
>
> On 6 August 2010 17:11, Johannes Graumann wrote:
> > I updated the package from r-forge, but despite the fact that
> > "grid.table" does not complain about the "parse" option if given, your
> > example below is not being printed as a parsed expression.
> >
> > How can I check the actual version of the installed/loaded gridExtra
> > package?
> >
> > Thanks, Joh
> >
> > On Wednesday 04 August 2010 16:47:12 you wrote:
> >> I added a parse argument to grid.table so that when switched to TRUE
> >> (default FALSE) all the text strings are interpreted as expressions
> >> (inspired by ggplot2::geom_text),
> >>
> >> d <- data.frame("alpha", "beta")
> >> grid.table(d, parse=T)
> >>
> >> you'll need revision 258 of gridExtra for this to work (googlecode now,
> >> r-forge in the following days, CRAN in the next stable version).
> >>
> >> HTH,
> >>
> >> baptiste
> >>
> >> On Aug 4, 2010, at 9:56 AM, Johannes Graumann wrote:
> >> > Hi Baptiste,
> >> >
> >> > This is, I fear, a bit beyond my level of competency ... What I want to
> >> > be able to do is things like put "<2.2%*%10^{-16}" in a table cell,
> >> > whose name I can already set to "p[Wilcoxon]" ...
> >> >
> >> > Joh
> >> >
> >> > On Wednesday 04 August 2010 09:15:43 you wrote:
> >> >> Hi,
> >> >>
> >> >> I don't know the answer to your question (how to make a data.frame
> >> >> with expressions), but if you have a list of expressions you could
> >> >> try the following,
> >> >>
> >> >> http://code.google.com/p/gridextra/wiki/testExpressions
> >> >>
> >> >> I'm open to suggestions for your original query (what is the best way
> >> >> to do it – parse each string and coerce it as an expression?)
> >> >>
> >> >> HTH,
> >> >>
> >> >> baptiste
> >> >>
> >> >> On Aug 4, 2010, at 12:05 AM, Johannes Graumann wrote:
> >> >>> Hi,
> >> >>>
> >> >>> Is there any way to get an expression into a data.frame, such that
> >> >>> "grid.table" from "gridExtra" will plot it evaluated in the table
> >> >>> body? The documentation does it for the header, but is the body
> >> >>> possible?
> >> >>>
> >> >>> Thanks, Joh
Re: [R] Relation 1.5*IQR/Percentile in case of a normal Distribution
On Saturday 14 August 2010 23:08:31 Peter Dalgaard wrote:
> Johannes Graumann wrote:
> > Hi,
> >
> > can someone point me at material to understand how in
> > http://upload.wikimedia.org/wikipedia/commons/8/89/Boxplot_vs_PDF.png the
> > "fivenum"-corresponding percentages might be calculated?
>
> Looks like a pretty straightforward application of pnorm() and qnorm().
>
> > pnorm(4*qnorm(.75), lower=F) # Q3 + 1.5 IQR = 4 Q3 since IQR = 2 Q3
>
> [1] 0.003488302
>
> gives the tail probabilities of .35%, and the rest is by definition.

Thanks a lot!

Joh
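
The arithmetic behind that one-liner can be spelled out step by step for the standard normal distribution. This is a sketch with illustrative variable names; only pnorm() and qnorm() come from the message above.

```r
q3         <- qnorm(0.75)          # upper quartile, roughly 0.6745 sd
iqr        <- 2 * q3               # symmetric around 0, so Q1 = -Q3
upperFence <- q3 + 1.5 * iqr       # whisker limit = 4 * q3, roughly 2.698 sd

# Probability mass beyond one whisker limit (the .35% in the figure)
tailProb <- pnorm(upperFence, lower.tail = FALSE)

# Mass between the box edge (Q3) and the whisker limit, and inside both limits
betweenQ3AndFence <- pnorm(upperFence) - 0.75
insideFences      <- 1 - 2 * tailProb

round(100 * c(tail = tailProb, whisker = betweenQ3AndFence,
              box = 0.5, inside = insideFences), 2)
```

The 50% for the box follows directly from the definition of the quartiles, which is the "rest is by definition" part.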
[R] Relation 1.5*IQR/Percentile in case of a normal Distribution
Hi,

can someone point me at material to understand how in http://upload.wikimedia.org/wikipedia/commons/8/89/Boxplot_vs_PDF.png the "fivenum"-corresponding percentages might be calculated?

Thanks, Joh
Re: [R] grid.table and expression in table body?
Great! I will give it a try ASAP! Thanks! Joh On Wednesday 04 August 2010 16:47:12 baptiste Auguié wrote: > I added a parse argument to grid.table so that when switched to TRUE > (default FALSE) all the text strings are interpreted as expressions > (inspired by ggplot2::geom_text), > > d <- data.frame("alpha", "beta") > grid.table(d, parse=T) > > you'll need revision 258 of gridExtra for this to work (googlecode now, > r-forge in the following days, CRAN in the next stable version). > > HTH, > > baptiste > > On Aug 4, 2010, at 9:56 AM, Johannes Graumann wrote: > > Hi Baptiste, > > > > This is, I fear a bit beyond my level of competency ... What I want to be > > able to do is things like put "<2.2%*%10^{-16}" in a table cell, who's > > name I can already set to "p[Wilcoxon]" ... > > > > Joh > > > > On Wednesday 04 August 2010 09:15:43 you wrote: > >> Hi, > >> > >> I don't know the answer to your question (how to make a data.frame with > >> expressions), but if you have a list of expressions you could try the > >> following, > >> > >> http://code.google.com/p/gridextra/wiki/testExpressions > >> > >> I'm open to suggestions for your original query (what is the best way to > >> do it – parse each string and coerce it as an expression?) > >> > >> HTH, > >> > >> baptiste > >> > >> On Aug 4, 2010, at 12:05 AM, Johannes Graumann wrote: > >>> Hi, > >>> > >>> Is there any way to get an expression into a data.frame, such that > >>> "grid.table" from "gridExtra" will plot it evaluated in the table body? > >>> The docu does it for the header, but is the body possible? > >>> > >>> Thanks, Joh > >>> > >>> __ > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html and provide commented, > >>> minimal, self-contained, reproducible code. signature.asc Description: This is a digitally signed message part. 
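To illustrate the idea behind interpreting table strings as expressions, here is a small base-R sketch. It does not use gridExtra at all; the label strings are made-up examples, and the leading '' in the third string is only there because a comparison such as < needs a left-hand side before it will parse:

```r
# Strings written in plotmath syntax (illustrative values only)
labels <- c("alpha", "p[Wilcoxon]", "'' < 2.2 %*% 10^{-16}")

# parse(text = ...) returns an expression vector, one element per string
exprs <- parse(text = labels)

pdf(NULL)    # null device, so the sketch also runs without a display
plot.new()
# Expression labels are rendered as mathematical annotation
text(x = 0.5, y = c(0.8, 0.5, 0.2), labels = exprs)
invisible(dev.off())
```

Whether a given table-drawing function accepts such expressions in its body is a separate question that depends on the package version.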
[R] grid.table and expression in table body?
Hi,

Is there any way to get an expression into a data.frame, such that "grid.table" from "gridExtra" will plot it evaluated in the table body? The documentation does it for the header, but is the body possible?

Thanks, Joh
Re: [R] List to data frame
Thanks a lot! This solves my problem!

Joh

On Monday 26 July 2010 17:06:37 Joshua Wiley wrote:
> Hi,
>
> Here is another option if you already have a list you want to convert.
> This will handle different elements of the list being different
> lengths.
>
> #Using your example data
> mydata <- list(c(1,2,3),c(4,5,6))
>
> data.frame(
>   OriginalListIndex = rep(x = seq_along(mydata),
>     times = unlist(lapply(mydata, length))),
>   Item = unlist(mydata)
> )
>
> #Just to demonstrate that this method works generally
> mydata <- list(c(1,2,3), c(7,6), c(3,4,5,6,7,8,9))
>
> data.frame(
>   OriginalListIndex = rep(x = seq_along(mydata),
>     times = unlist(lapply(mydata, length))),
>   Item = unlist(mydata)
> )
>
> HTH,
>
> Josh
>
> On Mon, Jul 26, 2010 at 7:46 AM, Johannes Graumann wrote:
> > Hi,
> >
> > Any ideas on how to efficiently convert
> >
> >> list(c(1,2,3),c(4,5,6))
> >
> > to
> >
> >> data.frame(OriginalListIndex=c(1,1,1,2,2,2),Item=c(1,2,3,4,5,6))
> >
> > Thanks for any hints,
> >
> > Joh
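The same approach can be wrapped in a small helper. This is only a sketch: the function name list_to_df is hypothetical, and lengths() is used in place of unlist(lapply(x, length)), which assumes R 3.2.0 or later:

```r
# Hypothetical helper: one row per list item, tagged with the index
# of the list element it came from
list_to_df <- function(x) {
  data.frame(
    OriginalListIndex = rep(seq_along(x), times = lengths(x)),
    Item = unlist(x)
  )
}

# Elements of different lengths are handled as well
mydata <- list(c(1, 2, 3), c(7, 6), c(3, 4, 5, 6, 7, 8, 9))
result <- list_to_df(mydata)
result
```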
[R] List to data frame
Hi,

Any ideas on how to efficiently convert
> list(c(1,2,3),c(4,5,6))
to
> data.frame(OriginalListIndex=c(1,1,1,2,2,2),Item=c(1,2,3,4,5,6))

Thanks for any hints, Joh
Re: [R] Apply: Output matrix orientation
David Winsemius wrote:
>
> On May 27, 2010, at 7:24 AM, Johannes Graumann wrote:
>
>> Hi,
>>
>> Why is the result of below "apply" call rotated with respect to the
>> input and how to remedy this?
>
> Because the processing you requested is with respect to rows and the
> construction of matrices is by default by columns.
>
> ?t

Thanks. t solved my problem without having to load another package.

Joh
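A short sketch of the effect and of the t() remedy, using the data from the original question. The function is renamed z_score here purely for illustration:

```r
# Row-wise z-score: each call receives one row of the input
z_score <- function(input) (input - mean(input)) / sd(input)

df <- data.frame(x1 = c(1, 2, 3, 4, 5),
                 x2 = c(2, 3, 4, 5, 6),
                 x3 = c(3, 4, 5, 6, 7))

# apply() over rows stores each result as a COLUMN of the output
by_row <- apply(df, 1, z_score)   # 3 x 5
# Transposing restores the orientation of the input
fixed  <- t(by_row)               # 5 x 3

dim(by_row)
dim(fixed)
```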
[R] Apply: Output matrix orientation
Hi,

Why is the result of the "apply" call below rotated with respect to the input, and how can this be remedied?

Thanks, Joh

.ZScore <- function(input){
  #cat(input,"\n")
  z <- (input - mean(input))/sd(input)
  return(z)
}
apply(data.frame(x1=c(1,2,3,4,5),x2=c(2,3,4,5,6),x3=c(3,4,5,6,7)),1,.ZScore)
Re: [R] error bars on barplot
Jim Lemon wrote: > On 04/09/2010 08:55 PM, Samantha Reynolds wrote: >> Hi >> >> I was hoping someone might be able to help me I have this data: >> >> birdid timetaken numvisits ptachchoice time bold >> 1087 810 1 AM0 >> 108728 6 1 PM0 >> 108713 3 2 AM0 >> 1087 121 0 2 PM0 >> 1046 121 0 1 AM1 >> 1046 121 0 1 PM1 >> >> i've plotted the means like this: >> >> by(numvisits,patchchoice,summary) >> numvisits.means<- >> by(numvisits,list(time=time,patchchoice=patchchoice),mean) >> numvisits.means >> barplot(numvisits.means,xlab="Patch Choice",ylab="Number of >> Visits",col=c("red","darkblue"),beside=T,ylim=c(0,4)) >> labs<-c("AM","PM") >> legend(1.09,3.98,labs,fill=cols) >> >> and need to add error bars, but i'm unsure as to how to do this. >> > Hi Sam, > Perhaps I should submit a FAQ on this one. Try: > > bar.err (agricolae) > plotCI (gplots) > xYplot (Hmisc) > error.bars (psych) > dispersion (plotrix) > plotCI (plotrix) > > and there are probably others. > > Jim geom_errorbar (ggplot2) Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
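Besides the package functions listed above, error bars can also be drawn with base graphics alone. The following is only a sketch: the means and standard errors are made-up numbers for illustration, not values computed from the data in the question:

```r
# Made-up group means and standard errors (illustration only)
means <- matrix(c(2.0, 3.1, 1.2, 0.8), nrow = 2,
                dimnames = list(time = c("AM", "PM"), patch = c("1", "2")))
se    <- matrix(c(0.3, 0.4, 0.2, 0.1), nrow = 2)

pdf(NULL)    # null device, so the sketch also runs without a display
# barplot() returns the x positions of the bar midpoints
mids <- barplot(means, beside = TRUE, ylim = c(0, 4),
                xlab = "Patch Choice", ylab = "Number of Visits",
                col = c("red", "darkblue"), legend.text = TRUE)
# angle = 90 with code = 3 draws flat caps at both ends of each segment
arrows(x0 = mids, y0 = means - se, x1 = mids, y1 = means + se,
       angle = 90, code = 3, length = 0.05)
invisible(dev.off())
```

The key point is that the midpoints returned by barplot() give the x coordinates at which each bar's error bar has to be placed.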
Re: [R] ggplot2, density barplot and geom_point layer
Thanks so much. Solved. Joh hadley wickham wrote: > Because of the way you've constructed the plot with qplot, you need to > use: > > myPlot + geom_point( > data=medians, > aes(x=med,shape=cut, y=0), > size=2.5, > ) > > Hadley > > On Wed, Apr 7, 2010 at 5:11 AM, Johannes Graumann > wrote: >> Hi, >> >> Please consider the example below. How can I manage to overlay the points >> the way I want in the second case? >> >> Thanks, Joh >> >> library(ggplot2) >> >> # Modify data to match "real" case >> myDiamonds <- diamonds >> myDiamonds[["clarity"]] <- as.character(myDiamonds[["clarity"]]) >> myDiamonds[myDiamonds[["clarity"]]=="I1","clarity"] <- 1 >> myDiamonds[myDiamonds[["clarity"]]=="SI2","clarity"] <- 2 >> myDiamonds[myDiamonds[["clarity"]]=="SI1","clarity"] <- 3 >> myDiamonds[myDiamonds[["clarity"]]=="VS2","clarity"] <- 4 >> myDiamonds[myDiamonds[["clarity"]]=="VS1","clarity"] <- 5 >> myDiamonds[myDiamonds[["clarity"]]=="VVS2","clarity"] <- 6 >> myDiamonds[myDiamonds[["clarity"]]=="VVS1","clarity"] <- 7 >> myDiamonds[myDiamonds[["clarity"]]=="IF","clarity"] <- 8 >> myDiamonds[["clarity"]] <- as.numeric(myDiamonds[["clarity"]]) >> >> # Calculate medians >> medians <- ddply( >> myDiamonds, >> .(cut), >> summarize, >> med=median(clarity, na.rm=TRUE) >> ) >> >> # Works >> myPlot <- qplot( >> factor(clarity), >> data=myDiamonds, >> fill=cut, >> geom="bar", >> position="dodge" >> ) >> >> myPlot + >> geom_point( >> data=medians, >> aes(x=med,shape=cut), >> y=0, >> size=2.5, >> ) >> >> # Doesn't work - I want density rather than count >> myPlot <-qplot( >> factor(clarity), >> y=..count../sum(..count..), >> data=myDiamonds, >> fill=cut, >> geom="bar", >> position="dodge" >> ) >> >> myPlot + >> geom_point( >> data=medians, >> aes(x=med,shape=cut), >> y=0, >> size=2.5, >> ) >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html and provide commented, >> 
minimal, self-contained, reproducible code. >> > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2, density barplot and geom_point layer
Hi, Please consider the example below. How can I manage to overlay the points the way I want in the second case? Thanks, Joh library(ggplot2) # Modify data to match "real" case myDiamonds <- diamonds myDiamonds[["clarity"]] <- as.character(myDiamonds[["clarity"]]) myDiamonds[myDiamonds[["clarity"]]=="I1","clarity"] <- 1 myDiamonds[myDiamonds[["clarity"]]=="SI2","clarity"] <- 2 myDiamonds[myDiamonds[["clarity"]]=="SI1","clarity"] <- 3 myDiamonds[myDiamonds[["clarity"]]=="VS2","clarity"] <- 4 myDiamonds[myDiamonds[["clarity"]]=="VS1","clarity"] <- 5 myDiamonds[myDiamonds[["clarity"]]=="VVS2","clarity"] <- 6 myDiamonds[myDiamonds[["clarity"]]=="VVS1","clarity"] <- 7 myDiamonds[myDiamonds[["clarity"]]=="IF","clarity"] <- 8 myDiamonds[["clarity"]] <- as.numeric(myDiamonds[["clarity"]]) # Calculate medians medians <- ddply( myDiamonds, .(cut), summarize, med=median(clarity, na.rm=TRUE) ) # Works myPlot <- qplot( factor(clarity), data=myDiamonds, fill=cut, geom="bar", position="dodge" ) myPlot + geom_point( data=medians, aes(x=med,shape=cut), y=0, size=2.5, ) # Doesn't work - I want density rather than count myPlot <-qplot( factor(clarity), y=..count../sum(..count..), data=myDiamonds, fill=cut, geom="bar", position="dodge" ) myPlot + geom_point( data=medians, aes(x=med,shape=cut), y=0, size=2.5, ) __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2: Adding points to a density plot
Thanks a lot! This got me started! Joh Dennis Murphy wrote: > Hi: > > Try this: > > library(ggplot2) > movmed <- ddply(movies, .(decade), summarise, med = median(rating)) > m + geom_point(data = movmed, aes(x = med), y = 0, size = 2) > > HTH, > Dennis > > On Wed, Mar 31, 2010 at 4:46 AM, Johannes Graumann > > wrote: > >> Hi, >> >> Consider something like >> > library(ggplot2) >> > movies$decade <- round_any(movies$year, 10) >> > m <- qplot(rating,data=movies,colour=factor(decade),geom="density") >> > m >> (modified from "?stat_density"). >> >> I'd like to add on the line y=0 a dot for the median of each "decade" >> category (using the same colour coding as the "fill"). I'm failing >> miserably >> at all my >> > m + geom_point >> based approaches and would appreciate if someone could show me the >> ggplot2 way of achieving this. >> >> Thanks, Joh >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2: Adding points to a density plot
Hi,

Consider something like
> library(ggplot2)
> movies$decade <- round_any(movies$year, 10)
> m <- qplot(rating,data=movies,colour=factor(decade),geom="density")
> m
(modified from "?stat_density").

I'd like to add on the line y=0 a dot for the median of each "decade" category (using the same colour coding as the "fill"). I'm failing miserably at all my
> m + geom_point
based approaches and would appreciate if someone could show me the ggplot2 way of achieving this.

Thanks, Joh
Re: [R] ggplot2: "varwidth"-equivalent for geom_boxplot?
Apologies. From the "boxplot" documentation: "... if varwidth is TRUE, the boxes are drawn with widths proportional to the square-roots of the number of observations in the groups." I often find this option very useful. Thanks for any insight into how to achieve this with geom_boxplot.

Joh

On Wednesday 10 March 2010 16:12:49 hadley wickham wrote:
> What is varwidth?
>
> Hadley
>
> On Wed, Mar 10, 2010 at 1:55 PM, Johannes Graumann wrote:
> > Hi,
> >
> > Is there such a thing? If no: is it easily simulated?
> >
> > thanks, Joh
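For reference, this is what the option does in base graphics. It is a sketch using the built-in mtcars data, whose three cylinder groups have different sizes. Later ggplot2 releases document a varwidth argument for geom_boxplot(), but that is not shown here:

```r
pdf(NULL)    # null device, so the sketch also runs without a display
# varwidth = TRUE scales each box width with the square root of its group size
b <- boxplot(mpg ~ cyl, data = mtcars, varwidth = TRUE,
             xlab = "Cylinders", ylab = "Miles per gallon")
invisible(dev.off())

b$n                           # observations per group
sqrt(b$n) / max(sqrt(b$n))    # relative widths implied by the square-root rule
```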
[R] ggplot2: "varwidth"-equivalent for geom_boxplot?
Hi,

Is there such a thing? If not: is it easily simulated?

Thanks, Joh
Re: [R] ggplot2: Changing colour scheme for bar plot filling?
Works. Thank you! Joh On Wednesday 10 March 2010 11:13:09 you wrote: > not with the theme, as far as I know, but you can do: > > set_default_scale("fill", "discrete","grey") > > baptiste > > On 10 March 2010 10:31, Johannes Graumann wrote: > > Indeed. Thank you. Is there a global switch analogous to > > "theme_set(theme_bw())"? > > > > thanks for your help, Joh > > > > On Wednesday 10 March 2010 10:29:05 baptiste auguie wrote: > >> Hi, > >> > >> last_plot() + scale_fill_grey() > >> > >> should do it > >> > >> HTH, > >> > >> baptiste > >> > >> On 10 March 2010 09:46, Johannes Graumann wrote: > >> > Hello, > >> > > >> > I'd like to sitch to a monochrome/bw color-palette for the filling of > >> > geom_bar-bars (produced via "qplot" as in the example below). Hours of > >> > googling didn't yield anything useful, so I thought, I'd just ask ... > >> > > >> > Thanks, Joh > >> > > >> > library(ggplot2) > >> > qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(cyl)) > >> > > >> > __ > >> > R-help@r-project.org mailing list > >> > https://stat.ethz.ch/mailman/listinfo/r-help > >> > PLEASE do read the posting guide > >> > http://www.R-project.org/posting-guide.html and provide commented, > >> > minimal, self-contained, reproducible code. > signature.asc Description: This is a digitally signed message part. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] ggplot2: Changing colour scheme for bar plot filling?
Indeed. Thank you. Is there a global switch analogous to "theme_set(theme_bw())"? thanks for your help, Joh On Wednesday 10 March 2010 10:29:05 baptiste auguie wrote: > Hi, > > last_plot() + scale_fill_grey() > > should do it > > HTH, > > baptiste > > On 10 March 2010 09:46, Johannes Graumann wrote: > > Hello, > > > > I'd like to sitch to a monochrome/bw color-palette for the filling of > > geom_bar-bars (produced via "qplot" as in the example below). Hours of > > googling didn't yield anything useful, so I thought, I'd just ask ... > > > > Thanks, Joh > > > > library(ggplot2) > > qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(cyl)) > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html and provide commented, > > minimal, self-contained, reproducible code. > signature.asc Description: This is a digitally signed message part. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] ggplot2: Changing colour scheme for bar plot filling?
Hello,

I'd like to switch to a monochrome/bw color palette for the filling of geom_bar bars (produced via "qplot" as in the example below). Hours of googling didn't yield anything useful, so I thought I'd just ask ...

Thanks, Joh

library(ggplot2)
qplot(factor(cyl), data=mtcars, geom="bar", fill=factor(cyl))
Re: [R] Lattice: barchart, error bars and grouped data
Thanks. I switched to ggplot2 which offers error bars.

Joh

Dieter Menne wrote:
>
> Johannes wrote:
>>
>> How can I, given the code snippet below, draw the error bars in the
>> center of each grouped bar rather than in the center of the group?
>
> http://markmail.org/message/oljgimkav2qcdyre
>
> Dieter
[R] Lattice: barchart, error bars and grouped data
Hi,

How can I, given the code snippet below, draw the error bars in the center of each grouped bar rather than in the center of the group?

Thanks for any hints, Joh

library(lattice)
barley[["SD"]] <- 5
barchart(
  yield ~ variety | site,
  data = barley,
  groups=year,
  origin=0,
  lowDev=barley[["SD"]],
  highDev=barley[["SD"]],
  panel = function( x, y, ..., lowDev, highDev ){
    panel.barchart(x, y, ...)
    panel.segments(
      as.numeric(x),
      as.numeric(y) - lowDev,
      as.numeric(x),
      as.numeric(y) + highDev,
      col = 'red', lwd = 2, ...)
  }
)
Re: [R] Lattice: How to implement "varwidth" analogous to "graphics::boxplot" in "bwplot"?
Johannes Graumann wrote:
> Has anybody solved this?

For the benefit of others: after studying
> ?panel.bwplot
I have to admit that
> bwplot(..., varwidth = TRUE)
solves the issue. It's just not documented at
> ?bwplot

Cheers, Joh
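A minimal runnable version of that call, as a sketch. The mtcars data set is chosen here only because its cylinder groups differ in size, and lattice is assumed to be installed:

```r
library(lattice)

# varwidth is passed through bwplot() to panel.bwplot()
p <- bwplot(mpg ~ factor(cyl), data = mtcars, varwidth = TRUE,
            xlab = "Cylinders", ylab = "Miles per gallon")

# A lattice plot is an object; it is drawn when printed, e.g. print(p)
class(p)
```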
Re: [R] lattice/ylim: how to fix ylim[1], but have ylim[2] dynamically calculated?
Rolf Turner wrote:
>
> On 15/02/2010, at 9:40 AM, Johannes Graumann wrote
>
> (In response to some advice from David Winsemius):
>
>> I am quite certain that this is the most elaborately worded version of
>> "RTFM" I have ever come across.
>
> I nominate this as a fortune. (Despite Prof. Winsemius's later
> protestation that his advice was *not* a version of RTFM.)

Uh, oh ... certified notoriety?

Joh
Re: [R] lattice/ylim: how to fix ylim[1], but have ylim[2] dynamically calculated?
Deepayan Sarkar wrote:
> On Sun, Feb 14, 2010 at 7:33 AM, Johannes Graumann wrote:
>> Hello,
>>
>> When drawing "barcharts", I find it not helpful if ylim[1] != 0 - bars
>> for a quantity of 0, that do not show a length of 0 are quite
>> non-intuitive.
>>
>> I have tried to study
>> > library(lattice)
>> > panel.barchart
>> but am unable to figure out where ylim is taken care of and how one might
>> fix ylim[1] to 0 for barcharts ...
>>
>> Can anyone point out how to tackle this?
>
> Are you sure you are not looking for 'origin=0' (described in
> ?panel.barchart)?

I sure am - thank you! Following the same path for "bwplot" I found the embarrassingly simple answer to my earlier question regarding "varwidth" in ...

Sincerely, Joh
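As a sketch of that suggestion, using the barley data that ships with lattice (the choice of formula and grouping is only illustrative):

```r
library(lattice)

# origin = 0 is handed on to panel.barchart(), so bars are drawn from zero
# while the upper axis limit is still computed from the data
p <- barchart(yield ~ variety | site, data = barley,
              groups = year, origin = 0,
              scales = list(x = list(rot = 45)))

# Drawn when printed, e.g. print(p)
class(p)
```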
Re: [R] lattice/ylim: how to fix ylim[1], but have ylim[2] dynamically calculated?
David Winsemius wrote: > > On Feb 14, 2010, at 10:33 AM, Johannes Graumann wrote: > >> Hello, >> >> When drawing "barcharts", I find it not helpful if ylim[1] != 0 - >> bars for a >> quantity of 0, that do not show a length of 0 are quite non-intuitive. >> >> I have tried to study >> > library(lattice) >> > panel.barchart >> but am unable to figure out where ylim is taken care of and how one >> might >> fix ylim[1] to 0 for barcharts ... >> >> Can anyone point out how to tackle this? > > Looking at Sarkar's "Lattice" text in chapter 8 section 3 "Limits and > Aspect Ratio", it appears from subsection 1 that the prepanel function > can used to supply values of xlim and ylim values. From subsection 2 > he clarifies that xlim and ylim can also be specified on a per panel > basis (and here I am guessing that this would be within a scales > argument) when relation="free". At the end of that section he offers > two examples using ylim: the first is not plotted but the second uses > the prepanel mechanism for Fig 8.1 and that is probably available on > the Lattice website. > > In the same subsection is offered an alternative to specifying an > explicit scales$y$limits to be interpreted as ylim values. > > My hope it that these ideas and references will be of some use in > identifying productive places to look for further documentation. I am quite certain that this is the most elaborately worded version of "RTFM" I have ever come across. I shall go and do so. Sincerely, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] lattice/ylim: how to fix ylim[1], but have ylim[2] dynamically calculated?
Hello,

When drawing "barcharts", I find it not helpful if ylim[1] != 0 - bars for a quantity of 0, that do not show a length of 0 are quite non-intuitive.

I have tried to study
> library(lattice)
> panel.barchart
but am unable to figure out where ylim is taken care of and how one might fix ylim[1] to 0 for barcharts ...

Can anyone point out how to tackle this?

Thanks, Joh
[R] Lattice: How to implement "varwidth" analogous to "graphics::boxplot" in "bwplot"?
Hello,

Has anybody solved this?

Thanks, Joh
Re: [R] memDecompress and zlib compressed base64 encoded string
Prof Brian Ripley wrote: >> I have zlib compressed strings (example is attached) > > What is that file? Not gzip compression: > > gannet% file compressed.txt > compressed.txt: ASCII text, with very long lines > > since gzip uses a magic header that 'file' knows about. And even if > the header was stripped, such files are 8-bit and yours is ASCII. > Try >> x <- 'Johannes Graumann' >> xx <- charToRaw(x) >> xxx <- memCompress(xx, "g") >> rawToChar(xxx) > [1] "x\x9c\xf3\xca\xcfH\xcc\xcbK-Vp/J,\xcd\0052\001:\n\006\x90" > > to see what a real gzipped string looks like. > >> and would like to decompress them using memDecompress ... >> >> I try this: >>> connection <- file("compressed.txt","r") >>> compressed <- readLines(connection) I am dealing with mass spectrometric data in a XML file format (mzXML). The biggest part of the contained data is actual mass spectra that are base64 encoded and optionally compressed using http://zlib.net (saving quite some storage space). When they are compressed I just get an XML node that looks like this CONTENT OF THE ORIGINAL ATTACHMENT HERE I would like to be able to decompress that string and thought that memDecompress was the right tool to do so ... > You have not told us the 'at a minimum' information requested in the > posting guide. But you should not expect that to read a binary file, > especially not in a MBCS locale. We have readBin for that purpose. I'm actually reading this in as a string from the XML file ... >>> memDecompress(as.raw(compressed),type="g") > > I don't think you know what as.raw does: it does not convert bytes in > a character string to raw (for which you need charToRaw). 
> > It is always a good idea to look at each stage of your computation: > >> as.raw(compressed) > [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 > [26] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > 00 00 00 Yup, that was plain stupid and trying to make memDecompress run at all (since handing it the character string also resulted in an error. > sessionInfo() R version 2.10.1 (2009-12-14) x86_64-pc-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 [9] LC_ADDRESS=en_US.UTF-8LC_TELEPHONE=en_US.UTF-8 [11] LC_MEASUREMENT=en_US.UTF-8LC_IDENTIFICATION=en_US.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rkward_0.5.1 loaded via a namespace (and not attached): [1] tools_2.10.1 Thanks for any further hints, Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
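The points made above about raw vectors can be seen in a small round trip. This is a sketch only: it covers the compression step, and assumes that a base64-encoded string from an mzXML file would first have to be decoded into a raw vector by a decoder from an add-on package (not shown here):

```r
x <- "Johannes Graumann"

# charToRaw() converts the bytes of a string to a raw vector;
# as.raw() would instead try to coerce the string to a number
raw_in <- charToRaw(x)

# Compress and decompress entirely in memory
compressed <- memCompress(raw_in, type = "gzip")
restored   <- memDecompress(compressed, type = "gzip", asChar = TRUE)

identical(restored, x)
```

The compressed vector is binary data, so it must be kept as raw bytes rather than read or stored as text lines.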
[R] memDecompress and zlib compressed base64 encoded string
Hi,

I have zlib compressed strings (example is attached) and would like to decompress them using memDecompress ...

I try this:
> connection <- file("compressed.txt","r")
> compressed <- readLines(connection)
> memDecompress(as.raw(compressed),type="g")
Error in memDecompress(as.raw(compressed), type = "g") :
  internal error -3 in memDecompress(2)
In addition: Warning messages:
1: In memDecompress(as.raw(compressed), type = "g") :
  NAs introduced by coercion
2: In memDecompress(as.raw(compressed), type = "g") :
  out-of-range values treated as 0 in coercion to raw

Can anyone nudge me into the right direction regarding this?

Thanks, Joh

[Attachment compressed.txt (base64 text) omitted]
Re: [R] Method
myVector <- c(seq(10),23,35)
length(myVector)              # position of the last element
myVector[length(myVector)]    # value of the last element

It's unclear to me which of the two you want ...

HTH, Joh

yonosoyelmejor wrote:
> Hello, I would like to ask you another question. Does any method exist for
> vectors that tells me the last element? That is to say, I have a vector and I
> want to return the position of the last element. I hope I have explained it clearly.
>
> A greeting,
> Ignacio.
Re: [R] symbol in the plot
plot(sigma, delta1, ylim=range(-0.5, 2), xlab='sigma', ylab='delta',pch=22, type='o') points(sigma, delta2, col='red', axes=FALSE, pch=1,type='o') legend("topleft",c(expression(Delta*1),expression(delta*2)),fill=TRUE,col=c("black","red"),pch=c(1,22)) See: >?plotmath Still, gimme a easily runnable example next time. HTH, Joh gcheer3 wrote: > > Joh, thank you very much. sorry for confusing you. I didn't make my > question > clear. I tried your code it looks much better than my original one. Just > I prefer I can write the greek letter delta1 and delta 2 instead of words > 'delta1' and 'delta2'. Also, it will be nice if there is a square symbol > next to delta1 and a circle symbol next to delta 2, since sometimes I have > to print the graph in a white and black paper. Thanks for any suggestions. > Sorry for not asking question clearly. > > > Johannes Graumann-2 wrote: >> >> How about >> >> plot(sigma, delta1, ylim=range(-0.5, 2), xlab='sigma', ylab='delta', >> pch=22, >> type='o') >> points(sigma, delta2, col='red', axes=FALSE, type='o') >> legend("topleft",c("Delta1","Delta2"),fill=TRUE,col=c("black","red")) >> >> Send runnable example next time. >> >> HTH, Joh >> >> gcheer3 wrote: >> >>> >>> TO be specific, here is how I graphed >>> >>> plot(sigma, delta1, ylim=range(-0.5, 2), xlab='sigma', >>> ylab='delta1--square delta2--circle', pch=22, type='o') >>> par(new=TRUE) >>> plot(sigma, delta2, ylim=range(-0.5, 2), xlab='sigma', >>> ylab='delta1--square delta2--circle', col='red', axes=FALSE, type='o') >>> >>> Thanks a lot >>> >>> >>> gcheer3 wrote: >>>> >>>> a graph question. Thanks a lot in advance. >>>> >>>> I made two scatterplots on one graph (sigma vs. delta1, sigma vs. >>>> delta2) >>>> (20 observations of delta1, delta2 and corresponding sigma) the x-axis >>>> is >>>> sigma, the y-axis is either delta1 or delta2. I connected both >>>> scatterplots. To seperate them, one curves is a line with circles, the >>>> other curve is a line with squares on it. 
>>>> >>>> I want to make a notation either on the y-axis or on the graph. The >>>> notiaion is "delta1--square; delta2--circle". So when people look at >>>> the graph, they can easily tell each curve's meaning. The curve with >>>> squares on it means the sigma vs. delta1, and the curve with circles on >>>> it means sigma vs. delta2. I think I can use 'expression' to write >>>> delta1, delta2 and sigma in greek letters, but I am not sure how to >>>> denote the square and cirle I graphed. >>>> >>> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
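For the archive, a minimal sketch of a legend whose symbols match the plotted curves. The data below are made up (the thread never included the real sigma, delta1 and delta2), and the choice of pch = 22 for squares and pch = 1 for circles follows the plotting calls quoted above. The pdf(NULL) call is only there so the sketch also runs without a screen device.

```r
# Hypothetical data standing in for the poster's 20 observations.
sigma  <- seq(0.1, 2, length.out = 20)
delta1 <- 1.5 * exp(-sigma)
delta2 <- 0.5 * sin(sigma)

pdf(NULL)  # null device, so the sketch runs non-interactively

# Squares (pch = 22) for delta1, red circles (pch = 1) for delta2.
plot(sigma, delta1, ylim = c(-0.5, 2), xlab = expression(sigma),
     ylab = expression(delta), pch = 22, type = "o")
points(sigma, delta2, col = "red", pch = 1, type = "o")

# pch and col are given in the same order as the labels, so each Greek
# label sits next to the symbol actually used for that curve.
legend_info <- legend("topleft",
                      legend = c(expression(delta[1]), expression(delta[2])),
                      col = c("black", "red"), pch = c(22, 1), lty = 1)

invisible(dev.off())
```

Because the two curves differ in symbol as well as colour, the legend stays readable on a black-and-white printout.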
Re: [R] symbol in the plot
How about plot(sigma, delta1, ylim=range(-0.5, 2), xlab='sigma', ylab='delta', pch=22, type='o') points(sigma, delta2, col='red', axes=FALSE, type='o') legend("topleft",c("Delta1","Delta2"),fill=TRUE,col=c("black","red")) Send runnable example next time. HTH, Joh gcheer3 wrote: > > TO be specific, here is how I graphed > > plot(sigma, delta1, ylim=range(-0.5, 2), xlab='sigma', > ylab='delta1--square delta2--circle', pch=22, type='o') > par(new=TRUE) > plot(sigma, delta2, ylim=range(-0.5, 2), xlab='sigma', > ylab='delta1--square delta2--circle', col='red', axes=FALSE, type='o') > > Thanks a lot > > > gcheer3 wrote: >> >> a graph question. Thanks a lot in advance. >> >> I made two scatterplots on one graph (sigma vs. delta1, sigma vs. delta2) >> (20 observations of delta1, delta2 and corresponding sigma) the x-axis is >> sigma, the y-axis is either delta1 or delta2. I connected both >> scatterplots. To seperate them, one curves is a line with circles, the >> other curve is a line with squares on it. >> >> I want to make a notation either on the y-axis or on the graph. The >> notiaion is "delta1--square; delta2--circle". So when people look at the >> graph, they can easily tell each curve's meaning. The curve with squares >> on it means the sigma vs. delta1, and the curve with circles on it means >> sigma vs. delta2. I think I can use 'expression' to write delta1, delta2 >> and sigma in greek letters, but I am not sure how to denote the square >> and cirle I graphed. >> > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Normal distribution test
Markus Mehrwald wrote:
> Hi all,
>
> I am completely new to R and my knowledge of statistics is quite small,
> so I hope you can help me.
> I have three-dimensional point data which represents (and this is what I
> do not know for sure) a normal distribution. Now I want to test whether this
> is true or not, and as far as I can remember from statistics lessons I can use
> a chi-square test to test for a distribution. BUT: I really have no idea how to
> do this with R, nor whether my assumptions are correct and whether
> this is possible with R at all.
>
> Thank you very much in advance for any answer.
> Markus

See ?shapiro.test or ?ks.test

HTH, Joh
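A minimal sketch of how the two suggested tests could be applied, assuming the points are stored as a three-column matrix. The matrix pts and its column names are invented for illustration. Both tests are run per coordinate, which only checks the marginal distributions and not joint (multivariate) normality.

```r
set.seed(1)
# Hypothetical stand-in for the three-dimensional point data: 200 points
# with independent standard-normal coordinates.
pts <- matrix(rnorm(600), ncol = 3, dimnames = list(NULL, c("x", "y", "z")))

# Shapiro-Wilk test, one coordinate at a time; a small p-value is evidence
# against normality of that coordinate.
p_shapiro <- apply(pts, 2, function(v) shapiro.test(v)$p.value)

# Kolmogorov-Smirnov test against a normal distribution with the sample
# mean and standard deviation. Estimating these parameters from the same
# data makes the p-value only approximate.
p_ks <- apply(pts, 2, function(v) ks.test(v, "pnorm", mean(v), sd(v))$p.value)

print(rbind(shapiro = p_shapiro, ks = p_ks))
```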
[R] How to identify what is used as EOL in a given file?
Hi, Is there any R-generic, OS-agnostic way to figure out what end-of-line character is being used in a file to be processed by "readLines"? Thanks, Joh
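One possible approach, sketched under the assumption that inspecting the first few kilobytes of the file is representative: read raw bytes with readBin() and look for carriage-return (0x0d) and line-feed (0x0a) bytes. The helper name detect_eol and its n_bytes default are made up for this sketch. Note that ?readLines documents that LF, CRLF and CR are all accepted as line terminators, so this matters mainly when the convention itself needs to be known.

```r
# Guess the end-of-line convention from the first n_bytes bytes of a file.
detect_eol <- function(path, n_bytes = 65536L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = n_bytes)
  cr <- which(bytes == as.raw(0x0d))
  lf <- which(bytes == as.raw(0x0a))
  n_crlf <- sum((cr + 1L) %in% lf)  # CR immediately followed by LF
  if (n_crlf > 0L) return("CRLF")
  if (length(lf) > 0L) return("LF")
  if (length(cr) > 0L) return("CR")
  NA_character_  # no line ending seen in the inspected bytes
}

# Usage with temporary files written with known line endings.
tmp_crlf <- tempfile()
writeBin(charToRaw("first line\r\nsecond line\r\n"), tmp_crlf)
eol_crlf <- detect_eol(tmp_crlf)

tmp_lf <- tempfile()
writeBin(charToRaw("first line\nsecond line\n"), tmp_lf)
eol_lf <- detect_eol(tmp_lf)
```

A file with mixed line endings is reported as "CRLF" as soon as one such pair is seen; counting each kind separately would be a straightforward extension.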
[R] Scanning grep through huge files
Hi, I'm dealing with huge files that I would like to index. On a Linux system "grep -buo " hands me the byte offsets for "PATTERN" very quickly, and I am looking to emulate that speed and ease with native R tools, for portability and elegance. "gregexpr" should be able to do that, but I fail to combine it with "scan" or an equivalent to parse the whole file without having to read it all into memory. I'd be grateful for any hints on how to do this without a "pipe("grep -buo ")". Thanks, Joh
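A rough sketch of one way to get byte offsets without loading the whole file: read fixed-size raw chunks with readBin(), keep the last few bytes of each chunk so that matches spanning a chunk boundary are not lost, and search each buffer with gregexpr(..., fixed = TRUE, useBytes = TRUE). The function name find_byte_offsets and the chunk size are invented here. The sketch assumes a fixed (non-regex) pattern, a text file without embedded NUL bytes (rawToChar() would fail on those), and a pattern stored in the same encoding as the file. It makes no claim to match the speed of grep.

```r
# Return 0-based byte offsets (as "grep -b" reports them) of every
# occurrence of a fixed pattern, reading the file chunk by chunk.
find_byte_offsets <- function(path, pattern, chunk_size = 1048576L) {
  overlap <- length(charToRaw(pattern)) - 1L
  con <- file(path, open = "rb")
  on.exit(close(con))
  offsets <- numeric(0)
  carry <- raw(0)  # tail of the previous buffer, re-scanned with the next chunk
  base <- 0        # file offset of the first byte of the current buffer
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0L) break
    buf <- c(carry, chunk)
    hits <- gregexpr(pattern, rawToChar(buf), fixed = TRUE, useBytes = TRUE)[[1]]
    if (hits[1] != -1L) offsets <- c(offsets, base + as.numeric(hits) - 1)
    # Keep fewer bytes than the pattern is long, so no match is counted twice.
    keep <- min(overlap, length(buf))
    carry <- tail(buf, keep)
    base <- base + length(buf) - keep
  }
  offsets
}

# Usage on a small temporary file, with a tiny chunk size to force
# matches across chunk boundaries.
tmp <- tempfile()
writeBin(charToRaw("abcXYZdefXYZ\nXYZ"), tmp)
found <- find_byte_offsets(tmp, "XYZ", chunk_size = 4L)
```

The offsets are kept as doubles rather than integers so that files larger than 2 GB do not overflow.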
Re: [R] Regex matching that gives byte offset?
On Monday 02 November 2009 13:41:45 Prof Brian Ripley wrote: > On Mon, 2 Nov 2009, Johannes Graumann wrote: > > Hmmm ... that should do it, thanks. But how would one use this on a file > > without reading it into memory completely? > > ?file, ?readLines, ?readBin > > will tell you about connections. ... all of which I only get to read by the line and a regexpr on that will not give me the absolute offset. "grep -buo" on the unix command line is really fast for this. If I can't find the native R equivalent, I'm of a mind to do this via a sys call - ugly and not portable, but SOOO fast ... is it possible in R? Joh > > > Joh > > > > On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote: > >> Do you mean like regexpr() (on the same help page)? > >> > >> Depending on your locale, you might actually prefer the character > >> offset: if you want to match in a MBCS and have byte offsets you will > >> need to work a bit harder if useBytes=TRUE is not sufficient for you. > >> > >> On Wed, 28 Oct 2009, Johannes Graumann wrote: > >>> Hi, > >>> > >>> Is there any way of doing 'grep' ore something like it on the content > >>> of a text file and extract the byte positioning of the match in the > >>> file? I'm facing the need to access rather largish (>600MB) XML files > >>> and would like to be able to index them ... > >>> > >>> Thanks for any help or flogging, > >>> > >>> Joh > >>> > >>> __ > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html and provide commented, > >>> minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Regex matching that gives byte offset?
Hmmm ... that should do it, thanks. But how would one use this on a file without reading it into memory completely? Joh On Wednesday 28 October 2009 16:29:00 Prof Brian Ripley wrote: > Do you mean like regexpr() (on the same help page)? > > Depending on your locale, you might actually prefer the character > offset: if you want to match in a MBCS and have byte offsets you will > need to work a bit harder if useBytes=TRUE is not sufficient for you. > > On Wed, 28 Oct 2009, Johannes Graumann wrote: > > Hi, > > > > Is there any way of doing 'grep' ore something like it on the content of > > a text file and extract the byte positioning of the match in the file? > > I'm facing the need to access rather largish (>600MB) XML files and would > > like to be able to index them ... > > > > Thanks for any help or flogging, > > > > Joh > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html and provide commented, > > minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
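To illustrate the distinction between character and byte offsets mentioned above, here is a small sketch with an invented example string. It assumes the string is UTF-8 encoded, which enc2utf8() enforces.

```r
# The e-acute occupies two bytes in UTF-8, so everything after it has a
# byte offset one larger than its character offset.
x <- enc2utf8("caf\u00e9 bar")

char_pos <- regexpr("bar", x, fixed = TRUE)                   # counts characters
byte_pos <- regexpr("bar", x, fixed = TRUE, useBytes = TRUE)  # counts bytes

print(c(character = as.integer(char_pos), byte = as.integer(byte_pos)))
```

For indexing into a file with seek(), the byte position is the relevant one.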
[R] Regex matching that gives byte offset?
Hi, Is there any way of doing 'grep' or something like it on the content of a text file and extracting the byte position of the match in the file? I'm facing the need to access rather largish (>600MB) XML files and would like to be able to index them ... Thanks for any help or flogging, Joh
Re: [R] Vector grouping challenge
Just so. I got until 'split' but was stuck on how to get the breaks ... Thank you! Joh jim holtman wrote: > Is this what you want: > >> testVector <- c(12,32,NA,NA,56,NA,78,65,87,NA,NA,NA,90) >> # get the breaks at the NAs >> xb <- cumsum(!is.na(testVector)) >> split(seq(length(testVector)), xb) > $`1` > [1] 1 > > $`2` > [1] 2 3 4 > > $`3` > [1] 5 6 > > $`4` > [1] 7 > > $`5` > [1] 8 > > $`6` > [1] 9 10 11 12 > > $`7` > [1] 13 > > > On Wed, Oct 28, 2009 at 7:57 AM, Johannes Graumann > wrote: >> Dear all, >> >> Is there an efficient way to get this list >>> testList <- list(c(1),c(2,3,4),c(5,6),c(7),c(8),c(9,10,11,12),c(13)) >> >> from this vector >>> testVector <- c(12,32,NA,NA,56,NA,78,65,87,NA,NA,NA,90) >> ? >> >> Basically the vector should be grouped, such that non-NA and all >> following NAs end up in one group. >> >> Thanks for any hint, >> >> Joh >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html and provide commented, >> minimal, self-contained, reproducible code. >> > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Vector grouping challenge
Dear all,

Is there an efficient way to get this list

> testList <- list(c(1),c(2,3,4),c(5,6),c(7),c(8),c(9,10,11,12),c(13))

from this vector

> testVector <- c(12,32,NA,NA,56,NA,78,65,87,NA,NA,NA,90)

? Basically the vector should be grouped, such that non-NA and all following NAs end up in one group.

Thanks for any hint, Joh
Re: [R] How to make XML support Expat?
On Sunday 25 October 2009 00:38:54 you wrote:
> xmlEventParse() is intended for handling files that we don't want to keep
> in memory. The branches parameter does make it easier to deal with
> sub-trees as the document is being parsed. And within these branches one
> can use XPath.

Very interesting. I'll check it out.

> So how big are the files you are working with? Surprisingly, reading
> 70Mb files into memory and doing XPath can be quite fast.

I need to repeatedly access data in multiple files larger than 600 MB ... quite the fun.

One more question: is it possible to run xmlEventParse and, whenever a given tag is hit, get the byte offset of that tag for indexing purposes?

Thanks for any hints, Joh
Re: [R] How to make XML support Expat?
Thanks for your input. If I understand correctly, XPath requires the whole document to be resident in memory. That is not an option given the size of documents I'm facing ... I'll go with the standard streaming implementation of the XML package and see how far I get. Thanks, Joh On Saturday 24 October 2009 23:31:46 Duncan Temple Lang wrote: > Johannes Graumann wrote: > > Hi, > > > > I had heard that Expat is was faster. Your mail actually made me go check > > google for some comparisons and that does not seem the case ... do you > > have any insight into this? > > A couple of points.. > > i) At this point, I don't have any data about which of libxml2 and expat > are faster C-level parsers > > ii) Since you are calling the parser from R and then presumably working the > resluting content via manipulation in R, these R-level operations are > likely to be the slower parts of the overall process. > > iii) I tend to use XPath for processing the resulting XML DOM/tree. That > makes things quite fast (and also easy to express if you know XPath). > expat is a parser and doesn't provide XPath facilities. So you would > lose out big time in terms of speed here. > > iv) Xerces is an alternative, but again doesn't have a full XPath > implementation by itself, AFAIK. > > > So basically, I wouldn't prematurely worry about speed. > If you have a test case, you can profile the code and see > where the bottlenecks are. > > D. > > > Thanks, Joh > > > > On Saturday 24 October 2009 20:38:23 Duncan Temple Lang wrote: > >> Hi Joh. > >> > >> What particular aspects of expat do you want that libxml2 and > >> the XML package currently cannot provide? > >> > >> The early versions of the XML package (for the first few years) > >> could support expat and libxml2 as the C++/C-level parsers. 
> >> However, the support for expat was not maintained, so while > >> it could be resurrected and I have thought about it at several > >> times, I doubt it would compile out of the box now as > >> expat has most likely changed significantly. > >> > >> > >> If you wanted to experiment with the expat support in the package, > >> use > >> > >> R CMD INSTALL --configure-args='--with-expat' XML > >> > >> and that will endeavor to find the expat libraries, etc. > >> > >> > >> HTH, > >> > >> D. > >> > >> Johannes Graumann wrote: > >>> Hi, > >>> > >>> How can I make the result of the following lines "TRUE"? > >>> > >>>> install.packages("XML") > >>>> library(XML) > >>>> supportsExpat() > >>> > >>> [1] FALSE > >>> > >>> I'm on linux, looked into the actual package, but don't seem to be able > >>> to wrap my head around how to compile this in ... > >>> > >>> Any pointers are welcome, > >>> > >>> Thanks Joh > >>> > >>> __ > >>> R-help@r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide > >>> http://www.R-project.org/posting-guide.html and provide commented, > >>> minimal, self-contained, reproducible code. > signature.asc Description: This is a digitally signed message part. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to make XML support Expat?
Hi, I had heard that Expat is was faster. Your mail actually made me go check google for some comparisons and that does not seem the case ... do you have any insight into this? Thanks, Joh On Saturday 24 October 2009 20:38:23 Duncan Temple Lang wrote: > Hi Joh. > > What particular aspects of expat do you want that libxml2 and > the XML package currently cannot provide? > > The early versions of the XML package (for the first few years) > could support expat and libxml2 as the C++/C-level parsers. > However, the support for expat was not maintained, so while > it could be resurrected and I have thought about it at several > times, I doubt it would compile out of the box now as > expat has most likely changed significantly. > > > If you wanted to experiment with the expat support in the package, > use > > R CMD INSTALL --configure-args='--with-expat' XML > > and that will endeavor to find the expat libraries, etc. > > > HTH, > > D. > > Johannes Graumann wrote: > > Hi, > > > > How can I make the result of the following lines "TRUE"? > > > >> install.packages("XML") > >> library(XML) > >> supportsExpat() > > > > [1] FALSE > > > > I'm on linux, looked into the actual package, but don't seem to be able > > to wrap my head around how to compile this in ... > > > > Any pointers are welcome, > > > > Thanks Joh > > > > __ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html and provide commented, > > minimal, self-contained, reproducible code. > signature.asc Description: This is a digitally signed message part. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] How to make XML support Expat?
Hi,

How can I make the result of the following lines "TRUE"?

> install.packages("XML")
> library(XML)
> supportsExpat()
[1] FALSE

I'm on Linux and have looked into the actual package, but I can't seem to wrap my head around how to compile this in ...

Any pointers are welcome.

Thanks, Joh
Re: [R] loop and plot
Rene wrote: > Dear all, > > I am stuck at applying loop function for creating separated plots. > > I have coding like below: > > dataset.table <- > table(data.frame(var1=c(1,2,3,1,2,3,1),colour=c("a","b","c","c","a","b","b") > )) > kk = function(f) > { > ls=as.character(f) > pie(dataset.table[ls,],main=ls) > box() > } > > kk(1) > kk(2) > kk(3) > > By using above code, I can create 3 single plot respectively, but when I > type kk(1:3), obviously it will not work. > > I know I have to vectorise the coding, then I can use command kk(1:3). I > try to use loop: > > kk = function(f) > { > ls=as.character(f) > for (i in length(f)) > { > pie(dataset.table[ls[i],],main=ls[i]) > box() > } > } > kk(1:3) > > the above code only gives me the last pie plot (ie. kk(3) plot) instead of > 3 plots respectively. > > Can someone please guide me how to revise the loop coding, and produce 3 > separated plots one after another on the screen by typing kk(1:3)? > > Thanks a lot. > > Rene. Your code is probably doing what you want, but over-plotting the graphs so quickly, you only see the last one. Inserting readline("Hit to proceed.") after your "box()" statement might give you what you want. Joh __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
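A further point worth checking in the quoted loop: for (i in length(f)) iterates over the single value length(f), so with f = 1:3 only i = 3 is ever visited, which alone would explain seeing just the last pie. A minimal sketch of the loop with seq_along() instead, using the table from the question; the pdf(NULL) device and the invisible return value are additions for this sketch only.

```r
dataset.table <- table(data.frame(
  var1   = c(1, 2, 3, 1, 2, 3, 1),
  colour = c("a", "b", "c", "c", "a", "b", "b")
))

kk <- function(f) {
  ls <- as.character(f)
  for (i in seq_along(ls)) {   # visits 1, 2, ..., length(ls)
    pie(dataset.table[ls[i], ], main = ls[i])
    box()
  }
  invisible(length(ls))        # number of pies drawn
}

pdf(NULL)  # null device, so the sketch runs non-interactively
n_drawn <- kk(1:3)
invisible(dev.off())
```

On an interactive screen device, par(ask = TRUE) before calling kk(1:3) pauses between plots, so each pie can be inspected before the next one replaces it.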
Re: [R] readBin: read from defined offset TO defined offset?
Thanks guys! Duncan's hints regarding "character" (which I was naturally using ;0) and the double "readBin" solved my problem. I'm extracting an index from a REALLY big XML file to get fast direct access to subsections, so that I only have to parse those rather than the whole thing (only SAX-style parsing would be possible, since there's no way the thing will fit into memory).

Thanks again, Joh

Johannes Graumann wrote:
> Hello,
>
> With the help of "seek" I can start "readBin" from any byte offset within
> my file that I deem appropriate.
> What I would like to do is to be able to define the endpoint of that read
> as well. Is there any solution to that already out there?
>
> Thanks for any hints, Joh
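A minimal sketch of reading a defined byte range, not necessarily the exact solution the thread arrived at: position the connection with seek() and ask readBin() for exactly as many raw bytes as the range spans. The helper name read_byte_range and the 0-based, inclusive offset convention are assumptions of this sketch.

```r
# Read the bytes from offset `from` to offset `to` (0-based, inclusive).
read_byte_range <- function(path, from, to) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = from, origin = "start")
  readBin(con, what = "raw", n = to - from + 1)
}

# Usage: pull one element out of a small temporary XML-like file.
tmp <- tempfile()
writeBin(charToRaw("<a><b>payload</b></a>"), tmp)
chunk <- rawToChar(read_byte_range(tmp, from = 3, to = 16))
```

Reading as "raw" and converting afterwards avoids readBin(what = "character") stopping at the first NUL-terminated string rather than at the requested endpoint.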
[R] readBin: read from defined offset TO defined offset?
Hello, With the help of "seek" I can start "readBin" from any byte offset within my file that I deem appropriate. What I would like to do is to be able to define the endpoint of that read as well. Is there any solution to that already out there? Thanks for any hints, Joh
Re: [R] xmlEventParse returning trimmed content?
Hi Duncan, Thanks for your thoughts. "trim=FALSE" does not fix my issues, so I attach pared down versions of my script and data file. Thanks for any further hint. Joh Duncan Temple Lang wrote: > Hi Johannes > > I would "guess" that the trimming of the text occurs because > you do not specify trim = FALSE in the call to xmlEventParse(). > If you specify this, you might well get the results you expect. > If not, can you post the actual file you are reading so we can > reproduce your results. > >D. > > Johannes Graumann wrote: >> Hello, >> >> I wrote the function below and have the problem, that the "text" bit >> returns only a trimmed version (686 chars as far as I can see) of the >> content under the "fetchPeaks" condition. >> Any hunches why that might be? >> >> Thanks for pointer, Joh >> >> xmlEventParse(fileName, >> list( >> startElement=function(name, attrs){ >> if(name == "scan"){ >> if(.GlobalEnv$ms2Scan == TRUE & .GlobalEnv$scanDone == TRUE){ >> cat(.GlobalEnv$scanNum,"\n") >> MakeSpektrumEntry() >> } >> .GlobalEnv$scanDone <- FALSE >> .GlobalEnv$fetchPrecMz <- FALSE >> .GlobalEnv$fetchPeaks <- FALSE >> .GlobalEnv$ms2Scan <- FALSE >> if(attrs[["msLevel"]] == "2"){ >> .GlobalEnv$ms2Scan <- TRUE >> .GlobalEnv$scanNum <- as.integer(attrs[["num"]]) >> } >> } else if(name == "precursorMz" & .GlobalEnv$ms2Scan == TRUE){ >> .GlobalEnv$fetchPrecMz <- TRUE >> } else if(name == "peaks" & .GlobalEnv$ms2Scan == TRUE){ >> .GlobalEnv$fetchPeaks <- TRUE >> } >> }, >> text=function(text){ >> if(.GlobalEnv$fetchPrecMz == TRUE){ >> .GlobalEnv$precursorMz <- as.numeric(text) >> .GlobalEnv$fetchPrecMz <- FALSE >> } >> if(.GlobalEnv$fetchPeaks == TRUE){ >> .GlobalEnv$peaks <- text >> .GlobalEnv$fetchPeaks <- FALSE >> .GlobalEnv$scanDone <- TRUE >> } >> } >> ) >> ) >> >>> sessionInfo() >> R version 2.9.0 beta (2009-04-03 r48277) >> x86_64-pc-linux-gnu >> >> locale: >> 
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8 >> >> attached base packages: >> [1] splines stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] caMassClass_1.6 MASS_7.2-46 digest_0.3.1caTools_1.9 >> [5] bitops_1.0-4.1 rpart_3.1-43nnet_7.2-46 e1071_1.5-19 >> [9] class_7.2-46PROcess_1.19.1 Icens_1.15.2survival_2.35-4 >> [13] RCurl_0.94-1XML_2.3-0 rkward_0.5.0 >> >> loaded via a namespace (and not attached): >> [1] tools_2.9.0 >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html and provide commented, >> minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html and provide commented, > minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] xmlEventParse returning trimmed content?
Hello, I wrote the function below and have the problem, that the "text" bit returns only a trimmed version (686 chars as far as I can see) of the content under the "fetchPeaks" condition. Any hunches why that might be? Thanks for pointer, Joh xmlEventParse(fileName, list( startElement=function(name, attrs){ if(name == "scan"){ if(.GlobalEnv$ms2Scan == TRUE & .GlobalEnv$scanDone == TRUE){ cat(.GlobalEnv$scanNum,"\n") MakeSpektrumEntry() } .GlobalEnv$scanDone <- FALSE .GlobalEnv$fetchPrecMz <- FALSE .GlobalEnv$fetchPeaks <- FALSE .GlobalEnv$ms2Scan <- FALSE if(attrs[["msLevel"]] == "2"){ .GlobalEnv$ms2Scan <- TRUE .GlobalEnv$scanNum <- as.integer(attrs[["num"]]) } } else if(name == "precursorMz" & .GlobalEnv$ms2Scan == TRUE){ .GlobalEnv$fetchPrecMz <- TRUE } else if(name == "peaks" & .GlobalEnv$ms2Scan == TRUE){ .GlobalEnv$fetchPeaks <- TRUE } }, text=function(text){ if(.GlobalEnv$fetchPrecMz == TRUE){ .GlobalEnv$precursorMz <- as.numeric(text) .GlobalEnv$fetchPrecMz <- FALSE } if(.GlobalEnv$fetchPeaks == TRUE){ .GlobalEnv$peaks <- text .GlobalEnv$fetchPeaks <- FALSE .GlobalEnv$scanDone <- TRUE } } ) ) > sessionInfo() R version 2.9.0 beta (2009-04-03 r48277) x86_64-pc-linux-gnu locale: LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=en_US.UTF-8;LC_ADDRESS=en_US.UTF-8;LC_TELEPHONE=en_US.UTF-8;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=en_US.UTF-8 attached base packages: [1] splines stats graphics grDevices utils datasets methods [8] base other attached packages: [1] caMassClass_1.6 MASS_7.2-46 digest_0.3.1caTools_1.9 [5] bitops_1.0-4.1 rpart_3.1-43nnet_7.2-46 e1071_1.5-19 [9] class_7.2-46PROcess_1.19.1 Icens_1.15.2survival_2.35-4 [13] RCurl_0.94-1XML_2.3-0 rkward_0.5.0 loaded via a namespace (and not attached): [1] tools_2.9.0 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get name of current function?
Thanks a lot. Exactly what I was looking for.

Joh

Prof Brian Ripley wrote:
> On Tue, 20 Jan 2009, Johannes Graumann wrote:
>
>> Hello,
>>
>> Is there a way to get the name of the function currently running?
>
> It may not even have a name (you can write functions anonymously as
> 'function(x) x+1' in function arguments). I think rather the point is
> that you can get the name (if any) of the current call (and f1 and f2
> may be two names for the same function).
>
> You can use match.call() or the sys* functions to help you.
>
> x <- function() match.call()[[1]]
>
> would probably be enough for your purposes.
>
>> I'd like to have something like this
>> x <- function(){
>>   myName <- getNameOfCurrentFunction
>>   cat(myName)
>> }
>> so that
>> x()
>> would result in
>> "x"
>>
>> Thanks for any pointers,
>>
>> Joh
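The match.call() suggestion, written out as a small runnable sketch. The variable names my_name, name_seen and alias_seen are made up; the second call shows the point about a function having more than one name.

```r
x <- function() {
  # match.call()[[1]] is the function part of the call, here the symbol `x`.
  my_name <- as.character(match.call()[[1]])
  cat(my_name, "\n")
  invisible(my_name)
}

name_seen <- x()

# The result follows the name used in the call, not the function object.
y <- x
alias_seen <- y()
```

For calls that are not a plain name followed by parentheses (an anonymous function, or pkg::fun()), the first element of the call is not a single symbol, so the result would need extra handling.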
[R] How to get name of current function?
Hello,

Is there a way to get the name of the function currently running? I'd like to have something like this

x <- function(){
  myName <- getNameOfCurrentFunction
  cat(myName)
}

so that

x()

would result in

"x"

Thanks for any pointers, Joh
Re: [R] Efficiency challenge: MANY subsets
Many thanks for this example, which doesn't entirely cover my case since I have as many "indexes" entries as "sequences" entries. It was very educational nonetheless, and I used it to come up with something a bit faster than what I had before. The main trick I used, though, was naming all entries in "sequences" and "indexes" like so: names(indexes) <- seq(length(indexes)), and then doing an lapply on "names(indexes)", which allows me to access both lists easily. What I end up with is this: fragments <- lapply( names(indexes), function(x){ lapply( indexes[[x]], function(.range){ .range <- seq.int( .range[1], .range[2] ) unlist(lapply(sequences[x], '[', .range),use.names=FALSE) } ) } ) Although this is still quite slow, it's much faster than what I had before. Any further comments are highly welcome. I can send the real "sequences" and "indexes" as exported R objects ... Thanks, Joh jim holtman wrote: > Try this one; it is doing a list of 7000 in under 2 seconds: > >> sequences <- list( > + > + > + > c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I" > + ,"M", + > + > + > "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F", > "N","I","N","I","N","I","D","K","M","Y","I","H","*") > + ) >> >> >> >> indexes <- list( > + list( > + c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51) > + ) > + ) >> >> indexes <- rep(indexes,10) >> sequences <- rep(sequences,7000) >> >> system.time({ > + fragments <- lapply(indexes, function(.seq){ > + lapply(.seq, function(.range){ > + .range <- seq(.range[1], .range[2]) # save since we use several > times > + lapply(sequences, '[', .range) > + }) > + }) > + }) > user system elapsed > 1.24 0.00 1.26 >> >> > > > On Fri, Jan 16, 2009 at 3:16 PM, Johannes Graumann > wrote: >> Thanks. 
Very elegant, but doesn't solve the problem of the outer "for" >> loop, since I now would rewrite the code like so: >> >> fragments <- list() >> for(iN in seq(length(sequences))){ >> cat(paste(iN,"\n")) >> fragments[[iN]] <- >>lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, >>as.list(g))]) >> } >> >> still very slow for length(sequences) ~ 7000. >> >> Joh >> >> On Friday 16 January 2009 14:23:47 Henrique Dallazuanna wrote: >>> Try this: >>> >>> lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, >>> as.list(g))]) >>> >>> On Fri, Jan 16, 2009 at 11:06 AM, Johannes Graumann < >>> >>> johannes_graum...@web.de> wrote: >>> > Hello, >>> > >>> > I have a list of character vectors like this: >>> > >>> > sequences <- list( >>> > >>> > >>> > c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I" >>> >,"M", >>> > >>> > >>> > "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y"," >>> >F", "N","I","N","I","N","I","D","K","M","Y","I","H","*") >>> > ) >>> > >>> > and another list of subset ranges like this: >>> > >>> > indexes <- list( >>> > list( >>> >c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51) >>> > ) >>> > ) >>> > >>> > What I now want to do is to subset each entry in "sequences" >>> > (sequences[[1]]) with all ranges in the corresponding low level list >>> > in "indexes" (indexes[[1]]). Here is what I came up with. >>> > >>> > fragments <- list() >>> > for(iN in seq(length(sequences))){ >>> > cat(paste(iN,"\n")) >>> > tmpFragments <- sapply( >>> >indexes[[iN]], >>> >function(x){ >>> > sequences[[iN]][seq.int(x[1],x[2])] >>> >} >>> > ) >>> > fragments[[iN]] <- tmpFragments >>> > } >>> > >>> > This works fine, but "sequences" contains thousands of entries and the >>> > corresponding "indexes" are sometimes hundreds of ranges long, so this >>> > whole >>> > process is EXTREMELY inefficient. >>> > >>> > Does somebody out there take the challenge and show me a way on how to >>> > speed >>> > this up? 
>>> > >>> > Thanks for any hints, >>> > >>> > Joh >>> > >>> > __ >>> > R-help@r-project.org mailing list >>> > https://stat.ethz.ch/mailman/listinfo/r-help >>> > PLEASE do read the posting guide >>> > http://www.R-project.org/posting-guide.html >>> > and provide commented, minimal, self-contained, reproducible code. >> >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html and provide commented, >> minimal, self-contained, reproducible code. >> >> > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
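As a sketch of the pairing idea described in this reply, the two lists can also be walked in parallel with base R's Map(), which hands sequences[[i]] and indexes[[i]] to the same function call and so avoids the lookup by name. The toy data reuse the example from the thread; subset_ranges is a name made up for this illustration, and no particular speed-up on the real data is claimed.

```r
# Example data from the thread (one sequence of 51 residues, six ranges)
sequences <- list(
  strsplit("MGLWISFGTPPSYTYLLIMNHKLLLINNNNLTEVHTYFNININIDKMYIH*", "")[[1]]
)
indexes <- list(
  list(c(1, 22), c(22, 46), c(46, 51), c(1, 46), c(22, 51), c(1, 51))
)

# Hypothetical helper: cut one sequence into all of its ranges
subset_ranges <- function(sequence, ranges) {
  lapply(ranges, function(r) sequence[seq.int(r[1], r[2])])
}

# Map() pairs the i-th sequence with the i-th list of ranges
fragments <- Map(subset_ranges, sequences, indexes)

length(fragments[[1]])                      # six fragments for the one sequence
paste(fragments[[1]][[3]], collapse = "")   # residues 46 to 51
```

Because Map() requires nothing more than two lists of equal length, it fits the stated case of one "indexes" entry per "sequences" entry.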
Re: [R] Efficiency challenge: MANY subsets
Thanks. Very elegant, but doesn't solve the problem of the outer "for" loop, since I now would rewrite the code like so:

fragments <- list()
for(iN in seq(length(sequences))){
  cat(paste(iN,"\n"))
  fragments[[iN]] <-
    lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
}

still very slow for length(sequences) ~ 7000.

Joh

On Friday 16 January 2009 14:23:47 Henrique Dallazuanna wrote:
> Try this:
>
> lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
>
> On Fri, Jan 16, 2009 at 11:06 AM, Johannes Graumann <johannes_graum...@web.de> wrote:
> > Hello,
> >
> > I have a list of character vectors like this:
> >
> > sequences <- list(
> >   c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
> >     "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
> >     "N","I","N","I","N","I","D","K","M","Y","I","H","*")
> > )
> >
> > and another list of subset ranges like this:
> >
> > indexes <- list(
> >   list(
> >     c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
> >   )
> > )
> >
> > What I now want to do is to subset each entry in "sequences"
> > (sequences[[1]]) with all ranges in the corresponding low level list in
> > "indexes" (indexes[[1]]). Here is what I came up with.
> >
> > fragments <- list()
> > for(iN in seq(length(sequences))){
> >   cat(paste(iN,"\n"))
> >   tmpFragments <- sapply(
> >     indexes[[iN]],
> >     function(x){
> >       sequences[[iN]][seq.int(x[1],x[2])]
> >     }
> >   )
> >   fragments[[iN]] <- tmpFragments
> > }
> >
> > This works fine, but "sequences" contains thousands of entries and the
> > corresponding "indexes" are sometimes hundreds of ranges long, so this
> > whole process is EXTREMELY inefficient.
> >
> > Does somebody out there take the challenge and show me a way on how to
> > speed this up?
> >
> > Thanks for any hints,
> >
> > Joh
[R] Efficiency challenge: MANY subsets
Hello,

I have a list of character vectors like this:

sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)

and another list of subset ranges like this:

indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)

What I now want to do is to subset each entry in "sequences" (sequences[[1]]) with all ranges in the corresponding low level list in "indexes" (indexes[[1]]). Here is what I came up with.

fragments <- list()
for(iN in seq(length(sequences))){
  cat(paste(iN,"\n"))
  tmpFragments <- sapply(
    indexes[[iN]],
    function(x){
      sequences[[iN]][seq.int(x[1],x[2])]
    }
  )
  fragments[[iN]] <- tmpFragments
}

This works fine, but "sequences" contains thousands of entries and the corresponding "indexes" are sometimes hundreds of ranges long, so this whole process is EXTREMELY inefficient.

Does somebody out there take the challenge and show me a way on how to speed this up?

Thanks for any hints, Joh
Re: [R] Efficient passing through big data.frame and modifying select fields
Marvelous! Thanks guys for your hints and time! Very smooth now!

Joh

On Wednesday 26 November 2008 03:41:49 Henrik Bengtsson wrote:
> Alright, here are another $.02: using 'use.names=FALSE' in unlist() is
> much faster than the default 'use.names=TRUE'. /Henrik
>
> On Tue, Nov 25, 2008 at 6:40 PM, Henrik Bengtsson <[EMAIL PROTECTED]> wrote:
> > My $.02: Using argument 'fixed=TRUE' in strsplit() is much faster than
> > the default 'fixed=FALSE'. /Henrik
> >
> > On Tue, Nov 25, 2008 at 1:02 PM, William Dunlap <[EMAIL PROTECTED]> wrote:
> >>> -Original Message-
> >>> From: William Dunlap
> >>> Sent: Tuesday, November 25, 2008 9:16 AM
> >>> To: '[EMAIL PROTECTED]'
> >>> Subject: Re: [R] Efficient passing through big data.frame and
> >>> modifying select fields
> >>>
> >>> > Johannes Graumann johannes_graumann at web.de
> >>> > Tue Nov 25 15:16:01 CET 2008
> >>> >
> >>> > Hi all,
> >>> >
> >>> > I have relatively big data frames (> 1 rows by 80 columns)
> >>> > that need to be exposed to "merge". Works marvelously well in
> >>> > general, but some fields of the data frames actually contain
> >>> > multiple ";"-separated values encoded as a character string without
> >>> > defined order, which makes the fields not match each other.
> >>> >
> >>> > Example:
> >>> > > frame1[1,1]
> >>> > [1] "some;thing"
> >>> > > frame2[2,1]
> >>> > [1] "thing;some"
> >>> >
> >>> > In order to enable merging/duplicate identification of columns
> >>> > containing these strings, I wrote the following function, which
> >>> > passes through the rows one by one, identifies ";"-containing cells,
> >>> > splits and resorts them.
> >>> >
> >>> > ResortCombinedFields <- function(dframe){
> >>> >   if(!is.data.frame(dframe)){
> >>> >     stop("\"ResortCombinedFields\" input needs to be a data frame.")
> >>> >   }
> >>> >   for(row in seq(nrow(dframe))){
> >>> >     for(mef in grep(";",dframe[row,])){
> >>>
> >>> I needed to add drop=TRUE to the above dframe[row,] for this to work.
> >>>
> >>> >       dframe[row,mef] <-
> >>> >         paste(sort(unlist(strsplit(dframe[row,mef],";"))),collapse=";")
> >>> >     }
> >>> >   }
> >>> >   return(dframe)
> >>> > }
> >>> >
> >>> > works fine, but is horribly inefficient. How might this be
> >>> > tackled more elegantly?
> >>> >
> >>> > Thanks for any input, Joh
> >>>
> >>> It is usually faster to loop over columns of a data frame and use row
> >>> subscripting, if needed, on individual columns. E.g., the following
> >>> 2 are much quicker on a sample 1000 by 4 dataset I made with
> >>>
> >>> dframe<-data.frame(lapply(c(One=1,Two=2,Three=3),
> >>>    function(i)sapply(1:1000,
> >>>       function(i)
> >>>          paste(sample(LETTERS[1:5],size=sample(3,size=1),repl=FALSE),
> >>>                collapse=";"))),
> >>>    stringsAsFactors=FALSE)
> >>> dframe$Four<-sample(LETTERS[1:5], size=nrow(dframe),
> >>>    replace=TRUE) # no ;'s in column Four
> >>>
> >>> The first function, f1, doesn't try to find which rows may
> >>> need adjusting and the second, f2, does.
> >>>
> >>> f1 <- function(dframe){
> >>>   if(!is.data.frame(dframe)){
> >>>     stop("\"ResortCombinedFields\" input needs to be a data frame.")
> >>>   }
> >>>   for(icol in seq_len(ncol(dframe))){
> >>>     dframe[,icol] <- unlist(lapply(strsplit(dframe[,icol],
> >>>       ";"), function(parts) paste(sort(parts), collapse=";")))
> >>>   }
> >>>   return(dframe)
> >>> }
> >>>
> >>> f2 <-
> >>> function(dframe){
> >>>   if(!is.data.frame(dframe)){
> >>>     stop("\&qu
[R] Efficient passing through big data.frame and modifying select fields
Hi all,

I have relatively big data frames (> 1 rows by 80 columns) that need to be exposed to "merge". Works marvelously well in general, but some fields of the data frames actually contain multiple ";"-separated values encoded as a character string without defined order, which makes the fields not match each other.

Example:

> frame1[1,1]
[1] "some;thing"
> frame2[2,1]
[1] "thing;some"

In order to enable merging/duplicate identification of columns containing these strings, I wrote the following function, which passes through the rows one by one, identifies ";"-containing cells, splits and resorts them.

ResortCombinedFields <- function(dframe){
  if(!is.data.frame(dframe)){
    stop("\"ResortCombinedFields\" input needs to be a data frame.")
  }
  for(row in seq(nrow(dframe))){
    for(mef in grep(";",dframe[row,])){
      dframe[row,mef] <- paste(sort(unlist(strsplit(dframe[row,mef],";"))),collapse=";")
    }
  }
  return(dframe)
}

It works fine, but is horribly inefficient. How might this be tackled more elegantly?

Thanks for any input, Joh
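The hints given in the replies to this question (loop over columns rather than rows, pass fixed=TRUE to strsplit(), touch only the cells that contain the separator) can be combined roughly as follows. This is a sketch under the assumption that the affected columns are character vectors rather than factors; the function name resort_combined_fields and the small test frame are made up for illustration.

```r
# Hypothetical column-wise variant of ResortCombinedFields
resort_combined_fields <- function(dframe, sep = ";") {
  stopifnot(is.data.frame(dframe))
  for (icol in seq_along(dframe)) {
    col <- dframe[[icol]]
    if (!is.character(col)) next              # skip numeric etc. columns
    hits <- grep(sep, col, fixed = TRUE)      # cells that hold several values
    if (length(hits) == 0L) next
    parts <- strsplit(col[hits], sep, fixed = TRUE)
    # sort the pieces of each cell and glue them back together
    col[hits] <- vapply(parts,
                        function(p) paste(sort(p), collapse = sep),
                        character(1))
    dframe[[icol]] <- col
  }
  dframe
}

df <- data.frame(a = c("some;thing", "thing;some", "x"),
                 b = 1:3,
                 stringsAsFactors = FALSE)
out <- resort_combined_fields(df)
out$a
```

After this step both spellings of the same combined field are identical strings, so merge() and duplicated() can match them.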
Re: [R] Best way of figuring out whether graphical elements overlap?
Thanks very much for this very cool help!

Joh

Greg Snow wrote:
> There is also the spread.labs function in the TeachingDemos package that
> uses a different method from the plotrix function and should not move any
> labels that are not overlapping. There are also the dynIdentify and
> TkIdentify functions in the same package that allow you to interactively
> move labels around to where you are happy with their positions, then
> returns the coordinates to use for the positions in a final version of the
> plot.
>
> If you want to check by hand, you can use the strheight and strwidth
> functions to find the bounding rectangles and see if they overlap (there
> can be some cases where the actual text does not overlap even if the
> rectangles do).
>
> Hope this helps,
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> [EMAIL PROTECTED]
> 801.408.8111
>
>> -Original Message-
>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>> project.org] On Behalf Of Johannes Graumann
>> Sent: Tuesday, October 28, 2008 8:42 AM
>> To: [EMAIL PROTECTED]
>> Subject: [R] Best way of figuring out whether graphical elements
>> overlap?
>>
>> Hi all,
>>
>> I'm plotting impulses, where some of them should have labels hovering
>> above them. I know of plotrix' spread.labels function, but would like
>> to save that for instances where there truly is too little space for
>> the label.
>> Does anybody have any hints what the most efficient way might be to
>> achieve the following:
>> - plot an impulse plot
>> - before placing each of a vector of text labels, check (using
>> strheight/strwidth), whether this collides graphically with anything
>> already plotted and only plot it if not.
>>
>> Thanks for any hints, Joh
[R] "plot": How to get parameters before plotting anything?
Hello, Is it possible to get all "par" content calculated for "plot" without actually plotting anything? I'm missing an option "plot=FALSE" ... "type="n"" will still open a device and draw the axes ... Thanks, Joh
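One workaround, sketched here as an assumption rather than an established answer from the list: draw the plot on a null PDF device, which grDevices provides via pdf(file = NULL), read the parameters with par(), and close the device again. Nothing appears on screen and no file is written. The helper name plot_par is made up for this illustration.

```r
# Hypothetical helper: run plot() on a throw-away device and return par()
plot_par <- function(...) {
  pdf(file = NULL)     # null PDF device: nothing is written to disk
  on.exit(dev.off())   # close it again, even if plot() fails
  plot(...)
  par()                # all graphical parameters as set up by plot()
}

p <- plot_par(1:10, (1:10)^2)
p$usr   # user coordinates of the plot region
p$mai   # margin sizes in inches
```

Values that depend on the device size (for example "pin" or "cxy") reflect the null device, so its width and height would need to match the device used for the real plot.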
[R] Best way of figuring out whether graphical elements overlap?
Hi all,

I'm plotting impulses, where some of them should have labels hovering above them. I know of plotrix's spread.labels function, but would like to save that for instances where there truly is too little space for the label. Does anybody have any hints on what the most efficient way might be to achieve the following:
- plot an impulse plot
- before placing each of a vector of text labels, check (using strheight/strwidth) whether it collides graphically with anything already plotted, and only plot it if not.

Thanks for any hints, Joh
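The two steps listed above could be sketched with base graphics as follows. The helper names label_box and boxes_overlap, the example data and the half-line gap are assumptions made for this illustration. As noted elsewhere in this thread, bounding rectangles are conservative: they may overlap even where the text itself does not.

```r
# Bounding box of a label drawn with text(x, y, label, adj = c(0.5, 0))
label_box <- function(x, y, label, cex = 1) {
  w <- strwidth(label, cex = cex)
  h <- strheight(label, cex = cex)
  c(xleft = x - w / 2, xright = x + w / 2, ybottom = y, ytop = y + h)
}

# TRUE if two boxes intersect (strict inequalities: touching edges do not count)
boxes_overlap <- function(a, b) {
  a[["xleft"]] < b[["xright"]] && a[["xright"]] > b[["xleft"]] &&
    a[["ybottom"]] < b[["ytop"]] && a[["ytop"]] > b[["ybottom"]]
}

pdf(file = NULL)   # throw-away device so the sketch runs without a display
x <- c(1, 1.2, 3, 5)
y <- c(4, 5, 2, 6)
labels <- c("peak A", "peak B", "peak C", "peak D")
plot(x, y, type = "h", ylim = c(0, 8))

# Each impulse is treated as a zero-width box reaching from 0 up to y
occupied <- lapply(seq_along(x), function(i)
  c(xleft = x[i], xright = x[i], ybottom = 0, ytop = y[i]))

gap <- strheight("M") / 2   # small vertical offset above the impulse
drawn <- logical(length(labels))
for (i in seq_along(labels)) {
  box <- label_box(x[i], y[i] + gap, labels[i])
  hit <- any(vapply(occupied, boxes_overlap, logical(1), b = box))
  if (!hit) {
    text(x[i], y[i] + gap, labels[i], adj = c(0.5, 0))
    occupied[[length(occupied) + 1]] <- box   # later labels must avoid this one
    drawn[i] <- TRUE
  }
}
invisible(dev.off())
```

Labels left with drawn[i] equal to FALSE are the ones that could then be handed to a spreading function such as spread.labels.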
Re: [R] gsubfn, strapply, REGEX Problem
Thank you!

Joh

Gabor Grothendieck wrote:
> Have just made an improvement to the development
> version to ignore escaped left parens in the regexp
> in setting the backref default. This improvement
> should address your problem so that this now
> works without errors:
>
> library(gsubfn)
> # overwrite relevant function with devel version of it
> source("http://gsubfn.googlecode.com/svn/trunk/R/gsubfn.R")
>
> strapply("S(AC,P)TVDK(8)EELVQK(8)", ".[(].{1,2}[)]|.")[[1]]
>
> On Tue, Oct 28, 2008 at 8:53 AM, Gabor Grothendieck
> <[EMAIL PROTECTED]> wrote:
>> The default has changed to be the negative of its prior
>> value so that would account for it. The current
>> default is backref = -k where k is the number of left parens in
>> the regexp. That means that it passes only the
>> back references (and not the match) if it thinks there
>> are any backreferences. Usually this revised default
>> is what is wanted but the unusual aspect of this example
>> is that the parens don't represent back references
>> and the "wrong" default happened to work anyways.
>>
>> I guess the bottom line is that if you use parens in
>> your regexp that are not intended to be back references
>> then its important to specify backref= explicitly.
>>
>> The NEWS file in the gsubfn distribution does mention
>> the change.
>>
>> On Tue, Oct 28, 2008 at 8:32 AM, Johannes Graumann
>> <[EMAIL PROTECTED]> wrote:
>>> -BEGIN PGP SIGNED MESSAGE-
>>> Hash: SHA512
>>>
>>> Thanks for looking at this. The "\"" was an oversight for the example,
>>> but the "backref" bit solves my problem ... I wonder whether that used
>>> to be the default and was recently changed?
>>>
>>> Thanks for your help!
>>>
>>> Joh
>>>
>>> Gabor Grothendieck wrote:
>>>
>>>> There is no quote terminating the first argument and you
>>>> need to add the backref = 0 argument so that it does
>>>> not interpret the parentheses in the regular expression
>>>> as back references.
>>>>
>>>> Its not clear to me what the intention is here so there
>>>> may be further changes needed but the ones above
>>>> result in no error message.
>>>>
>>>> On Tue, Oct 28, 2008 at 7:39 AM, Johannes Graumann
>>>> <[EMAIL PROTECTED]> wrote:
>>>>> Hi all,
>>>>>
>>>>> I swear this used to work:
>>>>>
>>>>> library(gsubfn)
>>>>> strapply("S(AC,P)TVDK(8)EELVQK(8), ".[(].{1,2}[)]|.")[[1]]
>>>>>
>>>>> But somewhere along the update path it stopped ... now giving me this
>>>>>
>>>>> Error in base::gsub(pattern, rs, x, ...) :
>>>>>   invalid backreference 2 in regular expression
>>>>>
>>>>> Can't figure it out. What am I doing wrong?
>>>>>
>>>>> Thanks for any hints, Joh
>>> -BEGIN PGP SIGNATURE-
>>> Version: GnuPG v1.4.9 (GNU/Linux)
>>>
>>> iQIcBAEBCgAGBQJJBwZjAAoJEK3uDRxoATjEb9gP/ioWCERhZrLAaeIPMc1PSmVV
>>> nsWojOneSNruSESMgmocrKkOYbkPVZmmBetK9gw4sw9hLErGjy1MsebHVr40pNK2
>>> Bajm7mXJ1wbd7EDlvRfS3KpBkPvPlUmSMlp2fMoYaswcyt6Rokr3S512UlkvlLWU
>>> QNd8NMx4iRFPn3dA84SW1SqWaKIXtpTME35k1VQw0dGvv8iTgsY6pAHWkEoezuue
>>> g/tGY8kc2WjBpvVjSVDD4uAuzO9T502n1AjsUs+/bxVRBPIJJktFzkOJbhKQabuJ
>>> 2NfEX45B4Y/f1nMff5KQ1IS4LQUUzNwzvEuwHuw2CXfKnzopNUUjU3rcCaHwOIJz
>>> yecnRXpGwVX+dHaLH156voiHJqpsz7tUoIUOvAQumfwmPajK9Z/KKwLoXdXQ22gB
>>> 5469gcVBI+z31euijZMRMW12M7ZidABnHd2afxrwQRZyU9sexemzVzSdAlIpIgr5
>>> JG62rxpFY2ImmzTDncZpNik68cviB1ZLloH4twJxFk/T7DmS3x17wVofPb1y