Re: [R] R dataset copyrights
On 24/04/2014 22:33, Greg Snow wrote:

Many, probably even most (but I have not checked), of the datasets available in R packages have help files with a references section. That section should lead you to an original source that may have the copyright information and is what should be referenced.

My understanding (but I am not a lawyer, do not play one on TV, and do not claim to be any type of legal expert) is that you cannot copyright facts, but you can copyright the layout and presentation of facts. So real data about the real world cannot be copyrighted, but the layout and presentation can be. If you photocopy a page from a journal and post it, you may be in trouble for copying and distributing the layout and presentation of the data, but not the data itself. But if you transcribe the numbers into a file to be read by the computer, then you have just copied the facts, which are not copyrighted.

You most likely also copied the layout (which numbers/strings are in which rows ...). There are legal precedents involving telephone directories, for example. There was a May 2007 thread about this: see https://stat.ethz.ch/pipermail/r-help/2007-May/131780.html and replies.

On the other hand, simulated or otherwise made-up datasets could be considered fiction and therefore able to be copyrighted. I remember hearing (but I don't remember where or when) that some textbook authors are encouraged to use simulated data instead of real data (it can have the same mean, sd, etc. as a real dataset, so the interpretation is the same), so that the copyright of the textbook also applies to the data.

It is not always clear whether a dataset is fact or simulated, so it is best to obtain permission or check official statements from the source.

Beyond what is legal, you should consider what is right. Even if you don't have to cite a data source, you should try to give credit where it is due (and possibly blame if there is an error).
At a minimum you should cite original sources when they can be found, and also mention where you obtained the data if not from the original source. Think of the effort that people went through to collect the data and make it available to you; how would you feel if you put that much effort into something and then someone else stole the credit or other rewards? Many data sources have statements on how the data can be used, and it is best to follow those instructions/requests. Is it really that hard to add a reference to where the data came from and how you obtained it?

In some educational cases it may be better to initially hide the source of the data. For example, the outliers dataset in the TeachingDemos package would be a lot less useful for its intended purposes if students were to read its help page before analyzing it, so I have no problem with teachers using it without telling students where it came from (and since it is simulated, I could possibly claim copyright). I would appreciate a mention after the fact (once the lesson is learned, the teacher could say "by the way, this data came from ..."), and I expect that others would feel similarly (I should add a note to that effect to the documentation page). But you should check the sources to see whether this is specifically allowed or disallowed.

I probably have not fully answered your question, but hopefully this gives a little more guidance.

On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup wrote:

> Hi everybody,
>
> I've been searching the web for quite a time now and haven't found a satisfying answer. I was wondering if the datasets provided within the R packages are open, and thus can be used in publications? Concretely, can the data, for example, be exported from R and uploaded in a different format (like csv) to a website to be accessible for students to work with the data in SPSS or Matlab? Is it enough to cite the source or paper, or do I need permission for every dataset?
> Thanks in advance for your replies,
> Sören Gröttrup

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595
Re: [R] LogLikelihood of a Distribution Given Fixed Parameters
As usual I am too lazy to fight my way through the rather convoluted code presented, but it seems to me that you just want to calculate a log likelihood. And that is bog-simple:

The log likelihood for i.i.d. data is just the sum of log f(y_i), where the y_i are your observed values and f() is the density function of the distribution that you have in mind. Where there is (right) censoring, you take the sum of log f(y_i) over all the non-censored values and then add k * log(1 - F(cens.time)), where k is the number of censored values and F() is the cumulative distribution function corresponding to f().

In your case it would appear that f(y) = dlnorm(y, 1.66, 0.25) and F(y) = plnorm(y, 1.66, 0.25). Note that instead of computing 1 - F(cens.time) you can use plnorm(cens.time, 1.66, 0.25, lower.tail = FALSE), and that instead of taking logs explicitly you can set log = TRUE in the call to dlnorm() (and log.p = TRUE in the call to plnorm()).

cheers,
Rolf Turner

On 25/04/14 07:27, Jacob Warren (RIT Student) wrote:

I'm trying to figure out whether there is a way in R to get the log-likelihood of a distribution fit to a set of data where the parameter values are fixed. For example, I want to simulate data from a given alternative lognormal distribution and then fit it to a lognormal distribution with null parameter values, to see what the likelihood of the null distribution is given random data from the alternative distribution. I have been using fitdistrplus for other purposes, but I cannot use it to fix both parameter values. Here is an example of what I've been working with...
    nullmu <- 1.66     # set null mu
    altmu  <- 1.58     # set alt mu
    sd.log <- 0.25     # set common sigma
    cens.time <- 6     # simulated times greater than this become right-censored
    samplesize <- 100  # not defined in the original post; added so the example runs

    # simulate lognormal data (time) from the alternative dist
    (sim <- rlnorm(n = samplesize, meanlog = altmu, sdlog = sd.log))

    # if the time was > cens.time, replace time with cens.time
    sim[sim > cens.time] <- cens.time
    sim

    # create a variable indicating censoring
    cens <- sim
    cens[sim == cens.time] <- NA
    cens

    # create the data frame to be passed to fitdistcens and fitdist
    (x <- data.frame(left = sim, right = cens))

    # if there is censored data use fitdistcens, else use fitdist
    if (any(is.na(cens))) {
        simfit <- fitdistcens(censdata = x, distr = "lnorm")
    } else {
        simfit <- fitdist(data = x[, 1], distr = "lnorm")
    }

    # the log-likelihood of the MLE-fitted distribution
    simfit$loglik

    # I want the log-likelihood of the distribution with the null
    # parameterization. This is what I can't get to work: I can't find a
    # function that lets me fix both parameter values so I can get the
    # log-likelihood of that parameterization given the data.
    nulldist <- fitdistcens(censdata = x, distr = "lnorm",
                            start = list(meanlog = nullmu, sdlog = sd.log))

    # then a likelihood ratio test between the two distributions
    pchisq(2 * (simfit$loglik - nulldist$loglik), df = 2, lower.tail = FALSE)
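Rolf's recipe can be coded directly; no fitting step is needed at all when both parameters are fixed. A minimal sketch (the parameter values and censoring time are from the thread; the sample size and simulated data are assumptions, so the numbers are illustrative only):

```r
# Censored lognormal log-likelihood under fixed (null) parameters,
# following the recipe above: sum of log-densities for uncensored
# values plus k * log(survival at the censoring time).
set.seed(1)
samplesize <- 50                      # assumed; not given in the thread
sim <- rlnorm(samplesize, meanlog = 1.58, sdlog = 0.25)
cens.time <- 6
obs <- sim[sim < cens.time]           # uncensored observations
k   <- sum(sim >= cens.time)          # number right-censored at cens.time

loglik.null <- sum(dlnorm(obs, meanlog = 1.66, sdlog = 0.25, log = TRUE)) +
  k * plnorm(cens.time, meanlog = 1.66, sdlog = 0.25,
             lower.tail = FALSE, log.p = TRUE)
loglik.null
```

Evaluating the same two terms at the fitted parameter values should reproduce the `loglik` reported by fitdistcens for this data layout.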
[R] HELP with fonts
Hi, I have been trying to make my axis fonts and axis label fonts bold, but even when I use what I believe is the right command, writing font.lab=2, font.axis=2, the bold fonts don't show up. Any help? Thanks!
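For what it's worth, those settings do work in base graphics when passed to a plotting call (or set beforehand via par()); a minimal sketch:

```r
# Bold axis labels (font.lab = 2) and bold tick-mark annotation
# (font.axis = 2) in base graphics.
plot(1:10, (1:10)^2,
     xlab = "x", ylab = "x squared",
     font.lab = 2,    # bold axis labels
     font.axis = 2)   # bold axis (tick) annotation
```

If the output still isn't bold, the graphics device or the selected font family may lack a bold face; that is a device issue rather than a par() issue.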
Re: [R] instal tar.gz package on windows
The discussion in this thread should solve your issue: http://stackoverflow.com/questions/1474081/how-do-i-install-an-r-package-from-source

Raghu

On Thu 24 Apr 2014 08:26:33 PM CEST, KD Makatjane wrote:

> Good evening sir/madam
> My name is Katleho Makatjane. I am currently a B.Com Statistics student at North West University, Mafikeng campus. I have installed R 3.1.0 on my laptop, but my main problem is installing all the necessary packages so that I can start using it for my analysis. It gives me an error while I try to install them from downloaded files, and it also cannot connect to the internet to download them automatically. Can you please help me with how to install R packages? I am using a 32-bit Windows 7 Ultimate operating system.
>
> Yours faithfully
> Katleho Makatjane
> North West University Mafikeng Campus
> Department of Statistics and Economics
> Contact: +27734630271
> Vrywaringsklousule / Disclaimer: http://www.nwu.ac.za/it/gov-man/disclaimer.html

--
Raghu Erapaneedi
Max Planck Institute for Molecular Biomedicine
Mammalian Cell Signalling Laboratory
Department of Vascular Cell Biology
Roentgenstr-20, D-48149 Muenster, Germany
Tel: +49(0)251-70365-223
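For the archives, the two usual routes look like this (the file path and package name are placeholders, not from the original message; the calls are guarded with interactive() so sourcing the example does not trigger an install):

```r
# Two usual ways to install a package; paths/names are placeholders.
if (interactive()) {
  # install from a downloaded source tarball (on Windows this needs Rtools):
  install.packages("path/to/somepackage_1.0.tar.gz",
                   repos = NULL, type = "source")

  # or, when the machine can reach a CRAN mirror, let R do the downloading:
  install.packages("somepackage", dependencies = TRUE)
}
```

See ?install.packages for the full set of options, including `type`.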
[R] LogLikelihood of a Distribution Given Fixed Parameters
I'm trying to figure out whether there is a way in R to get the log-likelihood of a distribution fit to a set of data where the parameter values are fixed. For example, I want to simulate data from a given alternative lognormal distribution and then fit it to a lognormal distribution with null parameter values, to see what the likelihood of the null distribution is given random data from the alternative distribution. I have been using fitdistrplus for other purposes, but I cannot use it to fix both parameter values. Here is an example of what I've been working with...

    nullmu <- 1.66     # set null mu
    altmu  <- 1.58     # set alt mu
    sd.log <- 0.25     # set common sigma
    cens.time <- 6     # simulated times greater than this become right-censored
    samplesize <- 100  # not defined in the original post; added so the example runs

    # simulate lognormal data (time) from the alternative dist
    (sim <- rlnorm(n = samplesize, meanlog = altmu, sdlog = sd.log))

    # if the time was > cens.time, replace time with cens.time
    sim[sim > cens.time] <- cens.time
    sim

    # create a variable indicating censoring
    cens <- sim
    cens[sim == cens.time] <- NA
    cens

    # create the data frame to be passed to fitdistcens and fitdist
    (x <- data.frame(left = sim, right = cens))

    # if there is censored data use fitdistcens, else use fitdist
    if (any(is.na(cens))) {
        simfit <- fitdistcens(censdata = x, distr = "lnorm")
    } else {
        simfit <- fitdist(data = x[, 1], distr = "lnorm")
    }

    # the log-likelihood of the MLE-fitted distribution
    simfit$loglik

    # I want the log-likelihood of the distribution with the null
    # parameterization. This is what I can't get to work: I can't find a
    # function that lets me fix both parameter values so I can get the
    # log-likelihood of that parameterization given the data.
    nulldist <- fitdistcens(censdata = x, distr = "lnorm",
                            start = list(meanlog = nullmu, sdlog = sd.log))

    # then a likelihood ratio test between the two distributions
    pchisq(2 * (simfit$loglik - nulldist$loglik), df = 2, lower.tail = FALSE)
Re: [R] detecting the sourcing of site profile on Startup versus post-Startup
Jeff, I absolutely agree it is a bad idea to rely on side effects. I did figure out one way to skin this cat. It relies on the following, from line 909 of src/main/main.c:

    R_LoadProfile(R_OpenSiteFile(), baseEnv);
    R_LockBinding(install(".Library.site"), R_BaseEnv);
    R_LoadProfile(R_OpenInitFile(), R_GlobalEnv);

To illustrate, one puts at the top of the site profile:

    if (bindingIsLocked(".Library.site", .BaseNamespaceEnv)) {
        # site profile has already finished loading;
        # put code here for that case. For example:
        if (identical(.BaseNamespaceEnv$.GoodJob, Sys.getpid())) {
            warning("you appear to be using the same file for both site and user profiles, or to have sourced this file post-startup.")
        }
        warning("this file is not intended to be used in this fashion.")
    } else {
        # site profile is in the process of loading;
        # put code here for that case. For example:
        message("good job! startup loaded the correct site profile.")
        .GoodJob <- Sys.getpid()
    }

Not exactly best practice to rely on an implementation detail, but I found it interesting nevertheless.

Regards,
Ben

On 04/23/2014 09:31 PM, Jeff Newmiller wrote:
> Regardless of whether this is possible, it seems like a bad idea (side effects in a functional programming environment). If you want to do something special in startup, then write a different function that does that stuff, and call the desired functions explicitly when you want them to be called.
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On April 23, 2014 6:11:09 PM PDT, Benjamin Tyner wrote:
>> Thanks Duncan!
>> Yes, I considered taking advantage of .First, but was concerned that the .First defined by the site profile could be masked by a possible .First defined by the user profile (I neglected to mention that "--no-init-profile" [sic] in the example I gave was a simplifying assumption; sorry about that).
>>
>> On 04/23/2014 06:55 AM, Duncan Murdoch wrote:
>>
>> On 22/04/2014, 8:59 PM, Benjamin Tyner wrote:
>>
>> Greetings,
>> Is there any way to programmatically detect whether a piece of code is being run within the initial (Startup) sourcing of the site profile? For example, say I have a site profile, "/path/to/Rprofile.site". Is there any function "my_func" which would return different values for these two invocations:
>>
>>     Rscript --no-site-profile --no-init-profile -e "sys.source('/path/to/Rprofile.site', envir = .BaseNamespaceEnv); my_func()"
>>
>> versus:
>>
>>     export R_PROFILE=/path/to/Rprofile.site
>>     Rscript --no-init-profile -e "my_func()"
>>
>> The commandArgs() function could see the different command lines, and your function could deduce the difference from that. As far as I know, R keeps no other records of the startup process, but if you can modify other files, you could leave a record when .First was run, and see that it was run before Rprofile.site in the first case. See ?Startup.
>>
>> Duncan Murdoch
Re: [R] no line number from error
On Thu, 2014-04-24 at 19:29 -0400, Duncan Murdoch wrote:
> On 24/04/2014, 6:40 PM, Ross Boylan wrote:
> > > r1 <- totalEffect.all(dsim, simjob)
> > Error: attempt to apply non-function
> > > traceback()
> > 1: totalEffect.all(dsim, simjob)
> > > class(totalEffect.all)
> > [1] "function"
> >
> > How can I find out where in totalEffect.all the error is arising? My only theory for the lack of line number was that totalEffect.all was not a function; it is. Further, previous calls to the function worked, and errors in it produced line numbers. After fixing a previous error I'm now getting this.
> >
> > All my code is sourced from files except for the driver. The driver code is in the same file that defines totalEffect.all.
>
> I don't understand this. If totalEffect.all is in a file that is not sourced, where did it come from?

totalEffect is sourced; by "driver" I meant the surrounding code that sets up dsim and simjob and calls totalEffect.

> Generally the rule is that if you source a function from a file you'll get line number information attached to it, so you should see a line number reported when an error occurs, or during debugging. There are exceptions: you can turn this off, and by default it is turned off for functions defined in packages (but you can turn it on if you re-install from source).

I'm not in a package. BTW, I encountered several more instances of the error (from different spots in the code) and never got a line number.

Ross

> > In this particular case I stepped through with the debugger and found that in the line
> >     accums[[m]]$delta$accum(up - down, data)
> > the delta object was NULL and so accum is not a function on it. But I hope there's a better way to locate an error.
>
> If the line that triggered this error was in a function that had line number information, it sounds like it might be a bug. Can you simplify it down to a simple reproducible example that I could look at?
> Duncan Murdoch
Re: [R] no line number from error
On 24/04/2014, 6:40 PM, Ross Boylan wrote:
> > r1 <- totalEffect.all(dsim, simjob)
> Error: attempt to apply non-function
> > traceback()
> 1: totalEffect.all(dsim, simjob)
> > class(totalEffect.all)
> [1] "function"
>
> How can I find out where in totalEffect.all the error is arising? My only theory for the lack of line number was that totalEffect.all was not a function; it is. Further, previous calls to the function worked, and errors in it produced line numbers. After fixing a previous error I'm now getting this. All my code is sourced from files except for the driver. The driver code is in the same file that defines totalEffect.all.

I don't understand this. If totalEffect.all is in a file that is not sourced, where did it come from?

Generally the rule is that if you source a function from a file you'll get line number information attached to it, so you should see a line number reported when an error occurs, or during debugging. There are exceptions: you can turn this off, and by default it is turned off for functions defined in packages (but you can turn it on if you re-install from source).

> In this particular case I stepped through with the debugger and found that in the line
>     accums[[m]]$delta$accum(up - down, data)
> the delta object was NULL and so accum is not a function on it. But I hope there's a better way to locate an error.

If the line that triggered this error was in a function that had line number information, it sounds like it might be a bug. Can you simplify it down to a simple reproducible example that I could look at?

Duncan Murdoch
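A general pattern that sidesteps the missing line number: keep srcrefs on, and guard the call site with an explicit check so the failure names the offending object. The function and list below are hypothetical stand-ins for the thread's accums[[m]]$delta$accum:

```r
# Keep srcrefs so sourced code carries line numbers in errors/tracebacks:
options(keep.source = TRUE)

# Hypothetical stand-in: check before calling, so a NULL component gives
# an informative error instead of "attempt to apply non-function".
f <- function(acc) {
  if (!is.function(acc$delta$accum))
    stop("acc$delta$accum is not a function (delta may be NULL)")
  acc$delta$accum(1)
}

res <- tryCatch(f(list(delta = NULL)), error = conditionMessage)
res
```

Setting options(error = recover) at the top of the driver also drops you into the offending frame, which usually pinpoints the location even when no srcref is available.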
Re: [R] Perceptual Mapping
Thanks Bert,

Not sure how I missed that one.

Best,

On 4/24/14, 11:50 AM, Bert Gunter wrote:
> google on "perceptual mapping with R"
>
> Here is one of the hits:
> http://marketing-yogi.blogspot.com/2012/12/session-4-rcode-perceptual-maps.html
>
> It does not look like MDS. It appears to be (closely related to?) PCA.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." (H. Gilbert Welch)
>
> On Thu, Apr 24, 2014 at 10:20 AM, Noah Silverman wrote:
>> Hi,
>>
>> Someone just asked me to analyze a fairly large data set using something they called "perceptual mapping". I'm not familiar with the term, but a quick check in Google seems to indicate that it is just another term for multidimensional scaling. However, they insist that it is something different.
>>
>> Is anybody here familiar with "perceptual mapping" with multidimensional data? If so, can you point me to any examples using R?
>>
>> Thanks,
>>
>> --
>> *Noah Silverman, PhD* | UCLA Department of Statistics
>> 8117 Math Sciences Building, Los Angeles, CA 90095

--
*Noah Silverman, PhD* | UCLA Department of Statistics
8117 Math Sciences Building, Los Angeles, CA 90095
Tel: (323) 899-9595
[R] no line number from error
> r1 <- totalEffect.all(dsim, simjob)
Error: attempt to apply non-function
> traceback()
1: totalEffect.all(dsim, simjob)
> class(totalEffect.all)
[1] "function"

How can I find out where in totalEffect.all the error is arising? My only theory for the lack of line number was that totalEffect.all was not a function; it is. Further, previous calls to the function worked, and errors in it produced line numbers. After fixing a previous error I'm now getting this. All my code is sourced from files except for the driver. The driver code is in the same file that defines totalEffect.all.

In this particular case I stepped through with the debugger and found that in the line

    accums[[m]]$delta$accum(up - down, data)

the delta object was NULL and so accum is not a function on it. But I hope there's a better way to locate an error.

R 3.0.3
Re: [R] Fast way to populate a sparse matrix
Convert your 'targets' matrix into a two-column matrix, with the first column giving the row and the second the column where you want your values; then put the values into a single vector, and you can use the targets matrix as the subscript in one step without (explicit) looping. For example:

    library(Matrix)
    adjM <- Matrix(0, nrow = 10, ncol = 10)
    locs <- cbind(sample(1:10), sample(1:10))
    vals <- rnorm(10)
    adjM[locs] <- vals

I would expect this to be faster than looping (but have not tested).

On Thu, Apr 24, 2014 at 9:45 AM, Tom Wright wrote:

> I need to generate a sparse matrix. Currently I have the data held in two regular matrices. One, 'targets', holds the column subscripts while the other, 'scores', holds the values. I have written a 'toy' sample below. Using this approach takes about 90 seconds to populate a 3 x 3 element matrix. I'm going to need to scale this up by a factor of about 1000, so I really need a faster way of populating the sparse matrix. Any advice received gratefully.
>
> # toy code starts here
>
> require('Matrix')
> set.seed(0)
>
> adjM <- Matrix(0, nrow = 10, ncol = 10)
>
> # generate the scores for the sparse matrix, with the target locations
> targets <- matrix(nrow = 10, ncol = 5)
> scores <- matrix(nrow = 10, ncol = 5)
> for (iloc in 1:10)
> {
>     targets[iloc, ] <- sample(1:10, 5, replace = FALSE)
>     scores[iloc, ] <- rnorm(5)
> }
>
> # populate the sparse matrix
> for (iloc in 1:10)
> {
>     adjM[iloc, targets[iloc, !is.na(targets[iloc, ])]] <-
>         scores[iloc, !is.na(targets[iloc, ])]
> }

-- Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
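A further option along the same lines: skip subassignment entirely and build the matrix in one call from (row, column, value) triplets with Matrix::sparseMatrix(). A sketch mirroring Greg's example:

```r
library(Matrix)
set.seed(0)
locs <- cbind(sample(1:10), sample(1:10))   # (row, column) pairs
vals <- rnorm(10)

# Construct the sparse matrix directly from triplets; no
# subassignment into an existing matrix is needed.
adjM <- sparseMatrix(i = locs[, 1], j = locs[, 2], x = vals,
                     dims = c(10, 10))
```

For very large problems this avoids the repeated structural updates that make subassignment into a sparse matrix slow.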
Re: [R] R dataset copyrights
Many, probably even most (but I have not checked), of the datasets available in R packages have help files with a references section. That section should lead you to an original source that may have the copyright information and is what should be referenced.

My understanding (but I am not a lawyer, do not play one on TV, and do not claim to be any type of legal expert) is that you cannot copyright facts, but you can copyright the layout and presentation of facts. So real data about the real world cannot be copyrighted, but the layout and presentation can be. If you photocopy a page from a journal and post it, you may be in trouble for copying and distributing the layout and presentation of the data, but not the data itself. But if you transcribe the numbers into a file to be read by the computer, then you have just copied the facts, which are not copyrighted.

On the other hand, simulated or otherwise made-up datasets could be considered fiction and therefore able to be copyrighted. I remember hearing (but I don't remember where or when) that some textbook authors are encouraged to use simulated data instead of real data (it can have the same mean, sd, etc. as a real dataset, so the interpretation is the same), so that the copyright of the textbook also applies to the data.

Beyond what is legal, you should consider what is right. Even if you don't have to cite a data source, you should try to give credit where it is due (and possibly blame if there is an error). At a minimum you should cite original sources when they can be found, and also mention where you obtained the data if not from the original source. Think of the effort that people went through to collect the data and make it available to you; how would you feel if you put that much effort into something and then someone else stole the credit or other rewards?
Many data sources have statements on how the data can be used, and it is best to follow those instructions/requests. Is it really that hard to add a reference to where the data came from and how you obtained it?

In some educational cases it may be better to initially hide the source of the data. For example, the outliers dataset in the TeachingDemos package would be a lot less useful for its intended purposes if students were to read its help page before analyzing it, so I have no problem with teachers using it without telling students where it came from (and since it is simulated, I could possibly claim copyright). I would appreciate a mention after the fact (once the lesson is learned, the teacher could say "by the way, this data came from ..."), and I expect that others would feel similarly (I should add a note to that effect to the documentation page). But you should check the sources to see whether this is specifically allowed or disallowed.

I probably have not fully answered your question, but hopefully this gives a little more guidance.

On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup wrote:
> Hi everybody,
>
> I've been searching the web for quite a time now and haven't found a satisfying answer. I was wondering if the datasets provided within the R packages are open, and thus can be used in publications? Concretely, can the data, for example, be exported from R and uploaded in a different format (like csv) to a website to be accessible for students to work with the data in SPSS or Matlab? Is it enough to cite the source or paper, or do I need permission for every dataset?
>
> Thanks in advance for your replies,
> Sören Gröttrup

-- Gregory (Greg) L. Snow Ph.D.
538...@gmail.com
Re: [R] Unable to install rqpd
How did you try to install the package? What happens when you try? What operating system are you using? What version of R are you using? Did you try Google, and read any of the other discussions of how to install rqpd? Did you read the posting guide (linked at the bottom of this and every message) and provide the necessary background information?

Sarah

On Thu, Apr 24, 2014 at 7:04 AM, Vishal Chari wrote:
> Hello,
>
> I am unable to install package rqpd. I have also tried to download the source but have not been able to do so.
> Please help
>
> thanks in advance
> regards
> vishal

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] instal tar.gz package on windows
Normally on Windows you should install the Windows binary from the *.zip file, not the source from the *.tar.gz file. If you look at a CRAN page, the available files are labeled that way. You might also be interested in ?install.packages.

There are further instructions available on your local CRAN mirror, including:

    Installation of Packages
    Please type help("INSTALL") or help("install.packages") in R for
    information on how to install packages from this repository. The manual
    "R Installation and Administration" (also contained in the R base
    sources) explains the process in detail.

Sarah

On Thu, Apr 24, 2014 at 2:26 PM, KD Makatjane <23085...@nwu.ac.za> wrote:
> Good evening sir/madam
> My name is Katleho Makatjane. I am currently a B.Com Statistics student at North West University, Mafikeng campus. I have installed R 3.1.0 on my laptop, but my main problem is installing all the necessary packages so that I can start using it for my analysis. It gives me an error while I try to install them from downloaded files, and it also cannot connect to the internet to download them automatically. Can you please help me with how to install R packages? I am using a 32-bit Windows 7 Ultimate operating system.
>
> Yours faithfully

--
Sarah Goslee
http://www.functionaldiversity.org
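Concretely, the binary route looks like this (the file path and package name are placeholders; the calls are guarded with interactive() so sourcing the example does not trigger an install):

```r
# On Windows, a downloaded binary package is a *.zip file.
if (interactive()) {
  # install a downloaded Windows binary (.zip), not source:
  install.packages("C:/Users/me/Downloads/somepackage_1.0.zip",
                   repos = NULL, type = "win.binary")

  # or install the binary straight from CRAN when online:
  install.packages("somepackage")
}
```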
Re: [R] instal tar.gz package on windows
Good evening sir/madam

My name is Katleho Makatjane. I am currently a B.Com Statistics student at North West University, Mafikeng campus. I have installed R 3.1.0 on my laptop, but my main problem is installing all the necessary packages so that I can start using it for my analysis. It gives me an error while I try to install them from downloaded files, and it also cannot connect to the internet to download them automatically. Can you please help me with how to install R packages? I am using a 32-bit Windows 7 Ultimate operating system.

Yours faithfully
Katleho Makatjane
North West University Mafikeng Campus
Department of Statistics and Economics
Contact: +27734630271

Vrywaringsklousule / Disclaimer: http://www.nwu.ac.za/it/gov-man/disclaimer.html
Re: [R] Perceptual Mapping
Google "perceptual mapping with R". Here is one of the hits: http://marketing-yogi.blogspot.com/2012/12/session-4-rcode-perceptual-maps.html It does not look like MDS. It appears to be (closely related to?) PCA. Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." H. Gilbert Welch On Thu, Apr 24, 2014 at 10:20 AM, Noah Silverman wrote: > Hi, > > Someone just asked me to analyze a fairly large data set using something > they called "perceptual mapping". I'm not familiar with the term, but a > quick check in Google seems to indicate that it is just another term for > Multidimensional Scaling. However, they insist that it is something > different. > > Is anybody here familiar with "perceptual mapping" with multidimensional > data? If so, can you point me to any examples using R? > > Thanks, > > -- > *Noah Silverman, PhD* | UCLA Department of Statistics > 8117 Math Sciences Building, Los Angeles, CA 90095
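For what it's worth, both readings produce the same kind of two-dimensional "map". A minimal sketch with base R, using the built-in USArrests data purely for illustration: classical MDS on Euclidean distances of the scaled data reproduces the PCA configuration up to reflection, which may be why the two terms get conflated.

```r
data(USArrests)

# Perceptual-map-style plot via PCA:
pc <- prcomp(USArrests, scale. = TRUE)
plot(pc$x[, 1:2], type = "n", xlab = "PC1", ylab = "PC2")
text(pc$x[, 1:2], labels = rownames(USArrests), cex = 0.6)

# The same map via classical multidimensional scaling:
mds <- cmdscale(dist(scale(USArrests)), k = 2)
plot(mds, type = "n", xlab = "Dim 1", ylab = "Dim 2")
text(mds, labels = rownames(USArrests), cex = 0.6)
```

Whether marketers mean something more specific by "perceptual mapping" (e.g. correspondence analysis of brand-by-attribute tables) depends on the data at hand.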
[R] Metafor: How to integrate effect sizes?
Hello! I am using the metafor package for my master's thesis as an R newbie. While calculating effect sizes from my dataset (mean values and standard deviations) using "escalc" shouldn't be a problem (I hope ;-)), I wonder how I can at this point integrate additional studies that only report Cohen's d (no information about means and SDs available) in order to calculate an overall analysis. I would be very grateful for your support! Best regards, Verena
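Not speaking for the metafor authors, but one common route is: compute standardized mean differences from the means/SDs with escalc(), derive a sampling variance for the d-only studies from the usual large-sample formula, and pool everything with rma(). A sketch with made-up numbers (note that escalc's "SMD" applies the Hedges' g small-sample correction, so mixing in uncorrected d values is an approximation):

```r
library(metafor)

# Studies reporting means and SDs: effect sizes via escalc().
dat <- data.frame(m1 = c(5.2, 4.9), sd1 = c(1.1, 1.3), n1 = c(30, 25),
                  m2 = c(4.4, 4.1), sd2 = c(1.2, 1.2), n2 = c(28, 27))
dat <- escalc(measure = "SMD", m1i = m1, sd1i = sd1, n1i = n1,
              m2i = m2, sd2i = sd2, n2i = n2, data = dat)

# Studies reporting only Cohen's d: the usual large-sample variance of d.
d  <- c(0.35, 0.50); n1 <- c(40, 35); n2 <- c(38, 33)
vd <- (n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2))

# Pool both sets of effect sizes in one random-effects model.
res <- rma(yi = c(dat$yi, d), vi = c(dat$vi, vd))
summary(res)
```

If the group sizes for the d-only studies are unknown, only the total N, the variance formula needs a further approximation (e.g. assuming equal groups).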
[R] Perceptual Mapping
Hi, Someone just asked me to analyze a fairly large data set using something they called "perceptual mapping". I'm not familiar with the term, but a quick check in Google seems to indicate that it is just another term for Multidimensional Scaling. However, they insist that it is something different. Is anybody here familiar with "perceptual mapping" with multidimensional data? If so, can you point me to any examples using R? Thanks, -- *Noah Silverman, PhD* | UCLA Department of Statistics 8117 Math Sciences Building, Los Angeles, CA 90095
[R] Unable to install rqpd
Hello, I am unable to install the package rqpd. I have also tried to install it from source, but without success. Please help; thanks in advance. Regards, Vishal
Re: [R] Remove top values from a data set
Is this what you want: > myData <- rnorm(1000) > length(myData) [1] 1000 > top90 <- quantile(myData, prob = 0.9) > low90 <- myData[myData < top90] > length(low90) [1] 900 > > Jim Holtman Data Munger Guru What is the problem that you are trying to solve? Tell me what you want to do, not how you want to do it. On Thu, Apr 24, 2014 at 10:55 AM, Nasrin Pak wrote: > Hi all; > > I have a data set that I want to remove the top values above 90th > percentile from it. Any suggestions? > > Thank you; > > -- > > *Nasrin Pak, MSc* > > Air Quality Scientist
Re: [R] INET_NTOA equivalent?
Thank you, el on 2014-04-24, 10:33 Martin Maechler said the following: >> "EL" == Eberhard Lisse >> on Thu, 24 Apr 2014 01:21:37 +0100 writes: > > EL> In MySQL > EL> SELECT INET_ATON('127.0.0.1') > > EL> returns the integer 2130706433 > > EL> Is there a function in R to reverse that, i.e. so that something like > > EL> ip <- inet_ntoa(2130706433) > > EL> would put '127.0.0.1' into ip? > > almost: > > install.packages("sfsmisc") > require("sfsmisc") > > # NTOA: > > digitsBase(2130706433, base = 256) > Class 'basedInt'(base = 256) [1:1] > [,1] > [1,] 127 > [2,] 0 > [3,] 0 > [4,] 1 > > # ATON: > > as.intBase(digitsBase(2130706433, base = 256), base = 256) > 1 > 2130706433 > > So, an easy solution seems > >> ip.ntoa <- function(n) paste(sfsmisc::digitsBase(n, base = 256), collapse=".") >> ip.ntoa(2130706433) > [1] "127.0.0.1" > > but that does not vectorize (work for length(n) > 1) correctly. > The correct solution then is > > ip.ntoa <- function(n) apply(sfsmisc::digitsBase(n, base = 256), 2, paste, collapse=".") > > and that does work nicely: > >> ip.ntoa(1e9 + (0:10)) > [1] "59.154.202.0" "59.154.202.1" "59.154.202.2" "59.154.202.3" "59.154.202.4" > [6] "59.154.202.5" "59.154.202.6" "59.154.202.7" "59.154.202.8" "59.154.202.9" > [11] "59.154.202.10" > > right? > > -- > Martin Maechler, ETH Zurich
Re: [R] meta-question about R
On 04/23/14 23:22, William Dunlap wrote: Aren't those files support for named semaphores (made with sem_open())? Packages like BH and RSQLite contain calls to sem_open. Is your long-running R process using such a package? I don't think you would want to delete those files, but perhaps you can look into whatever R package creates them and see if you can modify the code to give them better names and then add those names to rkhunter's whitelist. You don't seem to understand what I'm asking. I have zero intention of deleting those files. I'm sure that my user's long-running job is creating them. What I'm asking is if ANYONE HERE knows if there is some configuration file, or command inside R, that would tell R, whatever package it's using (I assume that all packages inherit from the top-level process), when it creates files in /dev/shm, to name them something that I can use with wildcards in rkhunter's configuration file so that rkhunter ignores them. mark -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Jim Lemon Sent: Wednesday, April 23, 2014 2:18 PM To: m.r...@5-cent.us Cc: r-help@r-project.org Subject: Re: [R] meta-question about R On 04/23/2014 11:58 PM, m.r...@5-cent.us wrote: This really isn't about R, but configuring R. We're running R 3.0.2-1, the current default package, on CentOS 6.5. On a long-running job, R is creating files in /dev/shm: each set of three files is named (8 hex digits)-(4 hex digits)-(4 hex digits)-(4 hex digits)-(12 hex digits), and then sem.(same as the name)_counter_mutex, and (same as the name)_counter. For example, 156d23b0-9e67-46e2-afab-14a648252890 156d23b0-9e67-46e2-afab-14a648252890_counter sem.156d23b0-9e67-46e2-afab-14a648252890_counter_mutex Is there some way to configure R to add a prefix, say, to each of these files? 
We're running rkhunter (rootkit hunter) for security, and it complains about suspicious files, and I'd like some way to be able to tell it to, say, ignore R_temp.whatever Hi mark, I assume that the problem is to identify the files in /dev/shm, not to simply change your R code to tack the prefix onto the files as it produces them. As your hexadecimal digits are probably randomly generated, the solution may be to identify all the files that have "_counter_mutex" in the name, then chip off the appropriate bits to get the troublesome first name. filenames <- list.files(pattern="_counter_mutex") # function to return the two other filenames strip_fn <- function(x) { f2 <- substr(x, 5, nchar(x)-6) f1 <- substr(f2, 1, nchar(f2)-8) return(c(f1, f2)) } # get all the filenames filenames <- c(filenames, unlist(sapply(filenames, strip_fn))) # stick on the prefix newfilenames <- paste("R_temp", filenames, sep=".") # create the commands fnmove <- paste("mv", filenames, newfilenames) # move the files for(fn in 1:length(fnmove)) system(fnmove[fn]) Warning - I haven't tested the last bit of this, but it should work. There is probably some really neat string of hieroglyphs in a regular expression that will do this as well. Jim
Re: [R] meta-question about R
Duncan Murdoch wrote: > On 24/04/2014, 7:42 AM, Jim Lemon wrote: >> On 04/24/2014 08:52 PM, mark wrote: >>> On 04/23/14 23:22, William Dunlap wrote: >>> deleting those files. I'm sure that my user's long-running job is >>> creating them. What I'm asking is if ANYONE HERE knows if there is some >>> configuration file, or command inside R, that would tell R, whatever >>> package it's using (I assume that all packages inherit from the >>> top-level process), when it creates files in /dev/shm, to name them >>> something that I can use with wildcards in rkhunter's configuration >>> file so that rkhunter ignores them. >> You are correct, I didn't understand what you were asking. Doing a bit >> of searching, the sem_open function's first argument is the name of the >> file that is to be created. It doesn't sound like you are specifying >> these filenames, so it is probably a matter of finding the function that >> calls sem_open or sem_init. I would approach this by grepping the source >> code of the functions that you are calling, but as I have no idea what >> these functions are (or how many levels of function calling goes on >> before one of these two functions is called), I can't provide a >> straightforward answer. If you do find the offending function, you can >> just edit the source code to include your "R_temp" prefix, save the >> edited function, and "source" it to replace the function that is not >> providing the prefixes. > > Using debug(sem_open) is a quick way to find who is calling them. R > will break execution when it enters that function. Use the debugger > "where" command to see the calling stack. Thank you both very much - that's what I needed to know. One question, though - is there an R.conf or something, where the default format of that filename is set? I've looked through the rpm for R-core, and what .../etc/... files are in it, and I don't see that. Is there such a config, or is that hard-coded into R itself? 
mark
[R] Remove top values from a data set
Hi all; I have a data set that I want to remove the top values above 90th percentile from it. Any suggestions? Thank you; -- *Nasrin Pak, MSc* Air Quality Scientist
Re: [R] rgl and axes3d() labels
Or perhaps the documentation could be updated to clear up what works and what doesn't. It seems pretty confusing to put options in the docs that do not work as described. -Alex > On Apr 24, 2014, at 4:05 AM, Duncan Murdoch wrote: > >> On 23/04/2014, 9:02 PM, Alex Reynolds wrote: >> Unfortunately, that doesn't help as it removes axis lines. It looks like >> I can't use segments3d() without knowing what the bounds are of the >> current axes and I don't know what to call to expose those. >> >> Thanks again for your help, though, I appreciate it. Hopefully this gets >> fixed in a future release! > > There is no bug, so it won't be fixed. > > Duncan Murdoch > >> >> -Alex >> >> >> On Wed, Apr 23, 2014 at 5:34 PM, Duncan Murdoch >> mailto:murdoch.dun...@gmail.com>> wrote: >> >>On 23/04/2014, 7:51 PM, Alex Reynolds wrote: >> >>I am making an rgl-based 3d plot. It works fine, except when I >>try to >>remove axis value labels and tick marks with axes3d(labels=FALSE, >>ticks=FALSE): >> >>--- >>rgl.open() >>offset <- 50 >>par3d(windowRect=c(offset, offset, 1280+offset, 1280+offset)) >>rm(offset) >>rgl.clear() >>rgl.viewpoint(theta=__thetaStart, phi=30, fov=30, zoom=1) >>spheres3d(df$PC1, df$PC2, df$PC3, radius=featureRadius, >>color=df$rColor, >>alpha=featureTransparency, shininess=featureShininess) >>aspect3d(1, 1, 1) >> >>/* -- */ >>axes3d(col='black', box=FALSE, labels=FALSE, ticks=FALSE) >>/* -- */ >> >>title3d("", "", "PCoA1", "PCoA2", "PCoA3", col='black', line=1) >>texts3d(df$PC1, df$PC2, df$PC3, text=df$ctName, color="blue", >>adj=c(0,0)) >>bg3d("white") >>rgl.clear(type='lights') >>rgl.light(-45, 20, ambient='black', diffuse='#dd', >>specular='white') >>rgl.light(60, 30, ambient='#dd', diffuse='#dd', >>specular='black') >>filename <- paste("results/PCoA.labeled.__pdf", sep="") >>rgl.postscript(filename, fmt="pdf") >>--- >> >>When I run this code, these flags are ignored and I still get >>axis labels >>and tick marks. What am I misunderstanding about the documentation? 
>> >> >>If you specify edges="bbox" (the default), labels is ignored, and >>the bbox3d() function is used to draw the axes. There's no ticks >>argument, so it'll be absorbed by the ... argument. >> >>I don't know what you want, but you might get it with >> >> axes3d(edges=c("x", "y", "z"), col='black', box=FALSE, >>labels=FALSE, tick=FALSE) >> >>This won't join the axis lines at the lower corner; if that's what >>you want, I'd just draw them explicitly using segments3d. >> >>BTW, mixing rgl.* functions with *3d functions is likely to give you >>strange results. I don't recommend it. >> >>Duncan Murdoch
[R] mvpart question - how to calculate deviance explained by variables?
library(mvpart) The R code I used: mvpart(ept~cond+phlab+doc+episub+embed+woodtot+shade+Q+stgrad+veldpth, data=mydata, method="anova", xv="1se", xval=5, xvmult=1000) Part of the tabular output of the tree: 1) root 295 3905.9860 4.806780 2) cond>=194.15 77 491.2468 2.493506 4) cond>=309.7 25 62.1600 1.44 5) cond< 309.7 52 388. 3.00 3) cond< 194.15 218 2857.1560 5.623853 6) embed>=82.5 114 891.9649 4.017544 Is there a convenient way to calculate the deviance explained by each variable? For instance, I did it manually for one variable in one split as below: the deviance explained by cond = 1 - (2857.156 + 491.247)/3905.986 ≈ 0.1428 Thank you. - Kumar Mainali
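The manual calculation generalizes directly. A small sketch using the node deviances quoted above (the object returned by mvpart() is rpart-like, so per-node deviances can also be read from fit$frame$dev; that is an assumption worth checking against your own output):

```r
# Deviance explained by the first split on cond, from the printed nodes:
dev_root  <- 3905.986   # node 1) root
dev_left  <- 491.2468   # node 2) cond >= 194.15
dev_right <- 2857.156   # node 3) cond <  194.15

# Proportion of the parent node's deviance removed by this split:
prop_explained <- 1 - (dev_left + dev_right) / dev_root
round(prop_explained, 4)
```

The same ratio can be formed at any internal node: one minus the summed deviance of its two children over the node's own deviance.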
Re: [R] Hi , Is it possible select a different number of rows by each group with R????
Hi Marta, If the "field2" value in your first dataset is greater than the number of rows for a particular "field1" in the second dataset, this error could happen. e.g. using a modified dat1: dat1 <- structure(list(field1 = 1:3, field2 = c(3L, 20L, 4L)), .Names = c("field1", "field2"), class = "data.frame", row.names = c(NA, -3L)) lapply(split(dat2,dat2$field1),function(x) x[sample(1:nrow(x),dat1$field2[!is.na(match(dat1$field1,x$field1))],replace=FALSE),]) #Error in sample.int(length(x), size, replace, prob) : # cannot take a sample larger than the population when 'replace = FALSE' #In that case, res <- do.call(rbind, lapply(split(dat2, dat2$field1), function(x) { length1 <- dat1$field2[!is.na(match(dat1$field1, x$field1))] length2 <- if (length1 >= nrow(x)) nrow(x) else length1 x[sample(nrow(x), length2, replace = FALSE), ] })) ##instead of randomly selecting 20 rows for field1==2 in dat2, the above code selected the maximum number of rows nrow(dat2[dat2$field1==2,]) #[1] 8 res[1:2,] # field1 field3 field4 field5 #1.8 1 0.67 Sp Jm2 #1.6 1 0.58 Sp Rm6 A.K. Hi Arun, Thanks for your suggestions. I tried your new script; with a small sample it works well. However, when I tried it with the huge database, the script doesn't work: Error in sample.int(length(x), size, replace, prob) : cannot take a sample larger than the population when 'replace = FALSE' On Wednesday, April 23, 2014 11:01 PM, arun wrote: Hi Marta, If you need random selection, you could use: do.call(rbind,lapply(split(dat2,dat2$field1),function(x) x[sample(1:nrow(x),dat1$field2[!is.na(match(dat1$field1,x$field1))],replace=FALSE),])) A.K. On Tuesday, April 22, 2014 1:45 PM, arun wrote: Hi Marta, It's not clear whether you wanted to select the first "n" rows specified by field2 in the first dataset or just random rows. 
##using a modified example if my guess is correct dat1 <- structure(list(field1 = 1:3, field2 = c(3L, 6L, 4L)), .Names = c("field1", "field2"), class = "data.frame", row.names = c(NA, -3L)) dat2 <- structure(list(field1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), field3 = c(0.375, 0.416667, 0.458333, 0.5, 0.541667, 0.58, 0.625, 0.67, 0.708333, 0.75, 0.791667, 0.83, 0.875, 0.58, 0.625, 0.67, 0.708333, 0.75, 0.791667, 0.83, 0.875, 0.708333, 0.75, 0.791667, 0.83, 0.875), field4 = c("Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp"), field5 = c("Rm1", "Rm2", "Rm3", "Rm4", "Rm5", "Rm6", "Jm1", "Jm2", "Jm3", "Jm4", "Jm5", "Jm6", "Jm7", "Rm6", "Jm1", "Jm2", "Jm3", "Jm4", "Jm5", "Jm6", "Jm7", "Jm3", "Jm4", "Jm5", "Jm6", "Jm7")), .Names = c("field1", "field3", "field4", "field5"), class = "data.frame", row.names = c(NA, -26L)) ##for selecting the first 'n' rows dat2New <- merge(dat1,dat2,by="field1") library(plyr) res1 <- ddply(dat2New,.(field1),function(x) head(x,unique(x$field2)))[,-2] #or res2 <- dat2[with(dat1,rep(match(field1, dat2$field1),field2)+sequence(field2)-1),] A.K. Sorry, I think now the message is correct. Hi , Is it possible select a different number of rows by each group with R I must to select different number (specific quantity in field2:Table1) of rows in each group(field1:Table2). I have these 2 tables: Table1 field1 field2 1 3 2 6 3 9 4 3 5 3 6 3 7 3 8 9 9 6 10 3 11 3 12 3 13 3 14 3 Table2 field1 field3 field4 field5 1 0.375 Sp Rm1 1 0.416667 Sp Rm2 1 0.458333 Sp Rm3 1 0.5 Sp Rm4 1 0.541667 Sp Rm5 1 0.58 Sp Rm6 1 0.625 Sp Jm1 1 0.67 Sp Jm2 1 0.708333 Sp Jm3 1 0.75 Sp Jm4 1 0.791667 Sp Jm5 1 0.83 Sp Jm6 1 0.875 Sp Jm7 thx!!! 
On Monday, April 21, 2014 4:02 PM, Marta Tobeña wrote: Hi, is it possible to select a different number of rows for each group with R? I must select a different number (the quantity in field2 of Table1) of rows in each group (field1 of Table2). I have these 2 tables:

Table1
field1 field2
1 3
2 6
3 9
4 3
5 3
6 3
7 3
8 9
9 6
10 3
11 3
12 3
13 3
14 3

Table2
field1 field3 field4 field5
1 0.375 Sp Rm1
1 0.416667 Sp Rm2
1 0.458333 Sp Rm3
1 0.5 Sp Rm4
1 0.541667 Sp Rm5
1 0.58 Sp Rm6
1 0.625 Sp Jm1
1 0.67 Sp Jm2
1 0.708333 Sp Jm3
1 0.75 Sp Jm4
1 0.791667 Sp Jm5
1 0.83 Sp Jm6
1 0.875 Sp Jm7
2 0.916667 Sp Jm8
2 0.958333 Sp Jm9
2 1 Sp Jm10
2 1.041667 Sp Jm11
2 1.08 Sp Jm12
2 1.125 Sp Jm13
2 1.17 Sp Jm14
2 1.208333 Sp Jm15
2 1.25 Sp Jm16
2 1.291667 Sp Jm17
2 1.33 Sp Jm18

Thank you, Marta
[R] Fast way to populate a sparse matrix
I need to generate a sparse matrix. Currently I have the data held in two regular matrices: 'targets' holds the column subscripts while 'scores' holds the values. I have written a 'toy' sample below. Using this approach takes about 90 seconds to populate a 3 x 3 element matrix. I'm going to need to scale this up by a factor of about 1000, so I really need a faster way of populating the sparse matrix. Any advice received gratefully.

# toy code starts here
require('Matrix')
set.seed(0)
adjM <- Matrix(0, nrow=10, ncol=10)
# generate the scores for the sparse matrix, with the target locations
targets <- matrix(nrow=10, ncol=5)
scores <- matrix(nrow=10, ncol=5)
for (iloc in 1:10) {
  targets[iloc,] <- sample(1:10, 5, replace=FALSE)
  scores[iloc,] <- rnorm(5)
}
# populate the sparse matrix
for (iloc in 1:10) {
  adjM[iloc, targets[iloc, !is.na(targets[iloc,])]] <- scores[iloc, !is.na(targets[iloc,])]
}
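A much faster pattern, assuming the Matrix package: collect the (row, column, value) triplets first and make a single call to sparseMatrix(), rather than assigning into the matrix inside a loop (each such assignment rebuilds the sparse structure). A sketch built on the toy data above:

```r
require('Matrix')
set.seed(0)

# Same toy data as in the question:
targets <- matrix(nrow = 10, ncol = 5)
scores  <- matrix(nrow = 10, ncol = 5)
for (iloc in 1:10) {
  targets[iloc, ] <- sample(1:10, 5, replace = FALSE)
  scores[iloc, ]  <- rnorm(5)
}

# One call builds the whole matrix from (i, j, x) triplets;
# NA targets are simply dropped from the triplet vectors:
keep <- !is.na(targets)
adjM <- sparseMatrix(i = row(targets)[keep],
                     j = targets[keep],
                     x = scores[keep],
                     dims = c(10, 10))
```

Because the triplet vectors are assembled up front, the cost is linear in the number of nonzeros, which is what makes this scale to matrices a thousand times larger.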
Re: [R] meta-question about R
On 24/04/2014 9:56 AM, m.r...@5-cent.us wrote: Duncan Murdoch wrote: > On 24/04/2014, 7:42 AM, Jim Lemon wrote: >> On 04/24/2014 08:52 PM, mark wrote: >>> On 04/23/14 23:22, William Dunlap wrote: >>> deleting those files. I'm sure that my user's long-running job is >>> creating them. What I'm asking is if ANYONE HERE knows if there is some >>> configuration file, or command inside R, that would tell R, whatever >>> package it's using (I assume that all packages inherit from the >>> top-level process), when it creates files in /dev/shm, to name them >>> something that I can use with wildcards in rkhunter's configuration >>> file so that rkhunter ignores them. >> You are correct, I didn't understand what you were asking. Doing a bit >> of searching, the sem_open function's first argument is the name of the >> file that is to be created. It doesn't sound like you are specifying >> these filenames, so it is probably a matter of finding the function that >> calls sem_open or sem_init. I would approach this by grepping the source >> code of the functions that you are calling, but as I have no idea what >> these functions are (or how many levels of function calling goes on >> before one of these two functions is called), I can't provide a >> straightforward answer. If you do find the offending function, you can >> just edit the source code to include your "R_temp" prefix, save the >> edited function, and "source" it to replace the function that is not >> providing the prefixes. > > Using debug(sem_open) is a quick way to find who is calling them. R > will break execution when it enters that function. Use the debugger > "where" command to see the calling stack. Thank you both very much - that's what I needed to know. One question, though - is there an R.conf or something, where the default is format of that filename is set? I've looked through the rpm for R-core, and what .../etc/... files are in it, and I don't see that. 
Is there such a config, or is that hard-coded into R itself? There isn't an R.conf file. Jim told you how the filename is set in the low-level sem_open; you'll have to look at the source of the caller to see how it determines the name it uses. Duncan Murdoch
[R] CHAID in R
Hi, I want to implement CHAID in R, but at this point I am not sure how to go about it. I would be glad if someone could help me out with it. I am attaching the data set for your perusal. The variable in the 1st column is the dependent variable. Thanks, Preetam -- Preetam Pal (+91)-9432212774 M-Stat 2nd Year, Room No. N-114 Statistics Division, C.V.Raman Hall Indian Statistical Institute, B.H.O.S. Kolkata.
Re: [R] metafor - rstudent(res) - omitted rows
At 11:56 22/04/2014, Dipl. Kfm Dominik Wagner MSc; MSc wrote: Dear all, I am quite new to R. Now my following easy question. I use metafor and performed an outlier test with rstudent(res). This results in 1000 rows of 1578, with 578 omitted rows (starting with row 598). 1. How can I display all 1578 rows in RStudio? Because in the standardized residual plot it starts with study 1 (see attachment), but in RStudio with row 598. 2. How can I just plot the standardized residuals with a manipulated x-axis to see every single study? I cannot help with your RStudio problem as I do not use it, but as far as your plotting question is concerned: 1 - do you really want to see all of the residuals? Why not just keep the ones outside the range -2 to +2, which you might then need to study further. 2 - the pictures would probably be clearer if you identify and do not print out the two studies with r very close to -1, as they are compressing everything else. 3 - hollow circles are often a good idea when you have overprinting. Thank you very much for your help. Cordially Dominik -- _ *Dipl.-Kfm. Dominik Wagner MSc. MSc.* Michael Dewey i...@aghmed.fsnet.co.uk http://www.aghmed.fsnet.co.uk/home.html
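Point 1 can be put into code. A sketch using the BCG dataset that ships with metafor (res here stands in for any rma fit; rstudent() on an rma object returns a list whose $z component holds the standardized residuals):

```r
library(metafor)

# Example fit on the bundled BCG vaccine data:
dat <- escalc(measure = "RR", ai = tpos, bi = tneg, ci = cpos, di = cneg,
              data = dat.bcg)
res <- rma(yi, vi, data = dat)

# Standardized (studentized) residuals, one per study:
z <- rstudent(res)$z

# Plot all of them with reference lines at +/- 2, hollow circles:
plot(z, pch = 1, xlab = "Study", ylab = "Standardized residual")
abline(h = c(-2, 2), lty = 2)

# Keep only the ones outside [-2, 2] for further scrutiny:
outliers <- which(abs(z) > 2)
```

With the x-axis indexed by study number, every study gets its own position, which addresses point 2 as well.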
Re: [R] meta-question about R
On 24/04/2014, 7:42 AM, Jim Lemon wrote: On 04/24/2014 08:52 PM, mark wrote: On 04/23/14 23:22, William Dunlap wrote: Aren't those files support for named semaphores (made with sem_open())? Packages like BH and RSQLite contain calls to sem_open. Is your long-running R process using such a package? I don't think you would want to delete those files, but perhaps you can look into whatever R package creates them and see if you can modify the code to give them better names and then add those names to rkhunter's whitelist. You don't seem to understand what I'm asking. I have zero intention of deleting those files. I'm sure that my user's long-running job is creating them. What I'm asking is if ANYONE HERE knows if there is some configuration file, or command inside R, that would tell R, whatever package it's using (I assume that all packages inherit from the top-level process), when it creates files in /dev/shm, to name them something that I can use with wildcards in rkhunter's configuration file so that rkhunter ignores them. mark Hi mark, You are correct, I didn't understand what you were asking. Doing a bit of searching, the sem_open function's first argument is the name of the file that is to be created. It doesn't sound like you are specifying these filenames, so it is probably a matter of finding the function that calls sem_open or sem_init. I would approach this by grepping the source code of the functions that you are calling, but as I have no idea what these functions are (or how many levels of function calling goes on before one of these two functions is called), I can't provide a straightforward answer. If you do find the offending function, you can just edit the source code to include your "R_temp" prefix, save the edited function, and "source" it to replace the function that is not providing the prefixes. Using debug(sem_open) is a quick way to find who is calling them. R will break execution when it enters that function. 
Use the debugger "where" command to see the calling stack. Duncan Murdoch
Re: [R] meta-question about R
On 04/24/2014 08:52 PM, mark wrote: On 04/23/14 23:22, William Dunlap wrote: Aren't those files support for named semaphores (made with sem_open())? Packages like BH and RSQLite contain calls to sem_open. Is your long-running R process using such a package? I don't think you would want to delete those files, but perhaps you can look into whatever R package creates them and see if you can modify the code to give them better names and then add those names to rkhunter's whitelist. You don't seem to understand what I'm asking. I have zero intention of deleting those files. I'm sure that my user's long-running job is creating them. What I'm asking is if ANYONE HERE knows if there is some configuration file, or command inside R, that would tell R, whatever package it's using (I assume that all packages inherit from the top-level process), when it creates files in /dev/shm, to name them something that I can use with wildcards in rkhunter's configuration file so that rkhunter ignores them. mark Hi mark, You are correct, I didn't understand what you were asking. Doing a bit of searching, the sem_open function's first argument is the name of the file that is to be created. It doesn't sound like you are specifying these filenames, so it is probably a matter of finding the function that calls sem_open or sem_init. I would approach this by grepping the source code of the functions that you are calling, but as I have no idea what these functions are (or how many levels of function calling goes on before one of these two functions is called), I can't provide a straightforward answer. If you do find the offending function, you can just edit the source code to include your "R_temp" prefix, save the edited function, and "source" it to replace the function that is not providing the prefixes. 
Jim
Re: [R] Request for R " Initial value of MLE"
> Sir I have this problem, > > res <- maxLik(logLik=loglik1, start=c(a=1.5, b=1.5, c=1.5, dee=2), method="BFGS") > There were 50 or more warnings (use warnings() to see the first 50) > > summary(res) > "Maximum Likelihood estimation > BFGS maximisation, 0 iterations > Return code 100: Initial value out of range." > > Dear sir, how do we give the initial values to estimate the parameters? i) Avoid cross-posting; some folk get a bit snippy about that. ii) Read the help page for maxLik and look for the argument specifying the initial values of the parameters. S Ellison
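To make the point concrete, here is a hedged sketch (not the poster's model, which we cannot see): maxLik()'s start argument must be a named vector at which the log-likelihood is finite, and returning NA for out-of-range parameter values lets the optimizer back off instead of aborting with "Initial value out of range".

```r
library(maxLik)

set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)

# Log-likelihood for a normal sample; NA signals "out of range".
loglik <- function(theta) {
  mu <- theta[1]; sigma <- theta[2]
  if (sigma <= 0) return(NA)
  sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}

# Moment estimates make safe starting values: the log-likelihood is
# certainly finite there.
res <- maxLik(logLik = loglik,
              start = c(mu = mean(x), sigma = sd(x)),
              method = "BFGS")
summary(res)
```

If the reported error is "Initial value out of range", the first thing to check is whether loglik(start) is finite; any NaN/Inf at the starting point produces exactly that return code.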
Re: [R] rgl and axes3d() labels
On 23/04/2014, 9:02 PM, Alex Reynolds wrote: Unfortunately, that doesn't help, as it removes the axis lines. It looks like I can't use segments3d() without knowing the bounds of the current axes, and I don't know what to call to expose those. Thanks again for your help, though; I appreciate it. Hopefully this gets fixed in a future release!

There is no bug, so it won't be fixed. Duncan Murdoch

-Alex

On Wed, Apr 23, 2014 at 5:34 PM, Duncan Murdoch wrote:

On 23/04/2014, 7:51 PM, Alex Reynolds wrote: I am making an rgl-based 3d plot. It works fine, except when I try to remove axis value labels and tick marks with axes3d(labels=FALSE, ticks=FALSE):

---
rgl.open()
offset <- 50
par3d(windowRect=c(offset, offset, 1280+offset, 1280+offset))
rm(offset)
rgl.clear()
rgl.viewpoint(theta=thetaStart, phi=30, fov=30, zoom=1)
spheres3d(df$PC1, df$PC2, df$PC3, radius=featureRadius, color=df$rColor,
          alpha=featureTransparency, shininess=featureShininess)
aspect3d(1, 1, 1)
## --
axes3d(col='black', box=FALSE, labels=FALSE, ticks=FALSE)
## --
title3d("", "", "PCoA1", "PCoA2", "PCoA3", col='black', line=1)
texts3d(df$PC1, df$PC2, df$PC3, text=df$ctName, color="blue", adj=c(0,0))
bg3d("white")
rgl.clear(type='lights')
rgl.light(-45, 20, ambient='black', diffuse='#dd', specular='white')
rgl.light(60, 30, ambient='#dd', diffuse='#dd', specular='black')
filename <- paste("results/PCoA.labeled.pdf", sep="")
rgl.postscript(filename, fmt="pdf")
---

When I run this code, these flags are ignored and I still get axis labels and tick marks. What am I misunderstanding about the documentation?

If you specify edges="bbox" (the default), labels is ignored, and the bbox3d() function is used to draw the axes. There's no ticks argument, so it'll be absorbed by the ... argument.
I don't know what you want, but you might get it with

axes3d(edges=c("x", "y", "z"), col='black', box=FALSE, labels=FALSE, tick=FALSE)

This won't join the axis lines at the lower corner; if that's what you want, I'd just draw them explicitly using segments3d. BTW, mixing rgl.* functions with *3d functions is likely to give you strange results. I don't recommend it.

Duncan Murdoch
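On the earlier complaint about not knowing the bounds of the current axes: in rgl, par3d("bbox") returns the bounding box of the current scene as c(xmin, xmax, ymin, ymax, zmin, zmax). A small base-R helper (axis_segments is a made-up name for this sketch) can build the segment endpoints for three axis lines meeting at the lower corner:

```r
## Sketch: build endpoints for three axis lines meeting at the lower
## corner of a bounding box given as c(xmin, xmax, ymin, ymax, zmin, zmax).
axis_segments <- function(bb) {
  origin <- c(bb[1], bb[3], bb[5])
  rbind(origin, c(bb[2], bb[3], bb[5]),   # x axis
        origin, c(bb[1], bb[4], bb[5]),   # y axis
        origin, c(bb[1], bb[3], bb[6]))   # z axis
}

## In an open rgl scene one would then call (untested sketch):
##   segments3d(axis_segments(par3d("bbox")), color = "black")
axis_segments(c(0, 1, 0, 1, 0, 1))
```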
[R] Cramer-Rao bound computation
Dear R users; I have a question about Cramer-Rao upper/lower bounds. Is it possible to compute Cramer-Rao upper/lower bounds from residuals and the corresponding covariance matrices? Any suggestions will be appreciated; thanks in advance. M.O
Re: [R] INET_NTOA equivalent?
> "EL" == Eberhard Lisse > on Thu, 24 Apr 2014 01:21:37 +0100 writes: EL> In MySQL EL> SELECT INET_ATON('127.0.0.1') EL> returns the integer 2130706433 EL> Is there a function in R to reverse that, ie so that something like EL> ip <- inet_ntoa(2130706433) EL> would put '127.0.0.1' into ip? almost: install.packages("sfsmisc") require("sfsmisc") # NTOA : > digitsBase(2130706433, base = 256) Class 'basedInt'(base = 256) [1:1] [,1] [1,] 127 [2,]0 [3,]0 [4,]1 # ATON : > as.intBase(digitsBase(2130706433, base = 256), base = 256) 1 2130706433 > So, an easy solution seems > ip.ntoa <- function(n) paste(sfsmisc::digitsBase(n, base = 256), collapse=".") > ip.ntoa(2130706433) [1] "127.0.0.1" > but that does not vectorize (work for length(n) > 1 ) correctly. The correct solution then is ip.ntoa <- function(n) apply(sfsmisc::digitsBase(n, base = 256), 2, paste, collapse=".") and that does work nicely: > ip.ntoa(10+ (0:10)) [1] "59.154.202.0" "59.154.202.1" "59.154.202.2" "59.154.202.3" "59.154.202.4" [6] "59.154.202.5" "59.154.202.6" "59.154.202.7" "59.154.202.8" "59.154.202.9" [11] "59.154.202.10" right ? -- Martin Maechler, ETH Zurich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Derivative of expm function
> Wagner Bonat on Wed, 23 Apr 2014 12:12:17 +0200 writes:

> Hi all!
> I am looking for an efficient method to compute the derivative of the
> matrix exponential function in R. For example, I have a simple matrix like
> log.Sigma <- matrix(c(par1, rho, rho, par2), 2, 2)
> require(Matrix)
> Sigma <- expm(log.Sigma)
> I want some method to compute the derivatives of Sigma with respect to the
> parameters par1, par2 and rho. Any ideas?

The 'expm' package has slightly newer / more reliable algorithms for the matrix exponential. It also contains an expmFrechet() function which computes the Frechet derivative of the matrix exponential. I'm pretty confident -- but did not start thinking more deeply -- that this should provide the necessary parts to get partial derivatives like yours as well.

Martin Maechler, ETH Zurich

> Wagner Hugo Bonat
> LEG - Laboratório de Estatística e Geoinformação
> UFPR - Universidade Federal do Paraná
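As a numerical cross-check of the idea above: the partial derivative of Sigma = expm(log.Sigma) with respect to par1 is the Frechet derivative of expm at log.Sigma in the direction E = matrix(c(1,0,0,0), 2, 2), and it can be approximated by central differences. This sketch uses Matrix::expm (the Matrix package ships with R); dexpm_num is a made-up helper name, and expm::expmFrechet would compute the same quantity exactly:

```r
## Sketch: central-difference approximation to the Frechet derivative
## of the matrix exponential at A in direction E.
library(Matrix)

dexpm_num <- function(A, E, h = 1e-6) {
  (as.matrix(expm(Matrix(A + h * E))) -
   as.matrix(expm(Matrix(A - h * E)))) / (2 * h)
}

A <- matrix(c(1, 0.3, 0.3, 2), 2, 2)   # example log.Sigma
E <- matrix(c(1, 0, 0, 0), 2, 2)       # direction: d/d par1
dexpm_num(A, E)
```

For a diagonal A this reduces to the scalar case: the (1,1) entry of the derivative of expm(diag(par1, par2)) with respect to par1 is exp(par1), which makes a handy sanity check.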