Re: [R] R dataset copyrights

2014-04-24 Thread Prof Brian Ripley

On 24/04/2014 22:33, Greg Snow wrote:

Many, probably even most (but I have not checked) of the datasets
available in R packages have help files with a references section.
That section should lead you to an original source that may have the
copyright information and is what should be referenced.

My understanding (but I am not a lawyer, do not play one on TV, or
claim to be any type of legal expert) is that you cannot copyright
facts, but you can copyright the layout and presentation of facts.  So
real data about the real world cannot be copyrighted, but the layout
and presentation can be.  So if you photocopy a page from a journal
and post that you may be in trouble for copying and distributing the
layout and presentation of the data, but not the data itself.  But if
you transform the numbers to a file to be read by the computer then
you have just copied the facts which are not copyrighted.


You most likely also copied the layout (which numbers/strings are in 
which rows ...).  There are legal precedents involving telephone 
directories, for example.


There was a May 2007 thread about this: see 
https://stat.ethz.ch/pipermail/r-help/2007-May/131780.html and replies.




On the other hand simulated or otherwise made up datasets could be
considered to be fiction and therefore able to be copyrighted.  I
remember hearing (but I don't remember where or when) that some
textbook authors are encouraged to use simulated data instead of real
data (it can have the same mean, sd, etc. as a real dataset so the
interpretation is the same) in textbooks so that the copyright of the
textbook also applies to the data.  It is not always clear whether a
dataset is fact or simulated, so it is best to obtain permission or
check official statements from the source.

Beyond what is legal you should consider what is right.  Even if you
don't have to cite a data source, you should try to give credit where
it is due (and possibly blame if there is an error).  At a minimum you
should cite original sources when they can be found and also mention
where you obtained the data if not from the original source.  Think of
the effort that people went through to collect the data and make it
available to you, how would you feel if you put that much effort into
something then someone else stole the credit or other rewards.  Many
data sources have statements on how the data can be used and it is
best to follow those instructions/requests, is it really that hard to
add a reference to where the data came from and how you obtained it?

In some educational cases it may be better to initially hide the
source of the data, for example the outliers dataset in the
TeachingDemos package would be a lot less useful for its intended
purposes if students were to read its help page before analyzing it,
therefore I have no problem with teachers using it without telling
students where it came from (and since it is simulated I could
possibly claim copyright), though I would appreciate a mention after
the fact (once the lesson is learned the teacher could say "by the
way, this data came from ...") and I expect that others would feel
similarly (I should add a note to that effect to the documentation
page).  But you should check the sources to see if this is
specifically allowed or disallowed.

I probably have not fully answered your question, but hopefully this
gives a little more guidance.

On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup
 wrote:

Hi everybody,

I've been searching the web for quite a time now and haven't found a
satisfying answer. I was wondering if the datasets provided within the R
packages are open, and thus can be used in publications? Concretely, can the
data, for example, be exported from R and uploaded in a different format
(like csv) to a website to be accessible for students to work with the data
in SPSS or Matlab? Is it enough to cite the source or paper or do I need a
permission for every dataset?

Thanks in advance for your replies,
Sören Gröttrup

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] LogLikelihood of a Distribution Given Fixed Parameters

2014-04-24 Thread Rolf Turner


As usual I am too lazy to fight my way through the rather convoluted 
code presented, but it seems to me that you just want to calculate a log 
likelihood.  And that is bog-simple:


The log likelihood for i.i.d. data is just the sum of log f(y_i) where 
the y_i are your observed values and f() is the density function of the 
distribution that you have in mind.


Where there is (right) censoring you take the sum of log f(y_i) over all
the non-censored values and then add k*log(1-F(cens.time)) where k is the 
number of censored values and F() is the cumulative distribution 
function corresponding to f().


In your case it would appear that f(y) = dlnorm(y,1.66,0.25) and
F(y) = plnorm(y,1.66,0.25).  Note that instead of using 1-F(cens.time) 
you can use plnorm(cens.time,1.66,0.25,lower.tail=FALSE) and that instead of 
taking logs explicitly you can set log=TRUE in the call to dlnorm() and 
log.p=TRUE in the call to plnorm().
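
A minimal sketch of that calculation in R, assuming right censoring at a single known time. Each uncensored value contributes log f(y_i) and each censored value contributes log(1 - F(cens.time)). The helper name loglik_lnorm_cens, the seed and the sample size of 50 are made up for illustration; the parameter values are the ones from the question.

```r
# Fixed (null) parameter values and censoring time from the question
nullmu    <- 1.66
sd.log    <- 0.25
cens.time <- 6

# Illustrative data only: simulated so that the sketch is runnable
set.seed(1)
y <- rlnorm(50, meanlog = 1.58, sdlog = sd.log)
censored <- y > cens.time
y[censored] <- cens.time

# Hypothetical helper: log-likelihood of right-censored lognormal data
# at fixed parameter values (no fitting involved)
loglik_lnorm_cens <- function(y, censored, meanlog, sdlog) {
  sum(dlnorm(y[!censored], meanlog, sdlog, log = TRUE)) +
    sum(plnorm(y[censored], meanlog, sdlog, lower.tail = FALSE, log.p = TRUE))
}

ll_null <- loglik_lnorm_cens(y, censored, nullmu, sd.log)
ll_null
```

Because nothing is estimated here, the same function can be evaluated at any other (meanlog, sdlog) pair, for example at fitted values, to form a likelihood ratio.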


cheers,

Rolf Turner

On 25/04/14 07:27, Jacob Warren (RIT Student) wrote:

I'm trying to figure out if there is a way in R to get the loglikelihood of
a distribution fit to a set of data where the parameter values are fixed.
For example, I want to simulate data from a given alternate lognormal
distribution and then I will fit it to a lognormal distribution with null
parameter values to see what the likelihood of the null distribution is
given random data from the alternate distribution.

I have been using fitdistrplus for other purposes but I cannot use it to
fix both parameter values.

Here is an example of what I've been working with...

library(fitdistrplus)

samplesize<-100 #sample size (assumed here; not set in the original post)
nullmu<-1.66 #set null mu
altmu<-1.58 #set alt mu
sd.log<-0.25 #set common sigma
cens.time<-6 #if simulated times are greater than this turn them into right
             #censored times

#simulating lognormal data (time) from alternative dist
(sim<-rlnorm(n=samplesize, meanlog=altmu, sdlog=sd.log))
#if the time was > cens.time replace time with cens.time
(sim[which(sim>cens.time)]<-cens.time)
sim

#create a variable indicating censoring
(cens<-sim)
cens[which(sim==cens.time)]<-NA
cens

#create the data frame to be passed to fitdistcens and fitdist
(x<-data.frame(left=sim,right=cens))


#if there is censored data use fitdistcens else use fitdist
if (any(is.na(cens))) {
  simfit<-fitdistcens(censdata=x, distr="lnorm")
} else {
  simfit<-fitdist(data=x[,1], distr="lnorm")
}

#Now I can get the loglikelihood of the MLE fitted distribution
simfit$loglik

#I want to get the loglikelihood of the distribution with the null
#parameterization
#This is what I can't get to work
#I can't seem to find any function that allows me to set both parameter
#values
#so I can get the loglikelihood of the parameterization given the data
nulldist<-fitdistcens(censdata=x, distr="lnorm", start=list(meanlog=nullmu,
sdlog=sd.log))

#Then I want to do a likelihood ratio test between the two distributions
pchisq(2*(simfit$loglik-nulldist$loglik), df=2, lower.tail=FALSE)




[R] HELP with fonts

2014-04-24 Thread christian millan
Hi, I have been trying to make my axis fonts and axis label fonts bold, but 
it does not work even when I write the right command. I am writing 
font.lab=2, font.axis=2 but the bold fonts don't show up. Any help?
Thanks!

  


Re: [R] instal tar.gz package on windows

2014-04-24 Thread Raghu

The discussion in the forum solves your issue

http://stackoverflow.com/questions/1474081/how-do-i-install-an-r-package-from-source

Raghu

On Thu 24 Apr 2014 08:26:33 PM CEST, KD Makatjane wrote:

Good evening sir/madam
My name is katleho makatjane. I am currently a B.com statistics student at 
North West University Mafikeng campus.  I have installed R 3.1.0 on my laptop 
but my main problem is to install all necessary packages so that I may be able 
to start using it for my analysis. It gives me error while trying to install 
them from downloaded files. And again it can connect to the internet to 
download them automatically. Can you please help me out on how to install the R 
packages. I am using a 32bit windows 7 ultimate operating system

Yours faithfully

Katleho Makatjane
North West University
Mafikeng Campus
Department of Statistics and Economics
Contact: +27734630271


Vrywaringsklousule / Disclaimer:  
http://www.nwu.ac.za/it/gov-man/disclaimer.html






--
--
Raghu Erapaneedi
Max Planck Institute for Molecular Biomedicine
Mammalian Cell Signalling Laboratory
Department of Vascular Cell Biology
Roentgenstr-20
D-48149 Muenster
Germany
Tel:+49(0)251-70365-223



[R] LogLikelihood of a Distribution Given Fixed Parameters

2014-04-24 Thread Jacob Warren (RIT Student)
I'm trying to figure out if there is a way in R to get the loglikelihood of
a distribution fit to a set of data where the parameter values are fixed.
For example, I want to simulate data from a given alternate lognormal
distribution and then I will fit it to a lognormal distribution with null
parameter values to see what the likelihood of the null distribution is
given random data from the alternate distribution.

I have been using fitdistrplus for other purposes but I cannot use it to
fix both parameter values.

Here is an example of what I've been working with...

library(fitdistrplus)

samplesize<-100 #sample size (assumed here; not set in the original post)
nullmu<-1.66 #set null mu
altmu<-1.58 #set alt mu
sd.log<-0.25 #set common sigma
cens.time<-6 #if simulated times are greater than this turn them into right
             #censored times

#simulating lognormal data (time) from alternative dist
(sim<-rlnorm(n=samplesize, meanlog=altmu, sdlog=sd.log))
#if the time was > cens.time replace time with cens.time
(sim[which(sim>cens.time)]<-cens.time)
sim

#create a variable indicating censoring
(cens<-sim)
cens[which(sim==cens.time)]<-NA
cens

#create the data frame to be passed to fitdistcens and fitdist
(x<-data.frame(left=sim,right=cens))


#if there is censored data use fitdistcens else use fitdist
if (any(is.na(cens))) {
  simfit<-fitdistcens(censdata=x, distr="lnorm")
} else {
  simfit<-fitdist(data=x[,1], distr="lnorm")
}

#Now I can get the loglikelihood of the MLE fitted distribution
simfit$loglik

#I want to get the loglikelihood of the distribution with the null
#parameterization
#This is what I can't get to work
#I can't seem to find any function that allows me to set both parameter
#values
#so I can get the loglikelihood of the parameterization given the data
nulldist<-fitdistcens(censdata=x, distr="lnorm", start=list(meanlog=nullmu,
sdlog=sd.log))

#Then I want to do a likelihood ratio test between the two distributions
pchisq(2*(simfit$loglik-nulldist$loglik), df=2, lower.tail=FALSE)



Re: [R] detecting the sourcing of site profile on Startup versus post-Startup

2014-04-24 Thread Benjamin Tyner
Jeff,

I absolutely agree it is a bad idea to rely on side effects.

I did figure out one way to skin this cat. It relies on the following
from line 909 of src/main/main.c,

R_LoadProfile(R_OpenSiteFile(), baseEnv);
R_LockBinding(install(".Library.site"), R_BaseEnv);
R_LoadProfile(R_OpenInitFile(), R_GlobalEnv);

to illustrate, if one puts at the top of the site profile:

   if (bindingIsLocked(".Library.site", .BaseNamespaceEnv)) {
      # site profile has already finished loading;
      # put code here for that case. for example,
      if (identical(.BaseNamespaceEnv$.GoodJob, Sys.getpid())) {
         warning("you appear to be using the same file for both site and user profiles, or to have sourced this file post-startup.")
      }
      warning("this file is not intended to be used in this fashion.")
   } else {
      # site profile is in process of loading;
      # put code here for that case. for example,
      message("good job! startup loaded the correct site profile.")
      .GoodJob <- Sys.getpid()
   }

Not exactly best practice to rely on an implementation detail, but I
found it interesting nevertheless.
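
For readers unfamiliar with binding locks, here is a small sketch of the mechanism the trick depends on, run in a scratch environment rather than the base environment. The names env, marker and site_profile_done are made up for illustration; lockBinding() and bindingIsLocked() are the base R functions involved.

```r
# Scratch environment standing in for the base environment (illustration only)
env <- new.env()
assign("marker", "some value", envir = env)

before <- bindingIsLocked("marker", env)   # nothing has been locked yet
lockBinding("marker", env)
after  <- bindingIsLocked("marker", env)   # the binding is now locked

# Hypothetical helper mirroring the test used in the site profile above:
# it asks whether startup has already locked .Library.site
site_profile_done <- function() {
  bindingIsLocked(".Library.site", baseenv())
}
```

The same caveat applies as above: the point at which startup locks .Library.site is an implementation detail and could change between R versions.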

Regards
Ben


On 04/23/2014 09:31 PM, Jeff Newmiller wrote:
> Regardless of whether this is possible, it seems like a bad idea (side 
> effects in a functional programming environment). If you want to do something 
> special in startup then write a different function that does that stuff and 
> then call the desired functions explicitly when you want them to be called.
> ---
> Jeff Newmiller
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---
> Sent from my phone. Please excuse my brevity.
>
> On April 23, 2014 6:11:09 PM PDT, Benjamin Tyner  wrote:
>> Thanks Duncan! Yes, I considered taking advantage of .First, but was
>> concerned that the .First defined by the site profile could be masked
>> by a possible .First defined by the user profile (I neglected to mention
>> that "--no-init-profile" [sic] in the example I gave was a simplifying
>> assumption, sorry about that).
>>
>> On 04/23/2014 06:55 AM, Duncan Murdoch wrote:
>>
>> On 22/04/2014, 8:59 PM, Benjamin Tyner wrote:
>>
>> Greetings,
>> Is there any way to programmatically detect whether a piece of code is
>> being run within the initial (Startup) sourcing of the site profile?
>> For example, say I have a site profile, "/path/to/Rprofile.site". Is
>> there any function "my_func" which would return different values for
>> these two instances:
>>
>> Rscript --no-site-profile --no-init-profile -e "sys.source('/path/to/Rprofile.site', envir = .BaseNamespaceEnv); my_func()"
>>
>> versus:
>>
>> export R_PROFILE=/path/to/Rprofile.site
>> Rscript --no-init-profile -e "my_func()"
>>
>> The commandArgs() function could see the different command lines and
>> your function could deduce the difference from that.
>> As far as I know, R keeps no other records of the startup process, but
>> if you can modify other files, you could leave a record when .First was
>> run, and see that it was run before Rprofile.site in the first case. See
>> ?Startup.
>> Duncan Murdoch
>>


Re: [R] no line number from error

2014-04-24 Thread Ross Boylan
On Thu, 2014-04-24 at 19:29 -0400, Duncan Murdoch wrote:
> On 24/04/2014, 6:40 PM, Ross Boylan wrote:
> >   > r1 <- totalEffect.all(dsim, simjob)
> >   Error: attempt to apply non-function
> >   > traceback()
> >   1: totalEffect.all(dsim, simjob)
> >   > class(totalEffect.all)
> >   [1] "function"
> > How can I find out where in totalEffect.all the error is arising?
> > My only theory for the lack of line number was that totalEffect.all was
> > not a function; it is.  Further, previous calls to the function worked,
> > and errors in it produced line numbers.  After fixing a previous error
> > I'm now getting this.
> >
> > All my code is sourced from files except for the driver.  The driver
> > code is in the same file that defines totalEffect.all.
> 
> I don't understand this.  If totalEffect.all is in a file that is not 
> sourced, where did it come from?
totalEffect is sourced; by "driver" I meant the surrounding code that
sets up dsim and simjob and calls totalEffect.
> 
> Generally the rule is that if you source a function from a file you'll 
> get line number information attached to it, so you should see a line 
> number reported when an error occurs, or during debugging.  There are 
> exceptions:  you can turn this off, and by default, it is turned off for 
> functions defined in packages (but you can turn it on if you re-install 
> from source).
I'm not in a package.

BTW, I encountered several more instances of the error--that is, from
different spots in the code--and never got a line number.

Ross
> 
> >
> > In this particular case I stepped through with the debugger and found
> > that in the  line
> > accums[[m]]$delta$accum(up - down, data)
> >
> > the delta object was NULL and so accum is not a function on it.  But I
> > hope there's a better way to locate an error.
> 
> If the line that triggered this error was in a function that had line 
> number information, it sounds like it might be a bug.  Can you simplify 
> it down to a simple reproducible example that I could look at?
> 
> Duncan Murdoch
>



Re: [R] no line number from error

2014-04-24 Thread Duncan Murdoch

On 24/04/2014, 6:40 PM, Ross Boylan wrote:

  > r1 <- totalEffect.all(dsim, simjob)
  Error: attempt to apply non-function
  > traceback()
  1: totalEffect.all(dsim, simjob)
  > class(totalEffect.all)
  [1] "function"
How can I find out where in totalEffect.all the error is arising?
My only theory for the lack of line number was that totalEffect.all was
not a function; it is.  Further, previous calls to the function worked,
and errors in it produced line numbers.  After fixing a previous error
I'm now getting this.

All my code is sourced from files except for the driver.  The driver
code is in the same file that defines totalEffect.all.


I don't understand this.  If totalEffect.all is in a file that is not 
sourced, where did it come from?


Generally the rule is that if you source a function from a file you'll 
get line number information attached to it, so you should see a line 
number reported when an error occurs, or during debugging.  There are 
exceptions:  you can turn this off, and by default, it is turned off for 
functions defined in packages (but you can turn it on if you re-install 
from source).
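
A minimal sketch of how one might check whether a sourced function actually carries line number information. The file contents and the function name apply_missing are made up for illustration; source(), getSrcLocation() and the "srcref" attribute are standard R.

```r
# Write a small, hypothetical function to a temporary file
src_file <- tempfile(fileext = ".R")
writeLines(c(
  "# illustration only",
  "apply_missing <- function(x) {",
  "  handler <- x$no_such_element  # NULL when the element is absent",
  "  handler(1)                    # error: attempt to apply non-function",
  "}"
), src_file)

# keep.source = TRUE asks source() to attach srcref (file and line) information
source(src_file, keep.source = TRUE)

# TRUE if the function carries source references
has_srcref <- !is.null(attr(apply_missing, "srcref"))

# Line of the file on which the function definition starts
first_line <- utils::getSrcLocation(apply_missing, "line")

# Capture the error message instead of stopping the script
msg <- tryCatch(apply_missing(list()), error = function(e) conditionMessage(e))
```

If has_srcref is FALSE for a function, the keep.source setting in effect when it was defined is the first thing to check. In an interactive session, options(error = recover) lets you inspect the failing frame directly.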




In this particular case I stepped through with the debugger and found
that in the  line
accums[[m]]$delta$accum(up - down, data)

the delta object was NULL and so accum is not a function on it.  But I
hope there's a better way to locate an error.


If the line that triggered this error was in a function that had line 
number information, it sounds like it might be a bug.  Can you simplify 
it down to a simple reproducible example that I could look at?


Duncan Murdoch



Re: [R] Perceptual Mapping

2014-04-24 Thread Noah Silverman
Thanks Bert,

Not sure how I missed that one.

Best,


On 4/24/14, 11:50 AM, Bert Gunter wrote:
> google on "perceptual mapping with R"
>
> Here is one of the hits:
>
> http://marketing-yogi.blogspot.com/2012/12/session-4-rcode-perceptual-maps.html
>
> It does not look like mds. It appears to be  (closely related to?) PCA.
>
> Cheers,
> Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> (650) 467-7374
>
> "Data is not information. Information is not knowledge. And knowledge
> is certainly not wisdom."
> H. Gilbert Welch
>
>
>
>
> On Thu, Apr 24, 2014 at 10:20 AM, Noah Silverman  
> wrote:
>> Hi,
>>
>> Someone just asked me to analyze a fairly large data set using something
>> they called "perceptual mapping".  I'm not familiar with the term, but a
>> quick check in Google seems to indicate that it is just another term for
>> Multidimensional Scaling.  However, they insist that it is something
>> different.
>>
>> Is anybody here familiar with "perceptual mapping" with multidimensional
>> data?  If so, can you point to me to any examples using R?
>>
>> Thanks,
>>
>>
>> --
>> *Noah Silverman, PhD* | UCLA Department of Statistics
>> 8117 Math Sciences Building, Los Angeles, CA 90095
>>

-- 
*Noah Silverman, PhD* | UCLA Department of Statistics
8117 Math Sciences Building, Los Angeles, CA 90095
Tel: (323) 899-9595



[R] no line number from error

2014-04-24 Thread Ross Boylan
 > r1 <- totalEffect.all(dsim, simjob)
 Error: attempt to apply non-function
 > traceback()
 1: totalEffect.all(dsim, simjob)
 > class(totalEffect.all)
 [1] "function"
How can I find out where in totalEffect.all the error is arising?
My only theory for the lack of line number was that totalEffect.all was
not a function; it is.  Further, previous calls to the function worked,
and errors in it produced line numbers.  After fixing a previous error
I'm now getting this.

All my code is sourced from files except for the driver.  The driver
code is in the same file that defines totalEffect.all.

In this particular case I stepped through with the debugger and found
that in the  line
accums[[m]]$delta$accum(up - down, data)

the delta object was NULL and so accum is not a function on it.  But I
hope there's a better way to locate an error.

R 3.0.3



Re: [R] Fast way to populate a sparse matrix

2014-04-24 Thread Greg Snow
Convert your 'targets' matrix into a 2 column matrix with the 1st
column representing the row and the 2nd the column where you want your
values, then change the values to a single vector and you can just use
the targets matrix as the subsetting in 1 step without (explicit)
looping, for example:

library(Matrix)

adjM <- Matrix(0,nrow=10,ncol=10)

locs <- cbind( sample(1:10), sample(1:10) )
vals <- rnorm(10)

adjM[ locs ] <- vals

I would expect this to be faster than looping (but have not tested).
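
A sketch of how the 'targets' and 'scores' matrices from the question could be turned into that two-column form, assuming row i of each matrix belongs to row i of the result. The toy data are generated here only so the example runs; as an alternative not mentioned above, sparseMatrix() from the Matrix package can build the matrix directly from (row, column, value) triplets.

```r
library(Matrix)

set.seed(0)
n <- 10
k <- 5

# Toy inputs in the shape described in the question: row i of 'targets'
# holds column subscripts, row i of 'scores' holds the matching values
targets <- t(replicate(n, sample(1:n, k)))
scores  <- matrix(rnorm(n * k), nrow = n, ncol = k)

# Two-column (row, column) index matrix; row(targets) supplies the row numbers
keep <- !is.na(targets)
locs <- cbind(row(targets)[keep], targets[keep])
vals <- scores[keep]

# Option 1: matrix indexing into an existing sparse matrix
adjM <- Matrix(0, nrow = n, ncol = n, sparse = TRUE)
adjM[locs] <- vals

# Option 2: build the sparse matrix directly from the triplets
adjM2 <- sparseMatrix(i = locs[, 1], j = locs[, 2], x = vals, dims = c(n, n))
```

One difference to keep in mind: sparseMatrix() adds up values that share the same (row, column) pair, whereas indexed assignment overwrites, so the two options agree only when each location appears once.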

On Thu, Apr 24, 2014 at 9:45 AM, Tom Wright  wrote:
> I need to generate a sparse matrix. Currently I have the data held in two
> regular matrices. One 'targets' holds the column subscripts while the other
> 'scores' holds the values. I have written a 'toy' sample below. Using this
> approach takes about 90 seconds to populate a 3 x 3 element matrix.
> I'm going to need to scale this up by a factor of about 1000 so I really
> need a faster way of populating the sparse matrix.
> Any advice received gratefully.
>
> # toy code starts here
>
> require('Matrix')
> set.seed(0)
>
> adjM<-Matrix(0,nrow=10,ncol=10)
>
> #generate the scores for the sparse matrix, with the target locations
> targets<-matrix(nrow=10,ncol=5)
> scores<-matrix(nrow=10,ncol=5)
> for(iloc in 1:10)
>   {
>   targets[iloc,]<-sample(1:10,5,replace=FALSE)
>   scores[iloc,]<-rnorm(5)
>   }
>
> #populate the sparse matrix
> for(iloc in 1:10)
>   {
>   adjM[iloc,targets[iloc,!is.na(targets[iloc,])]]<-scores[iloc,!is.na(targets[iloc,])]
>   }
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] R dataset copyrights

2014-04-24 Thread Greg Snow
Many, probably even most (but I have not checked) of the datasets
available in R packages have help files with a references section.
That section should lead you to an original source that may have the
copyright information and is what should be referenced.

My understanding (but I am not a lawyer, do not play one on TV, or
claim to be any type of legal expert) is that you cannot copyright
facts, but you can copyright the layout and presentation of facts.  So
real data about the real world cannot be copyrighted, but the layout
and presentation can be.  So if you photocopy a page from a journal
and post that you may be in trouble for copying and distributing the
layout and presentation of the data, but not the data itself.  But if
you transform the numbers to a file to be read by the computer then
you have just copied the facts which are not copyrighted.

On the other hand simulated or otherwise made up datasets could be
considered to be fiction and therefore able to be copyrighted.  I
remember hearing (but I don't remember where or when) that some
textbook authors are encouraged to use simulated data instead of real
data (it can have the same mean, sd, etc. as a real dataset so the
interpretation is the same) in textbooks so that the copyright of the
textbook also applies to the data.  It is not always clear whether a
dataset is fact or simulated, so it is best to obtain permission or
check official statements from the source.

Beyond what is legal you should consider what is right.  Even if you
don't have to cite a data source, you should try to give credit where
it is due (and possibly blame if there is an error).  At a minimum you
should cite original sources when they can be found and also mention
where you obtained the data if not from the original source.  Think of
the effort that people went through to collect the data and make it
available to you, how would you feel if you put that much effort into
something then someone else stole the credit or other rewards.  Many
data sources have statements on how the data can be used and it is
best to follow those instructions/requests, is it really that hard to
add a reference to where the data came from and how you obtained it?

In some educational cases it may be better to initially hide the
source of the data, for example the outliers dataset in the
TeachingDemos package would be a lot less useful for its intended
purposes if students were to read its help page before analyzing it,
therefore I have no problem with teachers using it without telling
students where it came from (and since it is simulated I could
possibly claim copyright), though I would appreciate a mention after
the fact (once the lesson is learned the teacher could say "by the
way, this data came from ...") and I expect that others would feel
similarly (I should add a note to that effect to the documentation
page).  But you should check the sources to see if this is
specifically allowed or disallowed.

I probably have not fully answered your question, but hopefully this
gives a little more guidance.

On Tue, Apr 22, 2014 at 11:29 AM, Soeren Groettrup
 wrote:
> Hi everybody,
>
> I've been searching the web for quite a time now and haven't found a
> satisfying answer. I was wondering if the datasets provided within the R
> packages are open, and thus can be used in publications? Concretely, can the
> data, for example, be exported from R and uploaded in a different format
> (like csv) to a website to be accessible for students to work with the data
> in SPSS or Matlab? Is it enough to cite the source or paper or do I need a
> permission for every dataset?
>
> Thanks in advance for your replies,
> Sören Gröttrup
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com



Re: [R] Unable to install rqpd

2014-04-24 Thread Sarah Goslee
How did you try to install the package?
What happens when you try?
What operating system are you using?
What version of R are you using?

Did you try Google, and read any of the other discussions of how to
install rqpd?
Did you read the posting guide (linked at bottom of this and every
message) and provide the necessary background information?

Sarah

On Thu, Apr 24, 2014 at 7:04 AM, Vishal Chari  wrote:
> Hello,
>
>  I am unable to install package rqpd. I have also tried to download for 
> source but not able to do so.
> Please help
>
> thank in advance
> regards
> vishal
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] instal tar.gz package on windows

2014-04-24 Thread Sarah Goslee
Normally on Windows you should install the Windows binary from the
*.zip file, not the source from the *.tar.gz file.

If you look at a CRAN page the available files are labeled that way.

You might also be interested in
?install.packages

There are further instructions available on your local CRAN mirror, including:

Installation of Packages

Please type help("INSTALL") or help("install.packages") in R for
information on how to install packages from this repository. The
manual R Installation and Administration (also contained in the R base
sources) explains the process in detail.
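
A rough sketch of installing from an already downloaded file, assuming the file name tells you what kind of package it is. The helper names pkg_type_for and install_local and the package name "somePackage" are placeholders; install.packages() with repos = NULL and a type argument is the standard call.

```r
# Hypothetical helper: choose the 'type' argument for install.packages()
# from the name of a downloaded package file
pkg_type_for <- function(file) {
  if (grepl("\\.zip$", file)) {
    "win.binary"   # pre-built Windows binary
  } else if (grepl("\\.tar\\.gz$", file)) {
    "source"       # source package; needs Rtools on Windows if it has compiled code
  } else {
    stop("unrecognised package file: ", file)
  }
}

# Hypothetical wrapper: install from a local file instead of a repository
install_local <- function(file) {
  install.packages(file, repos = NULL, type = pkg_type_for(file))
}

# Usual route when the machine can reach a CRAN mirror
# ("somePackage" is a placeholder name):
# install.packages("somePackage")
```

Installing from a local file does not fetch dependencies, so any packages the file depends on have to be installed first.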


Sarah

On Thu, Apr 24, 2014 at 2:26 PM, KD Makatjane <23085...@nwu.ac.za> wrote:
> Good evening sir/madam
> My name is Katleho Makatjane. I am currently a B.Com statistics student at 
> North West University Mafikeng campus.  I have installed R 3.1.0 on my laptop 
> but my main problem is installing all the necessary packages so that I can 
> start using it for my analysis. It gives me an error when I try to 
> install them from downloaded files, and it also cannot connect to the internet 
> to download them automatically. Can you please help me with how to install 
> the R packages? I am using a 32-bit Windows 7 Ultimate operating system.
>
> Yours faithfully
>

-- 
Sarah Goslee
http://www.functionaldiversity.org



Re: [R] instal tar.gz package on windows

2014-04-24 Thread KD Makatjane
Good evening sir/madam
My name is Katleho Makatjane. I am currently a B.Com statistics student at 
North West University Mafikeng campus.  I have installed R 3.1.0 on my laptop 
but my main problem is installing all the necessary packages so that I can 
start using it for my analysis. It gives me an error when I try to install 
them from downloaded files, and it also cannot connect to the internet to 
download them automatically. Can you please help me with how to install the R 
packages? I am using a 32-bit Windows 7 Ultimate operating system.
 
Yours faithfully
 
Katleho Makatjane
North West University
Mafikeng Campus
Department of Statistics and Economics
Contact: +27734630271
 
 


Re: [R] Perceptual Mapping

2014-04-24 Thread Bert Gunter
google on "perceptual mapping with R"

Here is one of the hits:

http://marketing-yogi.blogspot.com/2012/12/session-4-rcode-perceptual-maps.html

It does not look like MDS; it appears to be (closely related to?) PCA.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
H. Gilbert Welch




On Thu, Apr 24, 2014 at 10:20 AM, Noah Silverman  wrote:
> Hi,
>
> Someone just asked me to analyze a fairly large data set using something
> they called "perceptual mapping".  I'm not familiar with the term, but a
> quick check in Google seems to indicate that it is just another term for
> Multidimensional Scaling.  However, they insist that it is something
> different.
>
> Is anybody here familiar with "perceptual mapping" with multidimensional
> data?  If so, can you point me to any examples using R?
>
> Thanks,
>
>
> --
> *Noah Silverman, PhD* | UCLA Department of Statistics
> 8117 Math Sciences Building, Los Angeles, CA 90095
>


[R] Metafor: How to integrate effectsizes?

2014-04-24 Thread Verena Weinbir
Hello!

I am using the metafor package for my master's thesis as an R newbie. While
calculating effect sizes from my dataset (mean values and
standard deviations) using "escalc" shouldn't be a problem (I hope ;-)), I
wonder how I could at this point integrate additional studies, which only
report Cohen's d (no information about mean values and SDs available), to
calculate an overall analysis.  I would be very grateful for your support!

Best regards,

Verena



[R] Perceptual Mapping

2014-04-24 Thread Noah Silverman
Hi,

Someone just asked me to analyze a fairly large data set using something
they called "perceptual mapping".  I'm not familiar with the term, but a
quick check in Google seems to indicate that it is just another term for
Multidimensional Scaling.  However, they insist that it is something
different.

Is anybody here familiar with "perceptual mapping" with multidimensional
data?  If so, can you point me to any examples using R?

Thanks,


-- 
*Noah Silverman, PhD* | UCLA Department of Statistics
8117 Math Sciences Building, Los Angeles, CA 90095



[R] Unable to install rqpd

2014-04-24 Thread Vishal Chari
Hello,

I am unable to install the package rqpd. I have also tried to download the source 
but was not able to do so.
Please help.

Thanks in advance,
regards
vishal



Re: [R] Remove top values from a data set

2014-04-24 Thread jim holtman
Is this what you want:

> myData <- rnorm(1000)
> length(myData)
[1] 1000
> top90 <- quantile(myData, prob = 0.9)
> low90 <- myData[myData < top90]
> length(low90)
[1] 900
>
>
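If the values sit in a column of a data frame rather than a bare vector, the same idea can be sketched as below. This is only an illustration: drop_top, the air data frame and its pm column are invented names, and it keeps values equal to the cutoff (<=) where the transcript above uses a strict <.

```r
# Drop the rows whose value in 'column' lies above the given quantile.
drop_top <- function(df, column, prob = 0.9) {
  cutoff <- quantile(df[[column]], probs = prob, na.rm = TRUE)
  keep <- !is.na(df[[column]]) & df[[column]] <= cutoff
  df[keep, , drop = FALSE]
}

# Made-up example data: 100 readings from two sites.
air <- data.frame(site = rep(c("a", "b"), each = 50), pm = 1:100)
trimmed <- drop_top(air, "pm")
nrow(trimmed)   # 90
```

Rows with a missing value in the chosen column are dropped as well; change the keep line if they should be retained.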



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


On Thu, Apr 24, 2014 at 10:55 AM, Nasrin Pak  wrote:

> Hi all;
>
> I have a data set from which I want to remove the top values, those above
> the 90th percentile. Any suggestions?
>
> Thank you;
>
> --
>
> *Nasrin Pak, MSc*
>
> Air Quality Scientist
>


Re: [R] INET_NTOA equivalent?

2014-04-24 Thread Eberhard Lisse
Thank you,

el

on 2014-04-24, 10:33 Martin Maechler said the following:
>> "EL" == Eberhard Lisse 
>> on Thu, 24 Apr 2014 01:21:37 +0100 writes:
> 
> EL> In MySQL
> EL> SELECT INET_ATON('127.0.0.1')
> 
> EL> returns the integer 2130706433
> 
> EL> Is there a function in R to reverse that, ie so that something like
> 
> EL> ip <- inet_ntoa(2130706433)
> 
> EL> would put  '127.0.0.1' into ip?
> 
> almost:
> 
>   install.packages("sfsmisc")
>   require("sfsmisc")
> 
>   # NTOA :
> 
>   > digitsBase(2130706433, base = 256)
>   Class 'basedInt'(base = 256) [1:1]
>        [,1]
>   [1,]  127
>   [2,]    0
>   [3,]    0
>   [4,]    1
> 
>   # ATON :
> 
>   > as.intBase(digitsBase(2130706433, base = 256), base = 256)
>            1 
>   2130706433 
>   > 
> 
> So, an easy solution seems
> 
> 
>> ip.ntoa <- function(n) paste(sfsmisc::digitsBase(n, base = 256), 
>> collapse=".")
>> ip.ntoa(2130706433)
> [1] "127.0.0.1"
>>
> 
> but that does not vectorize (work for  length(n) > 1 )
> correctly.
> 
> The correct solution then is
> 
> ip.ntoa <- function(n) 
> apply(sfsmisc::digitsBase(n, base = 256), 2, paste, collapse=".")
> 
> and that does work nicely:
> 
>> ip.ntoa(1000000000 + (0:10))
> 
>  [1] "59.154.202.0"  "59.154.202.1"  "59.154.202.2"  "59.154.202.3"  "59.154.202.4" 
>  [6] "59.154.202.5"  "59.154.202.6"  "59.154.202.7"  "59.154.202.8"  "59.154.202.9" 
> [11] "59.154.202.10"
> 
> right ?
> 
> --
> Martin Maechler, ETH Zurich
> 
>
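For anyone who would rather not add a package, the same conversion can be sketched in base R with integer division and modulo. The names ip_ntoa and ip_aton are made up here; the sketch assumes non-negative IPv4 values below 2^32 and does no input validation.

```r
# Integer -> dotted quad: extract the four base-256 digits, vectorised over n.
ip_ntoa <- function(n) {
  n <- as.numeric(n)
  octets <- sapply(c(24, 16, 8, 0), function(s) (n %/% 2^s) %% 256)
  octets <- matrix(octets, ncol = 4)   # one row per address, even for length 1
  apply(octets, 1, paste, collapse = ".")
}

# Dotted quad -> integer (returned as a double, since 2^32 - 1 exceeds R's integer range).
ip_aton <- function(ip) {
  parts <- strsplit(ip, ".", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * 256^(3:0)), numeric(1))
}

ip_ntoa(2130706433)    # "127.0.0.1"
ip_aton("127.0.0.1")   # 2130706433
```

Doubles represent every integer in this range exactly, so the arithmetic does not lose precision.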



Re: [R] meta-question about R

2014-04-24 Thread mark

On 04/23/14 23:22, William Dunlap wrote:

Aren't those files support for named semaphores (made with sem_open())?
Packages like BH and RSQLite contain calls to sem_open.   Is your long-running
R process using such a package?

I don't think you would want to delete those files, but perhaps you can look 
into
whatever R package creates them and see if you can modify the code to give
them better names and then add those names to rkhunter's whitelist.


You don't seem to understand what I'm asking. I have zero intention of 
deleting those files. I'm sure that my user's long-running job is creating 
them. What I'm asking is if ANYONE HERE knows if there is some configuration 
file, or command inside R, that would tell R, whatever package it's using (I 
assume that all packages inherit from the top-level process), when it creates 
files in /dev/shm, to name them something that I can use with wildcards in 
rkhunter's configuration file so that rkhunter ignores them.


mark




-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
Behalf
Of Jim Lemon
Sent: Wednesday, April 23, 2014 2:18 PM
To: m.r...@5-cent.us
Cc: r-help@r-project.org
Subject: Re: [R] meta-question about R

On 04/23/2014 11:58 PM, m.r...@5-cent.us wrote:

This really isn't about R, but configuring R. We're running R 3.0.2-1, the
current default package, on CentOS 6.5. On a long-running job, R is
creating files in /dev/shm: each set of three files is named (8 hex
digits)-(4 hex digits)-(4 hex digits)-(4 hex digits)-(12 hex digits), and
then sem.(same as the name)_counter_mutex, and (same as the name)_counter.

For example,
156d23b0-9e67-46e2-afab-14a648252890
156d23b0-9e67-46e2-afab-14a648252890_counter
sem.156d23b0-9e67-46e2-afab-14a648252890_counter_mutex

Is there some way to configure R to add a prefix, say, to each of these
files? We're running rkhunter (rootkit hunter) for security, and it
complains about suspicious files, and I'd like some way to be able to tell
it to, say, ignore R_temp.whatever


Hi mark,
I assume that the problem is to identify the files in /dev/shm, not to
simply change your R code to tack the prefix onto the files as it
produces them. As your hexadecimal digits are probably randomly
generated, the solution may be to identify all the files that have
"_counter_mutex" in the name, then chip off the appropriate bits to get
the troublesome first name.

filenames<-list.files(pattern="_counter_mutex")
# function to return the two other filenames
strip_fn<-function(x) {
   f2<-substr(x,5,nchar(x)-6)
   f1<-substr(f2,1,nchar(f2)-8)
   return(c(f1,f2))
}
# get all the filenames
filenames<-c(filenames,unlist(sapply(filenames,strip_fn)))
# stick on the prefix
newfilenames<-paste("R_temp",filenames,sep=".")
# create the commands
fnmove<-paste("mv",filenames,newfilenames)
# move the filenames
for(fn in 1:length(fnmove)) system(fnmove[fn])

Warning - I haven't tested the last bit of this, but it should work.
There is probably some really neat string of hieroglyphs in a regular
expression that will do this as well.
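One possible version of that regular expression, as a sketch only: it identifies the three kinds of file named in the example listing and does nothing else. The names uuid, shm_pattern and shm_files are invented for illustration, and the pattern assumes lower-case hex digits as in the listing.

```r
# 8-4-4-4-12 hex digits, as in the file names quoted above.
uuid <- "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Matches "<uuid>", "<uuid>_counter" and "sem.<uuid>_counter_mutex".
shm_pattern <- paste0("^(", uuid, "(_counter)?|sem\\.", uuid, "_counter_mutex)$")

# List the matching files in a directory (empty result if there are none).
shm_files <- function(dir = "/dev/shm") {
  list.files(dir, pattern = shm_pattern)
}
```

This only lists the files. Renaming them while the job is running, as the loop above does, would probably break whatever process holds the semaphores, so it seems safer to use the pattern for reporting only.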

Jim



Re: [R] meta-question about R

2014-04-24 Thread m . roth
Duncan Murdoch wrote:
> On 24/04/2014, 7:42 AM, Jim Lemon wrote:
>> On 04/24/2014 08:52 PM, mark wrote:
>>> On 04/23/14 23:22, William Dunlap wrote:

>>> deleting those files. I'm sure that my user's long-running job is
>>> creating them. What I'm asking is if ANYONE HERE knows if there is some
>>> configuration file, or command inside R, that would tell R, whatever
>>> package it's using (I assume that all packages inherit from the
>>> top-level process), when it creates files in /dev/shm, to name them
>>> something that I can use with wildcards in rkhunter's configuration
>>> file so that rkhunter ignores them.

>> You are correct, I didn't understand what you were asking. Doing a bit
>> of searching, the sem_open function's first argument is the name of the
>> file that is to be created. It doesn't sound like you are specifying
>> these filenames, so it is probably a matter of finding the function that
>> calls sem_open or sem_init. I would approach this by grepping the source
>> code of the functions that you are calling, but as I have no idea what
>> these functions are (or how many levels of function calling goes on
>> before one of these two functions is called), I can't provide a
>> straightforward answer. If you do find the offending function, you can
>> just edit the source code to include your "R_temp" prefix, save the
>> edited function, and "source" it to replace the function that is not
>> providing the prefixes.
>
> Using debug(sem_open) is a quick way to find who is calling them.  R
> will break execution when it enters that function.  Use the debugger
> "where" command to see the calling stack.

Thank you both very much - that's what I needed to know. One question,
though - is there an R.conf or something, where the default format of
that filename is set? I've looked through the rpm for R-core, and what
.../etc/... files are in it, and I don't see that. Is there such a config,
or is that hard-coded into R itself?

mark



[R] Remove top values from a data set

2014-04-24 Thread Nasrin Pak
Hi all;

I have a data set from which I want to remove the top values, those above
the 90th percentile. Any suggestions?

Thank you;

-- 

*Nasrin Pak, MSc*

Air Quality Scientist



Re: [R] rgl and axes3d() labels

2014-04-24 Thread Alex Reynolds
Or perhaps the documentation could be updated to clear up what works and what 
doesn't. It seems pretty confusing to put options in the docs that do not work 
as described.

-Alex

> On Apr 24, 2014, at 4:05 AM, Duncan Murdoch  wrote:
> 
>> On 23/04/2014, 9:02 PM, Alex Reynolds wrote:
>> Unfortunately, that doesn't help as it removes axis lines. It looks like
>> I can't use segments3d() without knowing what the bounds are of the
>> current axes and I don't know what to call to expose those.
>> 
>> Thanks again for your help, though, I appreciate it. Hopefully this gets
>> fixed in a future release!
> 
> There is no bug, so it won't be fixed.
> 
> Duncan Murdoch
> 
>> 
>> -Alex
>> 
>> 
>> On Wed, Apr 23, 2014 at 5:34 PM, Duncan Murdoch
>> <murdoch.dun...@gmail.com> wrote:
>> 
>>On 23/04/2014, 7:51 PM, Alex Reynolds wrote:
>> 
>>I am making an rgl-based 3d plot. It works fine, except when I
>>try to
>>remove axis value labels and tick marks with axes3d(labels=FALSE,
>>ticks=FALSE):
>> 
>>---
>>rgl.open()
>>offset <- 50
>>par3d(windowRect=c(offset, offset, 1280+offset, 1280+offset))
>>rm(offset)
>>rgl.clear()
>>rgl.viewpoint(theta=thetaStart, phi=30, fov=30, zoom=1)
>>spheres3d(df$PC1, df$PC2, df$PC3, radius=featureRadius,
>>color=df$rColor,
>>alpha=featureTransparency, shininess=featureShininess)
>>aspect3d(1, 1, 1)
>> 
>>/* -- */
>>axes3d(col='black', box=FALSE, labels=FALSE, ticks=FALSE)
>>/* -- */
>> 
>>title3d("", "", "PCoA1", "PCoA2", "PCoA3", col='black', line=1)
>>texts3d(df$PC1, df$PC2, df$PC3, text=df$ctName, color="blue",
>>adj=c(0,0))
>>bg3d("white")
>>rgl.clear(type='lights')
>>rgl.light(-45, 20, ambient='black', diffuse='#dd',
>>specular='white')
>>rgl.light(60, 30, ambient='#dd', diffuse='#dd',
>>specular='black')
>>filename <- paste("results/PCoA.labeled.pdf", sep="")
>>rgl.postscript(filename, fmt="pdf")
>>---
>> 
>>When I run this code, these flags are ignored and I still get
>>axis labels
>>and tick marks. What am I misunderstanding about the documentation?
>> 
>> 
>>If you specify edges="bbox" (the default), labels is ignored, and
>>the bbox3d() function is used to draw the axes.  There's no ticks
>>argument, so it'll be absorbed by the ... argument.
>> 
>>I don't know what you want, but you might get it with
>> 
>>  axes3d(edges=c("x", "y", "z"), col='black', box=FALSE,
>>labels=FALSE, tick=FALSE)
>> 
>>This won't join the axis lines at the lower corner; if that's what
>>you want, I'd just draw them explicitly using segments3d.
>> 
>>BTW, mixing rgl.* functions with *3d functions is likely to give you
>>strange results.  I don't recommend it.
>> 
>>Duncan Murdoch
> 



[R] mvpart question - how to calculate deviance explained by variables?

2014-04-24 Thread Kumar Mainali
library(mvpart)

The R code I used:

mvpart(ept~cond+phlab+doc+episub+embed+woodtot+shade+Q+stgrad+veldpth,
data=mydata, method="anova", xv="1se", xval=5, xvmult=1000)


Part of the tabular output of the tree:

1) root 295 3905.9860 4.806780
   2) cond>=194.15 77  491.2468 2.493506
     4) cond>=309.7 25   62.1600 1.44 *
     5) cond< 309.7 52  388. 3.00 *
   3) cond< 194.15 218 2857.1560 5.623853
     6) embed>=82.5 114  891.9649 4.017544

Is there a convenient way to calculate the deviance explained by each
variable? For instance, I did it manually for one variable in one split as
below:
the deviance explained by cond =
1 - (2857+491)/3905.98 ≈ 0.143
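To make that hand calculation reproducible, here is a small sketch in base R. It only redoes the arithmetic on the node deviances printed in the tree above; split_r2 is a made-up helper, not an mvpart function, and dividing by the root deviance is one common convention for "deviance explained" rather than something the package is known to prescribe.

```r
# Node deviances copied from the tree printout.
dev_root  <- 3905.9860   # node 1: root
dev_left  <- 491.2468    # node 2: cond >= 194.15
dev_right <- 2857.1560   # node 3: cond <  194.15

# Share of the total deviance removed by one split:
# (parent deviance - sum of child deviances) / total deviance.
split_r2 <- function(parent_dev, child_devs, total_dev = parent_dev) {
  (parent_dev - sum(child_devs)) / total_dev
}

r2_cond <- split_r2(dev_root, c(dev_left, dev_right))
round(r2_cond, 3)   # about 0.143
```

For a per-variable figure, one could apply the same helper to every split, passing the root deviance as total_dev, and sum the results over the splits that use the same variable.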


Thank you.

- Kumar Mainali


Re: [R] Hi , Is it possible select a different number of rows by each group with R????

2014-04-24 Thread arun
Hi Marta,

If "field2" in your first dataset is greater than the number of rows for a 
particular "field1" in the second dataset, this error can occur.

e.g. using modified dat1:
dat1 <- structure(list(field1 = 1:3, field2 = c(3L, 20L, 4L)), .Names = 
c("field1", 
"field2"), class = "data.frame", row.names = c(NA, -3L))




 lapply(split(dat2,dat2$field1),function(x) 
x[sample(1:nrow(x),dat1$field2[!is.na(match(dat1$field1,x$field1))],replace=FALSE),])
#Error in sample.int(length(x), size, replace, prob) : 
#  cannot take a sample larger than the population when 'replace = FALSE'

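#In that case, cap the requested size at the number of rows available:
res <- do.call(rbind, lapply(split(dat2, dat2$field1), function(x) {
    # number of rows requested for this group in dat1
    length1 <- dat1$field2[!is.na(match(dat1$field1, x$field1))]
    # never ask sample() for more rows than the group has
    length2 <- min(length1, nrow(x))
    x[sample(nrow(x), length2, replace = FALSE), ]
}))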



##instead of randomly selecting 20 rows for field1==2 in dat2, the above code 
##selected the maximum number of rows

nrow(dat2[dat2$field1==2,])
#[1] 8


res[1:2,]
#    field1 field3 field4 field5
#1.8      1   0.67     Sp    Jm2
#1.6      1   0.58     Sp    Rm6
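The capped sampling can also be wrapped in a reusable function. The sketch below is an illustration under assumptions, not the code from this thread: sample_by_group and its argument names are invented, groups with no entry in the size table are dropped, and the toy data only mimic the group sizes of dat2 (13, 8 and 5 rows).

```r
# Sample up to sizes[[size_col]] rows from each group of 'data', without replacement.
sample_by_group <- function(data, sizes, key = "field1", size_col = "field2") {
  groups <- split(data, data[[key]])
  picked <- lapply(names(groups), function(g) {
    grp <- groups[[g]]
    wanted <- sizes[[size_col]][match(g, as.character(sizes[[key]]))]
    if (is.na(wanted)) return(grp[0, , drop = FALSE])   # group not listed: take nothing
    n <- min(wanted, nrow(grp))                         # cap at the rows available
    grp[sample.int(nrow(grp), n), , drop = FALSE]
  })
  do.call(rbind, picked)
}

# Toy data: group 2 asks for 20 rows but only has 8.
sizes <- data.frame(field1 = 1:3, field2 = c(3L, 20L, 4L))
toy   <- data.frame(field1 = rep(1:3, times = c(13, 8, 5)), value = seq_len(26))
set.seed(1)
res_toy <- sample_by_group(toy, sizes)
table(res_toy$field1)   # 3, 8 and 4 rows for groups 1, 2 and 3
```

Matching on the group name via match() avoids relying on the size table being in the same order as the groups.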



A.K.

Hi Arun,

Thanks for your suggestions.

I tried your new script; with a small sample it works well. However, when I 
tried it with the huge database, the script doesn't work.

Error in sample.int(length(x), size, replace, prob) :
  cannot take a sample larger than the population when 'replace = FALSE' 



On Wednesday, April 23, 2014 11:01 PM, arun  wrote:


Hi Marta,
If you need random selection, you could use:

do.call(rbind,lapply(split(dat2,dat2$field1),function(x) 
x[sample(1:nrow(x),dat1$field2[!is.na(match(dat1$field1,x$field1))],replace=FALSE),]))
A.K.



On Tuesday, April 22, 2014 1:45 PM, arun  wrote:


Hi Marta,
It's not clear whether you wanted to select the first "n" rows specified by 
field2 in the first dataset or just random rows.
##using a modified example if my guess is correct

dat1 <- structure(list(field1 = 1:3, field2 = c(3L, 6L, 4L)), .Names = 
c("field1", 
"field2"), class = "data.frame", row.names = c(NA, -3L))



dat2 <- structure(list(field1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
3L), field3 = c(0.375, 0.416667, 0.458333, 0.5, 0.541667, 0.58, 
0.625, 0.67, 0.708333, 0.75, 0.791667, 0.83, 0.875, 0.58, 
0.625, 0.67, 0.708333, 0.75, 0.791667, 0.83, 0.875, 0.708333, 
0.75, 0.791667, 0.83, 0.875), field4 = c("Sp", "Sp", "Sp", 
"Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", 
"Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", "Sp", 
"Sp"), field5 = c("Rm1", "Rm2", "Rm3", "Rm4", "Rm5", "Rm6", "Jm1", 
"Jm2", "Jm3", "Jm4", "Jm5", "Jm6", "Jm7", "Rm6", "Jm1", "Jm2", 
"Jm3", "Jm4", "Jm5", "Jm6", "Jm7", "Jm3", "Jm4", "Jm5", "Jm6", 
"Jm7")), .Names = c("field1", "field3", "field4", "field5"), class = 
"data.frame", row.names = c(NA, 
-26L))


##for selecting the first 'n' rows

dat2New <- merge(dat1,dat2,by="field1")
library(plyr)
res1 <- ddply(dat2New,.(field1),function(x) head(x,unique(x$field2)))[,-2]


#or
res2 <- dat2[with(dat1,rep(match(field1, 
dat2$field1),field2)+sequence(field2)-1),]

A.K.


 Sorry, I think now the message is correct.

Hi, is it possible to select a different number of rows for each group with R?
I need to select a different number of rows (the quantity given in field2 of 
Table1) in each group (field1 of Table2).
I have these 2 tables:

Table1
field1 field2
1 3
2 6
3 9
4 3
5 3
6 3
7 3
8 9
9 6
10 3
11 3
12 3
13 3
14 3
   
Table2
field1 field3 field4 field5
1 0.375 Sp Rm1
1 0.416667 Sp Rm2
1 0.458333 Sp Rm3
1 0.5    Sp Rm4
1 0.541667 Sp Rm5
1 0.58 Sp Rm6
1 0.625 Sp Jm1
1 0.67 Sp Jm2
1 0.708333 Sp Jm3
1 0.75   Sp Jm4
1 0.791667 Sp Jm5
1 0.83 Sp Jm6
1 0.875 Sp Jm7

thx!!! 



On Monday, April 21, 2014 4:02 PM, Marta Tobeña  wrote:

Hi, is it possible to select a different number of rows for each group with R?
I need to select a different number of rows (the quantity given in field2 of 
Table1) in each group (field1 of Table2). I have these 2 tables:

Table1
field1 field2
1 3
2 6
3 9
4 3
5 3
6 3
7 3
8 9
9 6
10 3
11 3
12 3
13 3
14 3

Table2
field1 field3 field4 field5
1 0.375 Sp Rm1
1 0.416667 Sp Rm2
1 0.458333 Sp Rm3
1 0.5 Sp Rm4
1 0.541667 Sp Rm5
1 0.58 Sp Rm6
1 0.625 Sp Jm1
1 0.67 Sp Jm2
1 0.708333 Sp Jm3
1 0.75 Sp Jm4
1 0.791667 Sp Jm5
1 0.83 Sp Jm6
1 0.875 Sp Jm7
2 0.916667 Sp Jm8
2 0.958333 Sp Jm9
2 1 Sp Jm10
2 1.041667 Sp Jm11
2 1.08 Sp Jm12
2 1.125 Sp Jm13
2 1.17 Sp Jm14
2 1.208333 Sp Jm15
2 1.25 Sp Jm16
2 1.291667 Sp Jm17
2 1.33 Sp Jm18

Thank you
Marta

[R] Fast way to populate a sparse matrix

2014-04-24 Thread Tom Wright
I need to generate a sparse matrix. Currently I have the data held in two
regular matrices. One 'targets' holds the column subscripts while the other
'scores' holds the values. I have written a 'toy' sample below. Using this
approach takes about 90 seconds to populate a 3 x 3 element matrix.
I'm going to need to scale this up by a factor of about 1000 so I really
need a faster way of populating the sparse matrix.
Any advice received gratefully.

# toy code starts here

require('Matrix')
set.seed(0)

adjM<-Matrix(0,nrow=10,ncol=10)

#generate the scores for the sparse matrix, with the target locations
targets<-matrix(nrow=10,ncol=5)
scores<-matrix(nrow=10,ncol=5)
for(iloc in 1:10)
  {
  targets[iloc,]<-sample(1:10,5,replace=FALSE)
  scores[iloc,]<-rnorm(5)
  }

#populate the sparse matrix
for(iloc in 1:10)
  {
  ok <- !is.na(targets[iloc,])
  adjM[iloc,targets[iloc,ok]]<-scores[iloc,ok]
  }
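A possible way to avoid the row-by-row assignment, sketched under the assumption that 'targets' and 'scores' have the layout of the toy code above: build the matrix in a single call from (row, column, value) triplets with Matrix::sparseMatrix(). The variables n, k and keep are introduced here only for illustration.

```r
library(Matrix)
set.seed(0)
n <- 10   # rows and columns of the toy matrix
k <- 5    # non-zero entries per row

# Same layout as the toy example: row i of 'targets' holds column subscripts,
# row i of 'scores' holds the matching values.
targets <- t(replicate(n, sample(1:n, k, replace = FALSE)))
scores  <- matrix(rnorm(n * k), nrow = n, ncol = k)

keep <- !is.na(targets)   # skip any unused slots

# One call builds the whole matrix from its triplets.
adjM <- sparseMatrix(i = row(targets)[keep],
                     j = targets[keep],
                     x = scores[keep],
                     dims = c(n, n))
```

One difference to keep in mind: sparseMatrix() adds up the values of repeated (i, j) pairs, whereas assignment in a loop overwrites them, so the two agree only when each row's targets are distinct, as they are here.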



Re: [R] meta-question about R

2014-04-24 Thread Duncan Murdoch

On 24/04/2014 9:56 AM, m.r...@5-cent.us wrote:

Duncan Murdoch wrote:
> On 24/04/2014, 7:42 AM, Jim Lemon wrote:
>> On 04/24/2014 08:52 PM, mark wrote:
>>> On 04/23/14 23:22, William Dunlap wrote:

>>> deleting those files. I'm sure that my user's long-running job is
>>> creating them. What I'm asking is if ANYONE HERE knows if there is some
>>> configuration file, or command inside R, that would tell R, whatever
>>> package it's using (I assume that all packages inherit from the
>>> top-level process), when it creates files in /dev/shm, to name them
>>> something that I can use with wildcards in rkhunter's configuration
>>> file so that rkhunter ignores them.

>> You are correct, I didn't understand what you were asking. Doing a bit
>> of searching, the sem_open function's first argument is the name of the
>> file that is to be created. It doesn't sound like you are specifying
>> these filenames, so it is probably a matter of finding the function that
>> calls sem_open or sem_init. I would approach this by grepping the source
>> code of the functions that you are calling, but as I have no idea what
>> these functions are (or how many levels of function calling goes on
>> before one of these two functions is called), I can't provide a
>> straightforward answer. If you do find the offending function, you can
>> just edit the source code to include your "R_temp" prefix, save the
>> edited function, and "source" it to replace the function that is not
>> providing the prefixes.
>
> Using debug(sem_open) is a quick way to find who is calling them.  R
> will break execution when it enters that function.  Use the debugger
> "where" command to see the calling stack.

Thank you both very much - that's what I needed to know. One question,
though - is there an R.conf or something, where the default format of
that filename is set? I've looked through the rpm for R-core, and what
.../etc/... files are in it, and I don't see that. Is there such a config,
or is that hard-coded into R itself?


There isn't an R.conf file.  Jim told you how the filename is set in the 
low level sem_open; you'll have to look at the source of the caller to 
see how it determines the name it uses.


Duncan Murdoch



[R] CHAID in R

2014-04-24 Thread Preetam Pal
Hi,
I want to implement CHAID in R, but at this point am not sure how to go
about it.
Would be glad if someone please helps me out with it. I am attaching the
data set for your perusal.
The variable in the 1st column is the dependent variable.
Thanks,
Preetam

-- 
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division,   C.V.Raman
Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.


Re: [R] metafor - rstudent(res) - omitted rows

2014-04-24 Thread Michael Dewey

At 11:56 22/04/2014, Dipl. Kfm Dominik Wagner MSc; MSc wrote:

Dear all,

I am quite new to R. Now my following easy question.

I use metafor and performed an outlier test with rstudent(res).
This is resulting in 1000 rows of 1578 and 578 omitted rows (starting with
row 598).


   1. How can I display all 1578 rows in R-studio? Because in the
   standardized residual plot it starts with study 1 (see attachment). In
   R-studio with row 598.
   2. How can I just plot the standardized residuals with manipulated
   x-axis to see every single study?


I cannot help with your RStudio problem as I do not use it, but as far 
as your plotting question is concerned:


1 - do you really want to see all of the residuals? Why not just keep 
the ones outside the range -2 to +2 which you might then need to study further
2 - the pictures would probably be clearer if you identify and do not 
print out the two studies with r very close to -1 as they are 
compressing everything else

3 - hollow circles are often a good idea when you have overprinting.
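A minimal sketch of point 1, using simulated values as a stand-in for the standardized residuals, since the data are not part of this thread. flag_outliers is a hypothetical helper. With metafor, my understanding is that the z component of rstudent(res) would take the place of the simulated z, but check str() of that object first.

```r
# Keep only the studies whose standardized residual lies outside +/- cutoff.
flag_outliers <- function(z, cutoff = 2) {
  idx <- which(abs(z) > cutoff)   # which() silently skips missing values
  data.frame(study = idx, z = z[idx])
}

set.seed(42)
z <- rnorm(1578)        # simulated stand-in for 1578 standardized residuals
out <- flag_outliers(z)
head(out)

# Plot only the flagged studies; pch = 1 draws hollow circles (point 3).
if (interactive()) {
  plot(out$study, out$z, pch = 1, xlab = "Study", ylab = "Standardized residual")
  abline(h = c(-2, 2), lty = 2)
}
```

If the omitted rows come from R's print limit, raising it with options(max.print = ...) before printing may also show the full table, although a filtered table is likely to be easier to read.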





Thank you very much for your help.

Cordially

Dominik

--

_


*Dipl.-Kfm. Dominik Wagner MSc. MSc.*



Michael Dewey
i...@aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html



Re: [R] meta-question about R

2014-04-24 Thread Duncan Murdoch

On 24/04/2014, 7:42 AM, Jim Lemon wrote:

On 04/24/2014 08:52 PM, mark wrote:

On 04/23/14 23:22, William Dunlap wrote:

Aren't those files support for named semaphores (made with sem_open())?
Packages like BH and RSQLite contain calls to sem_open. Is your
long-running
R process using such a package?

I don't think you would want to delete those files, but perhaps you
can look into
whatever R package creates them and see if you can modify the code to
give
them better names and then add those names to rkhunter's whitelist.


You don't seem to understand what I'm asking. I have zero intention of
deleting those files. I'm sure that my user's long-running job is
creating them. What I'm asking is if ANYONE HERE knows if there is some
configuration file, or command inside R, that would tell R, whatever
package it's using (I assume that all packages inherit from the
top-level process), when it creates files in /dev/shm, to name them
something that I can use with wildcards in rkhunter's configuration file
so that rkhunter ignores them.

mark


Hi mark,
You are correct, I didn't understand what you were asking. Doing a bit
of searching, the sem_open function's first argument is the name of the
file that is to be created. It doesn't sound like you are specifying
these filenames, so it is probably a matter of finding the function that
calls sem_open or sem_init. I would approach this by grepping the source
code of the functions that you are calling, but as I have no idea what
these functions are (or how many levels of function calling goes on
before one of these two functions is called), I can't provide a
straightforward answer. If you do find the offending function, you can
just edit the source code to include your "R_temp" prefix, save the
edited function, and "source" it to replace the function that is not
providing the prefixes.



Using debug(sem_open) is a quick way to find who is calling them.  R 
will break execution when it enters that function.  Use the debugger 
"where" command to see the calling stack.


Duncan Murdoch

__


Re: [R] meta-question about R

2014-04-24 Thread Jim Lemon

On 04/24/2014 08:52 PM, mark wrote:

On 04/23/14 23:22, William Dunlap wrote:

Aren't those files support for named semaphores (made with sem_open())?
Packages like BH and RSQLite contain calls to sem_open. Is your
long-running R process using such a package?

I don't think you would want to delete those files, but perhaps you
can look into whatever R package creates them and see if you can modify
the code to give them better names and then add those names to
rkhunter's whitelist.


You don't seem to understand what I'm asking. I have zero intention of
deleting those files. I'm sure that my user's long-running job is
creating them. What I'm asking is if ANYONE HERE knows if there is some
configuration file, or command inside R, that would tell R, whatever
package it's using (I assume that all packages inherit from the
top-level process), when it creates files in /dev/shm, to name them
something that I can use with wildcards in rkhunter's configuration file
so that rkhunter ignores them.

mark


Hi mark,
You are correct, I didn't understand what you were asking. Doing a bit 
of searching, the sem_open function's first argument is the name of the 
file that is to be created. It doesn't sound like you are specifying 
these filenames, so it is probably a matter of finding the function that 
calls sem_open or sem_init. I would approach this by grepping the source 
code of the functions that you are calling, but as I have no idea what 
these functions are (or how many levels of function calling goes on 
before one of these two functions is called), I can't provide a 
straightforward answer. If you do find the offending function, you can 
just edit the source code to include your "R_temp" prefix, save the 
edited function, and "source" it to replace the function that is not 
providing the prefixes.
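
The grepping step could be sketched in R roughly as below. This is only a
sketch under assumptions: find_sem_calls() is a hypothetical helper name, and
it assumes you have unpacked the source tarballs of the packages in question
into a directory that you pass as src_dir. It only reports where sem_open or
sem_init is mentioned in C/C++ files; it does not change any names.

```r
# Hypothetical helper: list C/C++ source lines under `src_dir` that
# mention sem_open or sem_init.
find_sem_calls <- function(src_dir, pattern = "sem_(open|init)") {
  files <- list.files(src_dir, pattern = "\\.(c|cc|cpp|h|hpp)$",
                      recursive = TRUE, full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep(pattern, lines, useBytes = TRUE)
    if (length(idx) > 0)
      data.frame(file = f, line = idx, text = lines[idx],
                 stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)  # NULL when nothing matches
}

# Small self-contained demonstration on a throwaway directory
demo_dir <- tempfile("pkgsrc")
dir.create(demo_dir)
writeLines(c("#include <semaphore.h>",
             "sem_t *s = sem_open(\"/x\", O_CREAT, 0644, 1);"),
           file.path(demo_dir, "a.c"))
writeLines("int main(void) { return 0; }", file.path(demo_dir, "b.c"))
demo_hits <- find_sem_calls(demo_dir)
```

In practice you would point src_dir at the unpacked sources of whatever
packages the long-running job loads and read the reported lines to see how
the semaphore name is built.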


Jim



Re: [R] Request for R " Initial value of MLE"

2014-04-24 Thread S Ellison
> Sir I have this problem,
> 
> >  res <-
> maxLik(logLik=loglik1,start=c(a=1.5,b=1.5,c=1.5,dee=2),method="BFGS")
> There were 50 or more warnings (use warnings() to see the first 50)
> > summary(res)
> "Maximum Likelihood estimation
> BFGS maximisation, 0 iterations
> Return code 100: Initial value out of range."
> 
> Dear sir, how do we give the initial value to estimate the parameters?

i) Avoid cross-posting; some folk get a bit snippy about that.

ii) Read the help page for maxLik and look for the argument that specifies
the initial values of the parameters ('start' in the call above).
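
A message like "Initial value out of range" usually suggests that the
log-likelihood is not finite at the starting values, so one thing worth
trying is to evaluate the function at 'start' before optimising. A minimal
sketch, under assumptions: the normal log-likelihood and the helper name
start_ok() below are made up for illustration, since loglik1 was not posted.

```r
# Hypothetical log-likelihood: normal sample with mean mu and sd sigma
set.seed(1)
x <- rnorm(50, mean = 2, sd = 1.5)
loglik <- function(theta) {
  sum(dnorm(x, mean = theta["mu"], sd = theta["sigma"], log = TRUE))
}

# Hypothetical helper: TRUE if the log-likelihood is finite at `start`
start_ok <- function(loglik, start) {
  val <- suppressWarnings(
    tryCatch(loglik(start), error = function(e) NA_real_)
  )
  length(val) >= 1 && all(is.finite(val))
}

bad  <- c(mu = 0, sigma = -1)  # negative sd makes dnorm() return NaN
good <- c(mu = 0, sigma = 1)
bad_ok  <- start_ok(loglik, bad)
good_ok <- start_ok(loglik, good)

# Only then hand the starting values to the optimiser (not run here):
# maxLik(logLik = loglik, start = good, method = "BFGS")
```

If the check fails, the 50 warnings are probably worth reading too, since
they often say which term of the log-likelihood produced NaN.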


S Ellison





Re: [R] rgl and axes3d() labels

2014-04-24 Thread Duncan Murdoch

On 23/04/2014, 9:02 PM, Alex Reynolds wrote:

Unfortunately, that doesn't help as it removes axis lines. It looks like
I can't use segments3d() without knowing what the bounds are of the
current axes and I don't know what to call to expose those.

Thanks again for your help, though, I appreciate it. Hopefully this gets
fixed in a future release!


There is no bug, so it won't be fixed.

Duncan Murdoch



-Alex


On Wed, Apr 23, 2014 at 5:34 PM, Duncan Murdoch
<murdoch.dun...@gmail.com> wrote:

On 23/04/2014, 7:51 PM, Alex Reynolds wrote:

I am making an rgl-based 3d plot. It works fine, except when I try to
remove axis value labels and tick marks with axes3d(labels=FALSE,
ticks=FALSE):

---
rgl.open()
offset <- 50
par3d(windowRect=c(offset, offset, 1280+offset, 1280+offset))
rm(offset)
rgl.clear()
rgl.viewpoint(theta=thetaStart, phi=30, fov=30, zoom=1)
spheres3d(df$PC1, df$PC2, df$PC3, radius=featureRadius,
color=df$rColor,
alpha=featureTransparency, shininess=featureShininess)
aspect3d(1, 1, 1)

/* -- */
axes3d(col='black', box=FALSE, labels=FALSE, ticks=FALSE)
/* -- */

title3d("", "", "PCoA1", "PCoA2", "PCoA3", col='black', line=1)
texts3d(df$PC1, df$PC2, df$PC3, text=df$ctName, color="blue",
adj=c(0,0))
bg3d("white")
rgl.clear(type='lights')
rgl.light(-45, 20, ambient='black', diffuse='#dd',
specular='white')
rgl.light(60, 30, ambient='#dd', diffuse='#dd',
specular='black')
filename <- paste("results/PCoA.labeled.pdf", sep="")
rgl.postscript(filename, fmt="pdf")
---

When I run this code, these flags are ignored and I still get axis labels
and tick marks. What am I misunderstanding about the documentation?


If you specify edges="bbox" (the default), labels is ignored, and
the bbox3d() function is used to draw the axes.  There's no ticks
argument, so it'll be absorbed by the ... argument.

I don't know what you want, but you might get it with

  axes3d(edges=c("x", "y", "z"), col='black', box=FALSE,
labels=FALSE, tick=FALSE)

This won't join the axis lines at the lower corner; if that's what
you want, I'd just draw them explicitly using segments3d.
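
For drawing the axis lines explicitly, the bounds of the current scene should
be available from par3d("bbox"), which returns c(xmin, xmax, ymin, ymax, zmin,
zmax). A rough sketch of building the segment endpoints from that vector
follows; axis_segments() is a hypothetical helper name, and the rgl call is
left as a comment because it needs an open rgl device.

```r
# Endpoints for three axis segments that meet at the lower corner of a
# bounding box given as c(xmin, xmax, ymin, ymax, zmin, zmax).
# Rows are taken in pairs: (1,2) is the x axis, (3,4) y, (5,6) z.
axis_segments <- function(bbox) {
  stopifnot(length(bbox) == 6)
  lo <- bbox[c(1, 3, 5)]
  hi <- bbox[c(2, 4, 6)]
  unname(rbind(lo, c(hi[1], lo[2], lo[3]),
               lo, c(lo[1], hi[2], lo[3]),
               lo, c(lo[1], lo[2], hi[3])))
}

# Example with a made-up bounding box
seg <- axis_segments(c(0, 1, 0, 2, 0, 3))

# With an rgl scene open (not run here):
#   segments3d(axis_segments(par3d("bbox")), col = "black")
```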

BTW, mixing rgl.* functions with *3d functions is likely to give you
strange results.  I don't recommend it.

Duncan Murdoch






[R] Cramer Rao upper bound computation

2014-04-24 Thread Mohammed Ouassou
 Dear R users;

I have a question about Cramer Rao upper/lower bounds

Is it possible to compute Cramer Rao upper/lower bounds from residuals
and the corresponding covariance matrices?


Any suggestions will be appreciated, thanks in advance.


M.O



Re: [R] INET_NTOA equivalent?

2014-04-24 Thread Martin Maechler
> "EL" == Eberhard Lisse 
> on Thu, 24 Apr 2014 01:21:37 +0100 writes:

EL> In MySQL
EL> SELECT INET_ATON('127.0.0.1')

EL> returns the integer 2130706433

EL> Is there a function in R to reverse that, ie so that something like

EL> ip <- inet_ntoa(2130706433)

EL> would put  '127.0.0.1' into ip?

almost:

  install.packages("sfsmisc")
  require("sfsmisc")

  # NTOA :

  > digitsBase(2130706433, base = 256)
  Class 'basedInt'(base = 256) [1:1]
       [,1]
  [1,]  127
  [2,]    0
  [3,]    0
  [4,]    1

  # ATON :

  > as.intBase(digitsBase(2130706433, base = 256), base = 256)
           1 
  2130706433 
  > 

So, an easy solution seems


> ip.ntoa <- function(n) paste(sfsmisc::digitsBase(n, base = 256), collapse=".")
> ip.ntoa(2130706433)
[1] "127.0.0.1"
> 

but that does not vectorize correctly, i.e. it does not work for
length(n) > 1.

The correct solution then is

ip.ntoa <- function(n)
    apply(sfsmisc::digitsBase(n, base = 256), 2, paste, collapse=".")

and that does work nicely:

> ip.ntoa(10^9 + (0:10))

 [1] "59.154.202.0"  "59.154.202.1"  "59.154.202.2"  "59.154.202.3"  "59.154.202.4"
 [6] "59.154.202.5"  "59.154.202.6"  "59.154.202.7"  "59.154.202.8"  "59.154.202.9"
[11] "59.154.202.10"

right ?
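
For comparison, here is a base-R sketch of both directions that always
produces four octets. The function names inet_ntoa() and inet_aton() are
just borrowed from MySQL for illustration; they are not existing R
functions. As far as I can tell, digitsBase() only returns as many digits as
the largest input needs, so small values may come out with fewer than four
parts unless ndigits is set; the version below avoids that by computing each
octet with integer division.

```r
# Integer (stored as double) -> dotted quad, vectorized over n
inet_ntoa <- function(n) {
  n <- as.numeric(n)
  # one column per octet, most significant first
  octets <- sapply(3:0, function(k) (n %/% 256^k) %% 256)
  octets <- matrix(octets, ncol = 4)
  apply(octets, 1, paste, collapse = ".")
}

# Dotted quad -> number, vectorized over ip
inet_aton <- function(ip) {
  parts <- strsplit(ip, ".", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * 256^(3:0)), numeric(1))
}

ip  <- inet_ntoa(2130706433)
num <- inet_aton("127.0.0.1")
```

Doubles are used throughout because values above 2^31 - 1 do not fit in R's
integer type.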

--
Martin Maechler, ETH Zurich



Re: [R] Derivative of expm function

2014-04-24 Thread Martin Maechler
> Wagner Bonat 
> on Wed, 23 Apr 2014 12:12:17 +0200 writes:

> Hi all !
> I am looking for an efficient method to compute the derivative of the
> matrix exponential function in R. For example, I have a simple matrix like

> log.Sigma  <- matrix(c(par1, rho, rho, par2),2,2)

> require(Matrix)
> Sigma <- expm(log.Sigma)

> I want some method to compute the derivatives of Sigma with respect to the
> parameters par1, par2 and rho. Any ideas?

The  'expm' package has slightly newer / more reliable
algorithms for the matrix exponential.

It also contains an  expmFrechet()  function
which computes the Frechet derivative of the matrix exponential.

I'm pretty confident -- but did not start thinking more deeply --
that this should provide the necessary parts to get
partial derivatives like yours as well.
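
To make the connection concrete: the Frechet derivative L(A, E) is the
directional derivative of the matrix exponential at A in direction E, and
since log.Sigma is linear in each parameter, the partial derivative of Sigma
with respect to a parameter is L(log.Sigma, E) with E the elementwise
derivative of log.Sigma. The sketch below does not use expmFrechet(); it
swaps in the eigendecomposition formula that holds for symmetric matrices, so
that it runs in base R. The helper names expm_sym() and dexpm_sym() and the
parameter values are made up for illustration.

```r
# Matrix exponential of a symmetric matrix via its eigendecomposition
expm_sym <- function(A) {
  e <- eigen(A, symmetric = TRUE)
  e$vectors %*% (exp(e$values) * t(e$vectors))
}

# Frechet derivative of exp at symmetric A in direction E, using
# V ((V' E V) * G) V' with G[i, j] = (exp(l_i) - exp(l_j)) / (l_i - l_j)
dexpm_sym <- function(A, E) {
  e <- eigen(A, symmetric = TRUE)
  V <- e$vectors
  lam <- e$values
  d <- outer(lam, lam, "-")
  G <- outer(exp(lam), exp(lam), "-") / d
  close <- abs(d) < 1e-8
  # limiting value when two eigenvalues (nearly) coincide, incl. diagonal
  G[close] <- exp((outer(lam, lam, "+") / 2)[close])
  V %*% ((t(V) %*% E %*% V) * G) %*% t(V)
}

# Hypothetical parameter values
par1 <- 0.5; par2 <- 1.0; rho <- 0.3
log.Sigma <- matrix(c(par1, rho, rho, par2), 2, 2)

# Elementwise derivatives of log.Sigma with respect to each parameter
dA <- list(par1 = matrix(c(1, 0, 0, 0), 2, 2),
           par2 = matrix(c(0, 0, 0, 1), 2, 2),
           rho  = matrix(c(0, 1, 1, 0), 2, 2))
dSigma <- lapply(dA, function(E) dexpm_sym(log.Sigma, E))

# Assumption, not checked here: expm::expmFrechet(log.Sigma, dA$rho)$L
# should give the same matrix as dSigma$rho.
```

A central finite difference on expm_sym() is an easy way to check any of
these derivatives numerically.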

Martin Maechler, ETH Zurich

> Wagner Hugo Bonat
> LEG - Laboratório de Estatística e Geoinformação
> UFPR - Universidade Federal do Paraná
