Re: [R] Option "-shared" not being passed to gcc when installing packages

2023-09-21 Thread Simon Knapp
Thanks Ivan.

I do have things in /etc/R. I have copied all files in /etc/R them out of a
freshly pulled r-base:latest docker container. This seems to have worked.
Thank you!

How would I generally replace those or get them updated when installing a
new version of version of R (which I do using R apt package repo)?

Thanks again,
Simon



On Thu, 21 Sept 2023 at 17:09, Ivan Krylov  wrote:

> On Thu, 21 Sep 2023 10:09:56 +1000
> Simon Knapp  wrote:
>
> > I am using R version 4.3.1 (2023-06-16) -- "Beagle Scouts" on ubuntu
> > 20.04
>
> > gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG-fopenmp -fpic
> > -g -O2 -fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=.
>
> It seems to be picking up compiler flags from a much older version of
> R. Do you have anything in ~/.R/Makevars*? What about /etc/R/Make*?
>
> https://cran.r-project.org/doc/manuals/R-admin.html#Customizing-package-compilation
>
> --
> Best regards,
> Ivan
>

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Option "-shared" not being passed to gcc when installing packages

2023-09-20 Thread Simon Knapp
oth.c -o sparse-smooth.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG-fopenmp -fpic  -g -O2
-fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=.
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time
-D_FORTIFY_SOURCE=2 -g  -c sparse.c -o sparse.o
gcc -std=gnu99 -I"/usr/share/R/include" -DNDEBUG-fopenmp -fpic  -g -O2
-fdebug-prefix-map=/build/r-base-jbaK_j/r-base-3.6.3=.
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time
-D_FORTIFY_SOURCE=2 -g  -c tprs.c -o tprs.o
gcc -std=gnu99 -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o
mgcv.so coxph.o davies.o discrete.o gdi.o init.o magic.o mat.o matrix.o
mgcv.o misc.o mvn.o ncv.o qp.o soap.o sparse-smooth.o sparse.o tprs.o
-llapack -lblas -lgfortran -lm -lquadmath -fopenmp -L/usr/lib/R/lib -lR
/usr/bin/ld:
/usr/lib/gcc/x86_64-linux-gnu/9/../../../x86_64-linux-gnu/Scrt1.o: in
function `_start':
(.text+0x24): undefined reference to `main'
collect2: error: ld returned 1 exit status
make: *** [/usr/share/R/share/make/shlib.mk:10: mgcv.so] Error 1
ERROR: compilation failed for package ‘mgcv’
* removing ‘/home//R/x86_64-pc-linux-gnu-library/4.3/mgcv’

The downloaded source packages are in
‘/tmp/RtmpI7uMoa/downloaded_packages’
Warning message:
In install.packages("mgcv") :
  installation of package ‘mgcv’ had non-zero exit status
>

Kind regards,
Simon Knapp

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Problem reading from NetCDF.

2013-05-24 Thread Simon Knapp
Hi List,

I'm not sure what I'm doing wrong here and would appreciate some help. I
create a NetCDF dataset as follows:

library(ncdf)
combined.ncdf - with(new.env(), {
nLat - 7
nLon - 8
nTime - 9
missing.val - -999
filename - file.path(tempdir(), 'combined_d6fg4s64s6g4l.nc')
lat.dim - dim.def.ncdf('lat', 'degrees', 1:nLat)
lon.dim - dim.def.ncdf('lon', 'degrees', 1:nLon)
time.dim - dim.def.ncdf(time, days since 1970-01-01,
as.integer(1:nTime), unlim=T)
var.def - var.def.ncdf('Data', 'mm', list(lon.dim, lat.dim, time.dim),
missing.val, prec='single')
nc - create.ncdf(filename, var.def)
for(i in 1:nLon)
for(j in 1:nLat)
for(k in 1:nTime)
put.var.ncdf(nc, var.def, i + j*10 + k*100, c(i, j, k),
rep(1, 3))
nc
})


the following do not work:
get.var.ncdf(combined.ncdf)
get.var.ncdf(combined.ncdf, varid='Data')
get.var.ncdf(combined.ncdf, varid=4)

all of which I thought would be equivalent and would give me the same as:
get.var.ncdf(combined.ncdf, forcevarid=4)

Could someone please point out the error of my ways.

Thanx in advance,
Simon Knapp

R version: 2.14.1,
ncdf built with version: 2.14.2
os: windows 7

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question on callNextMethod

2012-11-06 Thread Simon Knapp
I don't understand why I get the following results. I define two classes
'Base' and 'Derived', the latter of which 'contains' the first. I then
define a generic method 'test' and overload it for each of these classes. I
call 'callNextMethod()' in the overload for Derived. From the output, it
appears that the overload for Base gets called twice. Why is this? Test
code follows:

setClass('Base')
setClass('Derived', contains='Base')
setGeneric('test', function(x) standardGeneric('test'))
setMethod('test', signature(x='Base'), function(x) print('base called'))
setMethod('test', signature(x='Derived'), function(x) {print('derived
called'); callNextMethod()})

d = new('Derived')
test(d)


Produces the output:

[1] derived called
[1] base called
[1] base called


and I was expecting:

[1] derived called
[1] base called


Thanx in advance,
Simon Knapp

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logistic regression/Cut point? predict ??

2012-10-20 Thread Simon Knapp
What do you mean by at x equal zero?

On Sun, Oct 21, 2012 at 8:37 AM, Adel Powell powella...@gmail.com wrote:
 I am new to R and I am trying to do a monte carlo simulation where I
 generate data and interject error then test various cut points; however, my
 output was garbage (at x equal zero, I did not get .50)
 I am basically testing the performance of classifiers.

 Here is the code:
 n - 1000; # Sample size

 fitglm - function(sigma,tau){
 x - rnorm(n,0,sigma)
 intercept - 0
 beta - 5
* ystar - intercept+beta*x*
* z - rbinom(n,1,plogis(ystar))**# I believe plogis accepts the a
 +bx augments and return the  e^x/(1+e^x)  which is then used to generate 0
 and 1 data*
 xerr - x + rnorm(n,0,tau)# error is added here
 model-glm(z ~ xerr, family=binomial(logit))
 int-coef(model)[1]
 slope-coef(model)[2]
 pred-predict(model)  #this gives me the a+bx data for new error?  I
 know I can add type= response to get the probab. but only e^x not *e^x/(1+e^x)
 *

 pi1hat-length(z[which(z==1)]/length(z)) My cut point is calculated  is
 the proportion of 0s to 1.
 pi0hat-length(z[which(z==0)]/length(z))

 cutmid - log(pi0hat/pi1hat)
 pred-ifelse(predcutmid,1,0) * I am not sure if I need to compare
 these two. I think this is an error.
 *
 accuracy-length(which(pred==z))/length(z)
 accuracy

 rocpreds-prediction(pred,z)
 auc-performance(rocpreds,auc)@y.values

 output-c(int,slope,cutmid,accuracy,auc)
 names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC)
 return(output)

 }

 y-fitglm(.05,1)
 y

 nreps - 500;
 output-data.frame(matrix(rep(NA),nreps,6,ncol=6))

 mysigma-.5
 mytau-.1

 i-1

 for(j in 1:nreps) {
 output[j,1:5]-fitglm(mysigma,mytau)
 output[j,6]-j
 }

 names(output)-c(Intercept,Slope,CutPoint,Accuracy,AUC,Iteration)

 apply(output,2, mean)
 apply(output,2, var)

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] time extraction and normalization

2012-10-16 Thread Simon Knapp
Whoops... tried to get all descriptive with the argument names before
I posted but after I tested. Try this

toDaySecond - function(date.string, format.string='%d%b%Y:%T', tz='') {
d - as.POSIXct(date.string, format=format.string, tz=tz)
sum(mapply(function(f, l) as.numeric(format(d, format=f)) * l,
c('%H', '%M', '%S'), c(3600, 60, 1)))
}


On Tue, Oct 16, 2012 at 11:22 PM, arun smartpink...@yahoo.com wrote:
 Hi Simon,
 I tried your code, but I am getting some error message:
 toDaySecond('04MAY2011:08:19:00')
 #Error in as.POSIXct(d, format = format.string, tz = tz) :
   #object 'd' not found

 A.K.




 - Original Message -
 From: Simon Knapp sleepingw...@gmail.com
 To: Jeff Newmiller jdnew...@dcn.davis.ca.us
 Cc: r-help@r-project.org; york8866 yu_y...@hotmail.com
 Sent: Tuesday, October 16, 2012 1:32 AM
 Subject: Re: [R] time extraction and normalization

 toDaySecond - function(date.string, format.string='%d%b%Y:%T', tz='') {
 d - as.POSIXct(d, format=format.string, tz=tz)
 sum(mapply(function(f, l) as.numeric(format(d, format=f)) * l,
 c('%H', '%M', '%S'), c(3600, 60, 1)))
 }

 toDaySecond('04MAY2011:08:19:00')

 Will calculate the second of the day. divide by 3600 to convert to hours.


 On Tue, Oct 16, 2012 at 4:21 PM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
 There are multiple ways. For lack of your example, I would suggest 
 multiplying by 24.

 Please post the previous context of the thread if you must post from Nabble.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 york8866 yu_y...@hotmail.com wrote:

Thanks,

The code gives the numbers in days, how can I adjust the code to
directly
get the numbers in hours?

I tried units but it did not work.

Thanks1



--
View this message in context:
http://r.789695.n4.nabble.com/time-extraction-and-normalization-tp4646275p4646298.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] vectors comparison

2012-10-16 Thread Simon Knapp
Your question does not seem to make sense - there is no value of -500
in Y (did you mean -10?). Anyway, I think this might work:

which(y==-10  (x==1 | c(0, x[-length(x)]) == 1 | c(x[-1], 0) == 1))

... though one would think there is a more elegant way


On Wed, Oct 17, 2012 at 10:07 AM, Francesco Nutini
nutini.france...@gmail.com wrote:

 Dear R-Users,
 I'd like to have your help on this problem:
 I have two vectors:x- c(0,1,0,0,0,0,0,0,0,0,0,1,0)y- 
 c(0,0,-10,0,0,-10,0,-10,0,0,0,0,0)
 And I want to know where the value -500 in y have a correspondence value 1 in 
 x.Considering a buffer of one position before and after in x.i.e. in this 
 example only the -10 in position y[3] satisfies the criteria, because x has 
 in position x[2]the figure 1.
 Thank in advance for you help,Francesco






 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rJava install - %1 is not a valid Win32 application.

2012-10-15 Thread Simon Knapp
My guess would be that your running the 32 bit version of R - and
rJava is looking for the 64 bit dll. I'd suggest starting the 64 bit
version of R explicitly (e.g. the 64 bit version of Rgui lives at
R_HOME/bin/x64/Rgui.exe, whereas the 32 bit version lives at
R_HOME/bin/i386/Rgui.exe).

On Tue, Oct 16, 2012 at 2:37 AM, Steven Ranney steven.ran...@gmail.com wrote:
 All –



 I’m having a problem with the rJava package.  I can download it to my
 machine (Win 7 64-bit) but when I try to load the package into R (2.15.1,
 64-bit version), I get the following error:


 require(rJava)

 Loading required package: rJava

 Error : .onLoad failed in loadNamespace() for 'rJava', details:

   call: inDL(x, as.logical(local), as.logical(now), ...)

   error: unable to load shared object
 'C:/Users/sranney/Documents/R/win-library/2.15/rJava/libs/x64/rJava.dll':

   LoadLibrary failure:  %1 is not a valid Win32 application.


 I have verified that the file R is looking for is in the appropriate place,
 but I continue to get the error.  I have tried to download rJava from
 another source, but still get the error.


 I have not been able to find another user with this same issue.


 Thanks for your help –


 Steven Ranney

 [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] warning message

2012-10-15 Thread Simon Knapp
The second and third arguments to ifelse are evaluated for all
elements of x (producing your warnings), then the appropriate elements
of each result are combined based on the (logical) values of the first
argument to ifelse.

On Tue, Oct 16, 2012 at 10:08 AM, Shi, Tao shida...@yahoo.com wrote:
 Hi list,

 Can somebody explain why there are these warning messages?  I just don't get 
 it.  I'm using R 2.15.1 on WinXP.

 Thanks!

 Tao


 x
 [1] -2.143510 -1.157450 -1.315581  1.033562 -1.225440 -1.179909

  ifelse(x0, log2(x), -log2(-x))
 [1] -1.099975 -0.210950 -0.395700  0.047625 -0.293300 -0.238675
 Warning messages:
 1: In ifelse(x  0, log2(x), -log2(-x)) : NaNs produced
 2: In ifelse(x  0, log2(x), -log2(-x)) : NaNs produced


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calling R from C (Windows)

2012-10-15 Thread Simon Knapp
Yes.

Your looking at R_HOME (the installation directory), not the
directory of the sources.

On Tue, Oct 16, 2012 at 10:08 AM, BayesForever
andrew.mor...@verizon.net wrote:

 I am trying to follow the instructions in section 8.2.2 of Writing R
 extensions, to learn how to call R from C.  In this manual and at other
 sites I see continual references to examples in the tests/Embedding
 directory of the sources.

 There is no directory named Embedding anywhere in my 2.13.1 install of R.
 Am I missing components that I need to call R from C, or am I
 misunderstanding the meaning of the tests/Embedding directory of the
 sources?



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Calling-R-from-C-Windows-tp4646285.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Building a package with an existing dll

2012-10-15 Thread Simon Knapp
Anything you put in the folder 'inst' of a package gets copied, as is,
to the installed package once 'everything else' is done (see section
1.1.3 of Writing R Extensions - which you read for caveats). Hence
you can create

inst/libs/arch/libs/package_name.dll

where:
- arch is either 'x64' or 'i386' (or both if you can compile for
both architectures) for 64 or 32 bit R respectively, and
- package_name is the name of the package.

Then as per section 1.6.3 of Writing R Extensions You can then add the line

useDynLib(package_name)

to your NAMESPACE file and it should then work as before (noting that
you should use the PACKAGE argument in your .C, .Call and .External
calls to the routines therein). Optionally, you can list the symbol
names in the call to useDynLib as per section 1.6.3 of Writing R
Extensions.



On Tue, Oct 16, 2012 at 1:24 PM, GlennManion
glenn.man...@environment.nsw.gov.au wrote:
 I have setup a package directory structure with all the relevent files in the
 src, R, man and data directories.
 Also have the correct DESCRIPTION and NAMESPACE files in the root directory.
 I don't wisk to recompile the .dll file in the src directory as it currently
 works fine if I locate it in the same workdirectory as my R source code that
 calls it.
 This is a little tedious having to move source and dll into every directory
 that I wish to work in, thus the need to convert into a package.
 Is there a way to create a package that does not need the dll re-compiled?

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] time extraction and normalization

2012-10-15 Thread Simon Knapp
toDaySecond - function(date.string, format.string='%d%b%Y:%T', tz='') {
d - as.POSIXct(d, format=format.string, tz=tz)
sum(mapply(function(f, l) as.numeric(format(d, format=f)) * l,
c('%H', '%M', '%S'), c(3600, 60, 1)))
}

toDaySecond('04MAY2011:08:19:00')

Will calculate the second of the day. divide by 3600 to convert to hours.


On Tue, Oct 16, 2012 at 4:21 PM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 There are multiple ways. For lack of your example, I would suggest 
 multiplying by 24.

 Please post the previous context of the thread if you must post from Nabble.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 york8866 yu_y...@hotmail.com wrote:

Thanks,

The code gives the numbers in days, how can I adjust the code to
directly
get the numbers in hours?

I tried units but it did not work.

Thanks1



--
View this message in context:
http://r.789695.n4.nabble.com/time-extraction-and-normalization-tp4646275p4646298.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Keep rows in a dataset if one value in a column is duplicated

2012-09-27 Thread Simon Knapp
#By using cbind in:
PairIDs-cbind(PairID, PairIDDuplicates)

#You create a numeric matrix (the logical
#vector PairIDDuplicates gets converted
#to numeric - note that your second column
#contains 1s and 0s, not Trues and Falses).
#Matricies are not subsetable using $,
#they are basically a vector with
#a dimension attribute - hence your error).

#Two ways you could have avoided your error are:
# 1) changing the cbind to data.frame
PairIDs - data.frame(PairID, PairIDDuplicates)
names(PairIDs) - c(Pairid,Pairiddups)
Health2PairsOnly - PairIDs[PairIDs$Pairiddups,]

# 2) using the dimensions name like:
PairIDs-cbind(PairID, PairIDDuplicates)
colnames(PairIDs) - c(Pairid,Pairiddups)
Health2PairsOnly - PairIDs[PairIDs[,'Pairiddups']==1,]

#In the latter you can save a line of code with
PairIDs - data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates)



#Note that there is a fair bit of redundancy throughout
#your code. A neater way of subsetting your original
#data, for instance, would be:
PairIDdup - unique(PairID[duplicated(PairID)])
Health2[PairID %in% PairIDdup,]



Have Fun!
Simon Knapp



On Fri, Sep 28, 2012 at 5:46 AM, GradStudentDD dd...@virginia.edu wrote:
 Hi,

 I have a data set of observations by either one person or a pair of people.
 I want to only keep the pair observations, and was using the code below
 until it gave me the error  $ operator is invalid for atomic vectors. I am
 just beginning to learn R, so I apologize if the code is really rough.

 Basically I want to keep all the rows in the data set for which the value of
 Pairiddups is TRUE. How do I do it? And how do I get past the error?

 Thank you so much,
 Diana

 PairID-c(Health2$pairid)

 duplicated(PairID, incomparables=TRUE, fromLast=TRUE)

 PairIDdup=duplicated(PairID)
 cbind(PairID, PairIDdup)
 PairID[which(PairIDdup)]

 PairIDDuplicates-PairID%in%PairID[which(PairIDdup)]
 PairIDs-cbind(PairID, PairIDDuplicates)

 colnames(PairIDs)-c(Pairid,Pairiddups)

 Health2PairsOnly-PairIDs[ which(PairIDs$Pairiddups=='TRUE'), ]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] combining numeric vector and column in a data frame

2012-07-10 Thread Simon Knapp
combined - data.frame(mnv=my.numeric.vec[df1$fileName], type=df1$type)
sorted - combined[order(rownames(combined)),]




On Wed, Jul 11, 2012 at 8:38 AM, Adrian Johnson
oriolebaltim...@gmail.com wrote:
 Hi:
 I am trying to map column names of a matrix to another data frame and
 get values in a different column of data frame


 I have a matrix m.

 my.numeric vec - m[1,]

 my.numeric.vec
 a b cd   ef
 2 1 49   10  3



 ##  my data frame = df1

 df1

 fileNametype   status
 b   N   alive
 a   Tdead
 d   N   alive
 c   T   dead
 f   Nalive
 e  Tdead


 I want to combine as.numeric.vec and df1 and create j with correct
 filename and type and numeric value

j

 my.numeric.vectype
 a 2   T
 b 1  N
 c 4  T
 d 9  N
 e 10 T
 f 3  N

 How can I combine my.numeric.vec and df1$type

 when I try:

 df1[df1$fileName %in% names(my.numeric.vec),2]

 I get wrong answer.


 thanks in advance.

 Adrian

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Simon Knapp
They are coding the variables as factors and using orthogonal
polynomial contrasts. This:

data.catapult - data.frame(data.catapult$Distance,
do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T)))
contrasts(data.catapult$h) -
contrasts(data.catapult$s) -
contrasts(data.catapult$l) -
contrasts(data.catapult$e) -
contr.poly(3, contrasts=F)
contrasts(data.catapult$b) - contr.poly(2, contrasts=F)
lm1 - lm(Distance ~ .^2, data=data.catapult)
summary(lm1)

gets you closer (same intercept at least), but I can't explain the
remaining differences. I'm not even sure why the results to look like
they do (interaction terms like a*b not a:b and one level for each
interaction).

Hope that helps,
Simon Knapp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Simon Knapp
... but this is tantalisingly close:

dat1 - with(data.catapult,
data.frame(
Distance,
h=C(h, poly, 1),
s=C(s, poly, 1),
l=C(l, poly, 1),
e=C(e, poly, 1),
b=C(b, poly, 1)
)
)
lm4 - lm(Distance ~ .^2, data = dat1)
summary(lm4)

... wish I knew what it meant.



On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote:
 They are coding the variables as factors and using orthogonal
 polynomial contrasts. This:

 data.catapult - data.frame(data.catapult$Distance,
 do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T)))
 contrasts(data.catapult$h) -
 contrasts(data.catapult$s) -
 contrasts(data.catapult$l) -
 contrasts(data.catapult$e) -
 contr.poly(3, contrasts=F)
 contrasts(data.catapult$b) - contr.poly(2, contrasts=F)
 lm1 - lm(Distance ~ .^2, data=data.catapult)
 summary(lm1)

 gets you closer (same intercept at least), but I can't explain the
 remaining differences. I'm not even sure why the results to look like
 they do (interaction terms like a*b not a:b and one level for each
 interaction).

 Hope that helps,
 Simon Knapp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fractional Factorial - Wrong values using lm-function

2012-06-25 Thread Simon Knapp
and finally...

thingy - function(x) {
x - C(x, poly, 1)
tmp - contrasts(x)
contrasts(x, 1) - 2 * tmp / sum(abs(tmp))
x
}

dat2 - with(data.catapult,
data.frame(
Distance,
h=thingy(h),
s=thingy(s),
l=thingy(l),
e=thingy(e),
b=thingy(b)
)
)
lm5 - lm(Distance ~ .^2, data = dat2)
summary(lm5)




On Tue, Jun 26, 2012 at 12:35 AM, Simon Knapp sleepingw...@gmail.com wrote:
 ... but this is tantalisingly close:

 dat1 - with(data.catapult,
    data.frame(
        Distance,
        h=C(h, poly, 1),
        s=C(s, poly, 1),
        l=C(l, poly, 1),
        e=C(e, poly, 1),
        b=C(b, poly, 1)
    )
 )
 lm4 - lm(Distance ~ .^2, data = dat1)
 summary(lm4)

 ... wish I knew what it meant.



 On Tue, Jun 26, 2012 at 12:18 AM, Simon Knapp sleepingw...@gmail.com wrote:
 They are coding the variables as factors and using orthogonal
 polynomial contrasts. This:

 data.catapult - data.frame(data.catapult$Distance,
 do.call(data.frame, lapply(data.catapult[-1], factor, ordered=T)))
 contrasts(data.catapult$h) -
 contrasts(data.catapult$s) -
 contrasts(data.catapult$l) -
 contrasts(data.catapult$e) -
 contr.poly(3, contrasts=F)
 contrasts(data.catapult$b) - contr.poly(2, contrasts=F)
 lm1 - lm(Distance ~ .^2, data=data.catapult)
 summary(lm1)

 gets you closer (same intercept at least), but I can't explain the
 remaining differences. I'm not even sure why the results to look like
 they do (interaction terms like a*b not a:b and one level for each
 interaction).

 Hope that helps,
 Simon Knapp

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Loop Help

2012-05-19 Thread Simon Knapp
First, what was going wrong:

#this creates a character vector, not an object containing your data sets
TOWERS - c(TOWER1,TOWER2,TOWER3,TOWER4,TOWER5,TOWER6,TOWER7)

for(i in 1:7){
# this creates a variable named TOWER.i (not TOWER.1, TOWER.2,
..., TOWER.7)
# which will get overwritten each time through the loop
TOWER.i - TOWERS[i]
# This is what causes your error, as TOWER.i is a string
(TOWER1, TOWER2, ...), and hence nrow(TOWER.i) return NULL
    TOWER - rep(i,nrow(TOWER.i))
# if you got to here, you would have got more errors as TOWER.i is
cannot be subsetted
# (it is a string, not a 'dataset' with columns)
    TOWER.i - cbind(TOWER.i[1:2], TOWER, TOWER.i[2:length(TOWER.i)])
}




# Note that to create variables like 'TOWER.1', ..., you would have to
say something like
assign(paste('TOWER', i, sep='.'), TOWERS[[i]])
# inside the loop. However, this is not necessary here and you would
keep needing to
# access the variable with similar ugliness. Here is the way I might
approach this:

TOWERS - list(TOWER1,TOWER2,TOWER3,TOWER4,TOWER5,TOWER6,TOWER7)

# create a list of things that look like what you wanted TOWERS.i to look like
# (still with column two repeated
TOWERS.modified - mapply(function(x, i) cbind(x[,1:2], i,
x[,2:ncol(x)]), TOWERS, 1:length(TOWERS), SIMPLIFY=F)

# you can then 'merge' them (though that term would mean something else to most
# R programmers - see ?merge) with:
TOWERS.combined - do.call(rbind, TOWERS.modified)




#Now what I would actually do is:
TOWERS.combined - do.call(rbind, lapply(1:7, function(i) {
x - get(paste('TOWER', i, sep=''))
cbind(x[,1:2], i, x[,2:ncol(x)])
}))



Hopefully that makes sense, if not I can provide more explanation, but
please try ?do.call, ?get, ?lapply and ?mapply first

Simon

On Sat, May 19, 2012 at 9:14 AM, bdossman bdoss...@gmail.com wrote:
 Hi all,

 I am a beginner R user and need some help with a simple loop function.

 Currently, I have seven datasets (TOWER1,TOWER2...TOWER7) that are all in
 the same format (same # of col and headers). I am trying to add a new column
 (factor) to each dataset that simply identifies the dataset. Ultimately, I
 would like to merge all 7 datasets and need that column to identify what
 rows came from what dataset.

 Using the code below, I get the error message Error in rep(i,
 nrow(TOWER.i)) : invalid 'times' argument but it doesn't make sense to me
 since nrow should give an integer value. Any help will be really
 appreciated.

 TOWERS-c(TOWER1,TOWER2,TOWER3,TOWER4,TOWER5,TOWER6,TOWER7)

 for(i in 1:7){
        TOWER.i-TOWERS[i]
        TOWER-rep(i,nrow(TOWER.i))
        TOWER.i-cbind(TOWER.i[1:2],TOWER, TOWER.i[2:length(TOWER.i)])
 }

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Loop-Help-tp4630555.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] converting csv to image file

2012-05-19 Thread Simon Knapp
provided you get the call to read.table (or perhaps read.csv) right
and presuming that file contains only the image data, you should
be able to say:

r - raster(as.matrix(read.csv(file.csv)))
extent(r) - extent(xmin, xmax, ymin, ymax)

and not worry about the projection (if it is plain old decimal degrees)

Regards,
Simon

On Sat, May 19, 2012 at 7:49 PM, Belay Gebregiorgis belay...@gmail.com wrote:
 Hello everyone,

 I want to get a 1km by lkm grid raster image using my csv data. If I call
 latitude=a, longitude=b and preciptation=c.

 a-(1,2,3,4,5)
 b-(6,7,8,9,10)
 c-(10,20, 30,40, 50)

 Then I found an example in r help which goes like

 pts = read.table(file.csv,..)
 library(sp)
 library(rgdal)

 proj4string(pts)=CRS(+init=epsg:4326) # set it to lat-long
 pts = spTransform(pts,CRS(insert your proj4 string here))
 gridded(pts) = TRUE
 r = raster(pts)
 projection(r) = CRS(insert your proj4 string here)

 Because I am new to R, I have no idea what to put into the proj4 string and
 all that. Can anyone help me on this? If there is a different way of doing
 this, that is also fine.

 Kind Regards,

 Belay

        [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Q - scatterplot, plot function trellis linear regressions???

2012-05-19 Thread Simon Knapp
I presume you mean car::scatterplot

A1) plot is a generic function for plotting 'things' and scatterplot
is a specific tool provided by that library

A2) from the documentation one infers that
  1) the straight line is a regression line
  2) the other lines are produced using loess
   a) the solid line is fitted to the data
   b) the dotted lines are created by: a smoother is applied
to the root-mean-square positive and negative residuals from the loess
line to display conditional spread and asymmetry

A3) you forgot the data argument (at least).

Cheers,
Simon

On Sat, May 19, 2012 at 11:34 AM, Jhope jeanwaij...@gmail.com wrote:
 Hi R-listers,

 Q1) What is the difference between the scatterplot and plot function?

 Q2) I am able to make a graph with the scatterplot function:

 scatterplot(DevelopIndex ~ Veg,
 +             data = Turtle,
 +             xlab = Vegetation border (m),
 +             ylab = Embryonic development index)

 And have been successful. But I do not know if the lines are for: linear,
 non-linear, mean, upper and lower limits or percentile? This shows side
 boxplots and 4 lines.

 Yet this plot only has one regression line and no side boxplots?:

 EDIHTL - lm(DevelopIndex~ HTL, data=Turtle)
 plot(DevelopIndex~HTL, data=Turtle, pch=16,
       xlab=High tide line (m), ylab=Embryonic development index)
 abline(EDIHTL)

 Q-3
 I am trying to make a trellis of linear regressions. I want 3 windows
 displaying 3 events of linear regression (two-way interactions). I have been
 trying this:

 require(lattice)
 library(lattice)
 trellis.par.set(col.whitebg())
 scatterplot(DevelopIndex~Veg|Aeventexhumed,
 +             xlab = Vegetation border,
 +             ylab = Embryonic development index)
 Error in eval(expr, envir, enclos) : object 'DevelopIndex' not found

  Please advise, many thanks.
 Jean

 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Q-scatterplot-plot-function-trellis-linear-regressions-tp4630562.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Fwd: Help please

2011-07-20 Thread Simon Knapp
Hi All,

This is not really an R question but a statistical one. If someone could
either give me the brief explanation or point me to a reference that might
help, I'd appreciate it.

I want to estimate the mean of a log-normal distribution, given the (log
scale normal) parameters mu and sigma squared (sigma2). I understood this
should simply be:

exp(mu + sigma2)

... but I the following code gives me something strange:

R - 1000
mu - -400
sigma2 - 200
tmp - rlnorm(R, mu, sqrt(sigma2)) # a sample from the desired log-normal
distribution
muh - mean(log(tmp))
sigma2h - var(log(tmp))

#by my understanding, all of the the following vectors should then contain
very similar numbers
c(mu, muh)
c(sigma2, sigma2h)
c(exp(mu + sigma2/2), exp(muh + sigma2h/2), mean(tmp))


I get the following (for one sample):
 c(mu, muh)
[1] -400. -400.0231
 c(sigma2, sigma2h)
[1] 200. 199.5895
 c(exp(mu + sigma2/2), exp(muh + sigma2h/2), mean(tmp))
[1] 5.148200e-131 4.097249e-131 5.095888e-150

so they do all contain similar numbers, with the exception of the last
vector, which is out by a factor of 10^19. Is this likely to be because one
needs **very** large samples to get a reasonable estimate of the mean... or
am I missing something?

Regards,
Simon

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] RWinEdt problem

2011-07-03 Thread Simon Knapp
Hi R Helpers,

I am a long time RWinEdt user and have just acquired a new laptop. I have
installed RWinEdt and things are going smoothly except for one small glitch
- file names are not appearing on the document tabs. When I use WinEdt (as
opposed to RWinEdt), they are appearing. Can anyone offer any advice on
this?

Thanks in advance,
Simon Knapp

OS: windows7
Arch: 64 bit
R version: 2.13.0 (2011-04-13)
WinEdt version: 5.14 (build 20050701)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] RWinEdt problem

2011-07-03 Thread Simon Knapp
... I just tried fiddling with the appearance settings, and when I uncheck
Custom Colors under Document Tabs, the file names reappear, though I
don't get the coloring I am used to (red for modified, green for
unmodified).

Thanks again,
Simon Knapp


On Mon, Jul 4, 2011 at 2:48 PM, Simon Knapp sleepingw...@gmail.com wrote:

 Hi R Helpers,

 I am a long time RWinEdt user and have just acquired a new laptop. I have
 installed RWinEdt and things are going smoothly except for one small glitch
 - file names are not appearing on the document tabs. When I use WinEdt (as
 opposed to RWinEdt), they are appearing. Can anyone offer any advice on
 this?

 Thanks in advance,
 Simon Knapp

 OS: windows7
 Arch: 64 bit
 R version: 2.13.0 (2011-04-13)
 WinEdt version: 5.14 (build 20050701)


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] lapply and list indexing basics (after realizing I wasn't previously subscribed...sorry)

2010-03-07 Thread Simon Knapp
library(MASS)

dat - data.frame(
col1=as.factor(sample(1:4, 100, T)),
col2=as.factor(sample(1:4, 100, T)),
col3=as.factor(sample(1:4, 100, T)),
isi=rnorm(100)
)

dat - split(dat, as.factor(sample(1:3, 100, T)))
lapply(dat, function(x, densfun) fitdistr(x$isi, densfun), 'normal')

Not used fitdistr before, and without data, I can't help with the error you
note below (which may well still occur)

Hope this helps.





On Mon, Mar 8, 2010 at 1:29 PM, Dgnn sharkbrain...@gmail.com wrote:


 I have split my original dataframe to generate a list of dataframes each of
 which has 3 columns of factors and a 4th column of numeric data.
 I would like to use lapply to apply the fitdistr() function to only the 4th
 column (x$isi) of the dataframes in the list.

 Is there a way to do this or am I misusing lapply?

 As a second solution I tried splitting only the numeric data column to
 yield
 a list of vectors and then using
 lapply(myList, fitdistr, densfun='gamma',start=list(scale=1, shape=2))
 returns the error:
 Error in optim(x = c(305, 290, 283, 363, 331, 293, 304, 312, 286, 339,  :
 non-finite finite-difference value [2]
 In addition: Warning message:
 In dgamma(x, shape, scale, log) : NaNs produced

 However, if I use fitdistr(myList[[i]]) on each element of the list, there
 are no errors.

 Thanks in advance for any comments.

 -jason
 --
 View this message in context:
 http://n4.nabble.com/lapply-and-list-indexing-basics-after-realizing-I-wasn-t-previously-subscribed-sorry-tp1584053p1584053.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Sub-matrixes that are linked to the Base matrix

2010-01-18 Thread Simon Knapp
The following code gives you functionality close to what you are looking for
(I think)... but someone more familar with proto might be able to make it
sexier. You will need to install the package 'proto' (though with a bit of
work you might be able to do this without proto, using environments alone.




library(proto)
my.mat - function(mat) {
structure(proto( expr = {
mat - mat
set.vals - function(., rs, cs, vals) {
if(max(rs)  nrow(.$mat) || min(rs)  1) stop(row out of
range)
if(max(cs)  ncol(.$mat) || min(cs)  1) stop(column out of
range)
.$mat[rs, cs] - vals
}
get.mat.ref - function(., rows, cols) {
if(max(rows)  nrow(.$mat) || min(rows)  1) stop(row out of
range)
if(max(cols)  ncol(.$mat) || min(cols)  1) stop(column out of
range)
tmp - proto(., expr = {
set.vals - function(., rs, cs, vals) {
if(max(rs)  length(.$rows) || min(rs)  1) stop(row
out of range)
if(max(cs)  length(.$cols) || min(cs)  1) stop(column
out of range)
.super$mat[.$rows[rs], .$cols[cs]] - vals
}
})
tmp$rows - rows
tmp$cols - cols
structure(tmp, class=c('my.sub.mat', 'proto', 'environment'))
}
}), class=c('my.mat', 'proto', 'environment'))
}

[.my.mat - function(x, ...) get('mat', x)[...]
[-.my.sub.mat - [-.my.mat - function(x, i, j, value) {x$set.vals(i,
j, value); as.proto(x)}
print.my.sub.mat - function(x) print(get('mat',x)[get('rows', x),
get('cols', x)])
print.my.mat - function(x) print(get('mat',x))




#You can use this code like:

a - my.mat(matrix(1:(4*3),4,3))
b - a$get.mat.ref(2:3, 2:3)
b[1,1] - 42  # equivalent to b$set.vals(1,1,42)
b[1:2,2] - 43
a[1,1] - 44
a[rep(4, 2), 2:3] - 45
b # print the sub matrix
a # print the matrix

#test passing sub-matrix to a function
test - function(x) x[2,1] - 46
test(b)
a # print the matrix again (it is as we would like)

# extract the original matrix
dat - a$mat




Hope this helps,
Simon Knapp





On Mon, Jan 18, 2010 at 12:36 PM, Friedrich goti...@gmail.com wrote:

 Hello,

 I'm am in the process of trying to port a RATS functions to R and have
 the problem, that RATS allows for the creation of submatrixes that are
 linked to their basematrix.

 Basically it should work like this:

 a = matrix(1:(4*3),4,3)
 a
 #  [,1] [,2] [,3]
 # [1,]159
 # [2,]26   10
 # [3,]37   11
 # [4,]48   12

 # This of course doesnt work :)
 b = submatrix(a,fromx=1,tox=2,fromy=1,toy=2)
 b
 #  [,1] [,2]
 # [1,]0   10
 # [2,]7   11
 b[1,1] = 42
 a[1,1]
 # [1] 42

 so changes in the sub-matrix propagate to the base matrix.

 Does such a feature exist?

 Thanks,

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] change codes into loops

2010-01-18 Thread Simon Knapp
You can do this...

# Some random data:
b_1 - b_2 - b_3 - matrix(,2,3)
for(i in 1:3) eval(substitute(A - matrix(rnorm(6), 2), list(A=paste('a', i,
sep=''


# the loop
for (i in 1:2) {
for (j in 1:3) {
for(k in 1:3) {
eval(substitute(A[i,j] - rank(c(a1[i,j],a2[i,j],a3[i,j]))[k],
list(A=as.symbol(paste('b_', k, sep='')
}
}
}



# For my own interest, I would have thought that:

thingy - expression(A[i,j] - rank(c(a1[i,j],a2[i,j],a3[i,j]))[k])
for (i in 1:2) {
for (j in 1:3) {
for(k in 1:3) {
eval(do.call('substitute', list(expr=thingy,
env=list(A=as.symbol(paste('b_', k, sep=''))
}
}
}


# would have also worked, but it does not. Can someone please explain why
not?


Simon Knapp

On Tue, Jan 19, 2010 at 1:41 PM, David Winsemius dwinsem...@comcast.netwrote:


 On Jan 18, 2010, at 9:21 PM, rusers.sh wrote:

  If the number of datasets for a* is small (here is 3), it is ok for
  creating b_ijn[i, j, nn] and make assignments to it. But it will be
  a little bit impossible for a larger number of datasets for a*, say
  999. We may need 999 lines to do this. Maybe there are other
  alternatives.

 Read more carefully. The b_ijn[ , , ] array can be pre-dimensioned to
 any size. You must know the size since you are specifying a loop range.


 
  2010/1/18 David Winsemius dwinsem...@comcast.net
 
  On Jan 18, 2010, at 7:19 PM, rusers.sh wrote:
 
  Hi,
   See example.
   for (i in 1:2) {
   for (j in 1:3) {
 b_1[i,j]-rank(c(a1[i,j],a2[i,j],a3[i,j]))[1]
b_2[i,j]-rank(c(a1[i,j],a2[i,j],a3[i,j]))[2]
 b_3[i,j]-rank(c(a1[i,j],a2[i,j],a3[i,j]))[3]
   }
  }
   The inner codes is really repeated, so i want to change the inner
  codes
  into loops. Take nn is from 1 to 3,
  something like,
   for (nn in 1:3) {
  b_nn[i,j]-rank(c(a1[i,j]:a3[i,j]))[nn]
  }
   Anybody can tell me the correct method to specify the above codes?
 
  There is no correct method. You cannot index on the object name b_nn
  that way. R has not been developing using a syntax with that much
  flexibility.  If you want a 3D array of values, then you could
  create b_ijn[i, j, nn] and make assignments to it. But if you tried
  to do this with paste and assign, you will spending considerably
  more time degbugging it than it is worth and it would likely be more
  inefficient than what you have.
 
  --
  David.
 
  Thanks.
 
  --
  -
  Jane Chang
  Queen's
 
 [[alternative HTML version deleted]]
 
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
  David Winsemius, MD
  Heritage Laboratories
  West Hartford, CT
 
 
 
 
  --
  -
  Jane Chang
  Queen's


[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Help using Cast (Text) Version

2010-01-17 Thread Simon Knapp
Hi Steve,

These might help.

#generate some example data similar to your original email, with one extra
line containing all NAs (first col included to provide similarity with your
data)

labs - c('4er66', '4gcyi', '5d3hh', '5d3wt', 'v3st5', 'a22g5', 'b5dd3',
'g44d2', 'z')
dat - scan()
1 NA  1  0 NA  0
0  0  1  0  0  0
0  0  0 NA  0  0
0  0  0  0  0  0
NA NA  1 NA NA NA
NA  0 NA NA NA NA
NA  0 NA NA NA NA
NA  0 NA NA NA NA
NA NA NA NA NA NA

data - data.frame(labs=labs, dat=matrix(dat, byrow=T, ncol=6))

# I think this line of code gives what you want (number of rows that contain
at least 1 non-na value).
sum(apply(data[,-1], 1, function(x) any(!is.na(x

# to produce the line (all) in your original output (assuming that this
line counts the number of non-na entries in each col):
apply(data[,-1], 2, function(x) sum(!is.na(x)))

Regards,
Simon Knapp





On Mon, Jan 18, 2010 at 8:37 AM, Steve Sidney sbsid...@mweb.co.za wrote:

 Well now I am totally baffled !!

 Using

 sum( !is.na(b[,3])) I get the total of all col 3 except those that are NA
 -
 Great solves the first problem

 What I can't seem to do is use the same logic to count all the 1's in that
 col, which are there before I use the cast with margins.

 So it seems to me that somehow sum((b[,3]) == 1)  is wrong and is the part
 of my understanding that's missing.

 My guess is that that before using margins and sum in the cast statement
 the col is a character type and in order for == 1 to work I need to convert
 this to an integer.

 Hope this helps you to understand the problem.

 Regards
 Steve

 Your help is much appreciated

 - Original Message - From: David Winsemius 
 dwinsem...@comcast.net
 To: Steve Sidney sbsid...@mweb.co.za
 Cc: r-help@r-project.org
 Sent: Sunday, January 17, 2010 7:36 PM

 Subject: Re: [R] Help using Cast (Text) Version



 On Jan 17, 2010, at 11:56 AM, Steve Sidney wrote:

  David

 Thanks, I'll try that..but no what I need is the total (1's) for
 each of the rows, labelled 1-6 at the top of each col in the table
 provided.


 Part of my confusion with your request (which remains unaddressed) is
 what you mean by valid. The melt-cast operation has turned a bunch  of
 NA's into 0's which are now indistinguishable from the original  0's. So I
 don't see any way that operating on b could tell you the  numbers you
 are asking for. If you were working on the original data,  res, you
 might have gotten the column-wise valid counts of column  2 with
 something like:

  sum( !is.na(res[,2]) )


 What I guess I am not sure of is how to identify the col after the  melt
 and cast.


 The cast object represents columns as a list of vectors. The i-th  column
 is b[[i]] which could be further referenced as a vector. So the  j-th row
 entry for the i-th column would be b[[i]][j].



 Steve

 - Original Message - From: David Winsemius
 dwinsem...@comcast.net
 
 To: Steve Sidney sbsid...@mweb.co.za
 Cc: r-help@r-project.org
 Sent: Sunday, January 17, 2010 4:39 PM
 Subject: Re: [R] Help using Cast (Text) Version



 On Jan 17, 2010, at 5:31 AM, Steve Sidney wrote:

  Sorry to repeat the meassage, not sure if the HTML version has  been
 received - Apologies for duplication

 Dear list

 I am trying to count the no of occurances in a column of a data   frame
 and there is missing data identifed by NA.

 I am able to melt and cast the data correctly as well as sum the
 occurances using margins and sum.

 Here are the melt and cast commands

 bw = melt(res, id=c(lab,r), pf_zbw)
 b = cast(bw, lab ~ r, sum, margins = T)

 Sample Data (before using sum and margins)

  lab  1  2  3  4  5  6
 1  4er66  1 NA  1  0 NA  0
 2  4gcyi  0  0  1  0  0  0
 3  5d3hh  0  0  0 NA  0  0
 4  5d3wt  0  0  0  0  0  0
 .
 . lines deleted to save space
 .
 69 v3st5 NA NA  1 NA NA NA
 70 a22g5 NA  0 NA NA NA NA
 71 b5dd3 NA  0 NA NA NA NA
 72 g44d2 NA  0 NA NA NA NA

 Data after using sum and margins

  lab 1 2 3 4 5 6 (all)
 1  4er66 1 0 1 0 0 0 2
 2  4gcyi 0 0 1 0 0 0 1
 3  5d3hh 0 0 0 0 0 0 0
 4  5d3wt 0 0 0 0 0 0 0
 5  6n44r 0 0 0 0 0 0 0
 .
 .lines deleted to save space
 .
 70 a22g5 0 0 0 0 0 0 0
 71 b5dd3 0 0 0 0 0 0 0
 72 g44d2 0 0 0 0 0 0 0
 73 (all) 5 2 4 3 5 726

 Uisng length just tells me how many total rows there are.



  What I need to do is count how many rows there is valid data, in   this
 case either a one (1) or a zero (0) in b


 I'm guessing that you mean to apply that test to the column in b
 labeled (all) . If that's the case, then something like  (obviously
 untested):

 sum( b$'(all)' == 1 | b$'(all)' == 0)




 I have a report to construct for tomorrow Mon so any help would be
 appreciated

 Regards
 Steve


 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT



 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help

Re: [R] coplots with missing values

2010-01-17 Thread Simon Knapp
Never used coplot, but why not try removing the rows with missing variables?
Perhaps with:

dat - cbind(E2=resid(baea.ancova6, type = normalized), year, population)
coplot(E2~year|population, data=dat[apply(dat, 1, function(x) !any(is.na
(x))),])

Regards,
Simon Knapp

On Mon, Jan 18, 2010 at 6:54 AM, LisaB lisaba...@hotmail.com wrote:


 Hello - I'm trying to construct a coplot to check the residuals conditioned
 on a factor using the following formula, however I keep getting an error
 message.  I've used the same formula for another data set and it worked
 fine
 - the only difference is that now I have missing values for each level (2
 levels) of the factor. Is this the problem and how can I fix it.  Thank
 you.

  E2 - resid(baea.ancova6, type = normalized)
  coplot(E2~year|population)
 Error in bad.lengths() : incompatible variable lengths

 --
 View this message in context:
 http://n4.nabble.com/coplots-with-missing-values-tp1016157p1016157.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] some help regarding combining columns from different files

2010-01-12 Thread Simon Knapp
Hi Hari,

You have not given examples of 'list1.bp.files.names' or
'list2.bp.files.names' and hence it is difficult to determine what the code
is trying to achieve. I will assume you want to create all pairwise merges
of the attached files. If this is indeed what you are doing, the problem is
not a problem at all, it is just that the intersection of values of the
variables geneDescription between any two files is the empty set, as is
demonstrated by the following code snippet (for each merge, TRUE is printed
if there is some overlap and FALSE is printed if there is not - a FALSE
implies concatenation in the code you have provided)

list1.bp.files.names - list2.bp.files.names -
c('colNamesgenesplitList1_alcoholMetobolic_genes.txt',
'colNamesgenesplitList1_cellcycle_genes.txt',
'colNamesgenesplitList2_alcoholMetabolism_genes.txt',
'colNamesgenesplitList2_cellcycle_genes.txt')

x.col.names - y.col.names -
c(genesymbol,geneDescription,orgSymbol,orgName)

for (i in 1:length(list1.bp.files.names)){
temp1 -
read.table(list1.bp.files.names[i],sep=\t,header=T,stringsAsFactors=F,quote='')
for (j in (i+1):length(list2.bp.files.names)){
if(i==length(list2.bp.files.names)) break
temp2 -
read.table(list2.bp.files.names[j],sep=\t,header=T,stringsAsFactors=F,quote='')
cat(any(apply(temp1, 1, function(x, y) x['geneDescription'] %in% y,
temp2['geneDescription'])), '\n')
}
}

A tip - there are trailing and/or leading whitespace on the variables
'geneDescription' and 'orgSymbol'. (in the latter case  RG  is a different
value to RG). This is probably going to give different results to what you
were expecting.

Hope this helps,
Simon Knapp

On Wed, Jan 13, 2010 at 8:48 AM, Harikrishnadhar hari.bom...@gmail.comwrote:

 Hi Jim,

 I am want to merge two files into one file :

 Here is my code . But the problem with this is that I am getting the 2nd
 file appended to the first when i write temp3 in my code to the text file.
 I
 am not sure what mistake I am doing .

 also find the test files to run the code .

 Please help me with this !!!

 temp1 - NULL
 temp2 - NULL
 x.col.names -c(genesymbol,geneDescription,orgSymbol,orgName)
 y.col.names - c(genesymbol,geneDescription,orgSymbol,orgName)
 for (i in 1:length(list1.bp.files.names)){
temp1 -

 read.table(list1.bp.files.names[i],sep=\t,header=T,stringsAsFactors=F,quote=\)
  for (j in 1:length(list2.bp.files.names)){
 temp2 -

 read.table(list2.bp.files.names[j],sep=\t,header=T,stringsAsFactors=F,quote=\)
temp3 - merge(temp1,temp2,by.x = x.col.names,by.y=y.col.names,all=T)
myfile-gsub(( ), , paste(1_,merge.bp.files.names[i],.txt))
write.table(temp3,file=myfile,sep=\t,quote=FALSE,row.names=F)
  }
  }
 Thanks
 --Hari--

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Calculate the percentages of the numbers in every column.

2010-01-12 Thread Simon Knapp
tmp - scan()
0 2 1 0 1 0 2 1 2 3 0 0 0 0 1 0 0 2 3 1

dat - matrix(tmp, byrow=T, ncol=4)

apply(dat, 2, function(x, min.val, max.val) {
tmp - table(x)/length(x)
res - rep(0, max.val - min.val + 1)
res[as.numeric(names(tmp)) - min.val + 1] - tmp
res
}, 0, 3)

Should do it (but I bet there is a more elegant way).

Regards,
Simon Knapp

On Wed, Jan 13, 2010 at 5:25 AM, Kelvin 6kelv...@gmail.com wrote:

 Dear friends,

 I have a table like this, I have A B C D ... levels, the first column
 you see is just the index, and there are different numbers in the
 table.

  A  B  C  D  ...
 10   2   1   0
 21   0   2   1
 32   3   0   0
 40   0   1   0
 50   2   3   1
 ...

 I want to calculate the frequencies or the percentages of the numbers
 in every column.

 How do I get a table like this, the first column is the levels of
 numbers, and the numbers inside the table are the percentages. All the
 percentages should add up to 1 in every column.

  A B  C D   ...
  0  0.2   0.3   0.1   0.1
  1  0.1   0.1   0.2   0.1
  2  0.1   0.2   0.2   0.2
  3  0.2   0.1   0.1   0
  ...

 Thanks your help!

 Kelvin

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Recommended visualization for hierarchical data

2010-01-12 Thread Simon Knapp
Not sure if there are any 'recommended' visualisations, but the following is
a start (you will need to tune the scaling of the x-axis label and state
identifiers).

In the plot produced, the widths of the 'columns' are proportional of
purchases occurring in the state, and the height of the 'boxes' are
proportional to the purchases occurring in the County - hence the area of
each box represents the proportion of total (i.e. national) sales occurring
in the County.

#generate some random data
n - 100
dat - data.frame(
State=(tmp - sample(LETTERS, n, T)),
County=paste(tmp, 1:n, sep=''),
Purchases=floor(rnorm(n, 100, 20))
)

# draw the plot
split.dat - split(dat, dat$State)
widths - sapply(split.dat, function(x) sum(x$Purchases))
widths - widths / sum(widths)
cum.widths - c(0.0, cumsum(widths)[-length(widths)])
plot(0:1, 0:1, type='n', xaxt='n', yaxt='n', xlab='', ylab='')
par(usr=c(0,1,0,1), xpd=NA)
mapply(function(x, cum.tot, width) {
heights - x$Purchases / sum(x$Purchases)
cum.heights - c(0.0, cumsum(heights)[-length(heights)])
mapply(function(y, height, x, width) rect(x, y, x+width, y+height),
cum.heights, heights, MoreArgs=list(cum.tot, width))
text(cum.tot + width/2, heights/2 + cum.heights, x$County)
}, split.dat, cum.widths, widths)

axis(1, (widths / 2 + cum.widths), names(split.dat))



Hope this helps,
Simon Knapp

On Wed, Jan 13, 2010 at 2:46 PM, Rex C. Eastbourne rex.eastbou...@gmail.com
 wrote:

 On Tue, Jan 12, 2010 at 5:26 PM, Rex C. Eastbourne 
 rex.eastbou...@gmail.com
  wrote:

  Let's say I have data in the following schema that describes the number
 of
  purchases a company has received from each County in the US:
 
  State | County | Purchases
  ---
  NJ | Mercer | 550
  CA | Orange | 23
  
 
  I would like to visualize what states contribute the most to the overall
  total, and furthermore within those states, what Counties contribute the
  most. What are some recommended R visualizations for this type of data? I
  created a treemap using map.market from the portfolio library, like the
  following:
 
  http://zoonek2.free.fr/UNIX/48_R/g126.png
 
  Although this is an attractive visual, I want something that makes it
  easier to compare the relative sizes of components at a glance (hard with
 a
  treemap because rectangles have different aspect ratios). Does anyone
 have a
  recommended alternate visualization?
 
  Thanks!
 

 Just to clarify: I made up the above example for simplicity's sake to
 illustrate what I meant by hierarchical data. My actual data is not
 related to maps or geography, so a map-based visualization wouldn't work.

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] proto question

2009-01-07 Thread Simon Knapp
Dear R Users,

 

I have a couple of proto objects like:

 

wedge - proto(expr={

start.year - 2008

end.year - 2050

})

 

star.rating - wedge$proto(

star = c(4, 5, 8, 10),

gain = c(0, .3, .5, .7),

cost = c(0, 2100, 4000, 7500),



star.rating - function(., year) 6.0,

 

setup = function(.){

.$cost.for.star - approxfun(.$star, .$cost)

.$gain.for.star - approxfun(.$star, .$gain)

},

 

test = function(., year) {

gs - .$with(gain.for.star)(.$star.rating(year))

}

)

 

 

 

And a function to create and modify 'instances':

 

create.star.rating - function(switch.years, star.ratings) {

res - star.rating$proto(switch.years = switch.years, star.ratings =
star.ratings)

res$setup()

res

}

 

 

When I use them as follows:

ee - create.star.rating(ratings, switch.years)

ee$test(2009)

 

 

The second line gives me the error message:

Error in .$with(gain.for.star)(.$star.rating(year)) : 

   object y not found

 

I understand why this happens (when objects are inserted into a proto
their environment is set to that of the proto)... but I can't figure out
how to get around it!

 

I have tried setting the environments of the functions created in setup
as follows:

 

setup = function(.){

t1 - approxfun(.$star, .$cost)

t2 - approxfun(.$star, .$gain)

.$cost.for.star - t1

.$gain.for.star - t2

environment(.$cost.for.star) - environment(t1)

environment(.$gain.for.star) - environment(t2)

}

 

 

But then I get the error:

Error in get(gain.for.star, env = `*tmp*`, inherits = TRUE) : 

  object *tmp* not found

 

 

Which I think occurs because the last two lines in 'setup' only create
references to their respective environments which disappear on exit of
'setup'.

 

 

 

Any ideas?

 

Regards,

Simon Knapp.

 


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] using jackknife in linear models

2008-12-18 Thread Simon Knapp
The definition of jackknife implies that the result of theta must be a
numeric scalar. So, presuming you wanted to jackknife a coefficient,
theta would need to be something like:

theta - function(x, xdata, coefficient) coef(lm(model.lm,
data=xdata[x,]))[coefficient]

and would be called something like:
results - jackknife(1:n,theta,xdata=DF,coefficient=(Intercept))

To do all coefficients, you could write a function like like:
jackknife.apply - function(x, xdata, coefs) sapply(coefs,
function(coefficient) jackknife(x, theta, xdata=xdata,
coefficient=coefficient), simplify=F)

and call it like:
results - jackknife.apply(1:n, DF, c((Intercept), V1, V2, V3))

which would give you a list with the jackknife output for each coefficient.

note that you don't need to attach DF for your code to work :-)

Regards,
Simon Knapp.




On Fri, Dec 19, 2008 at 7:02 AM, Mark Heckmann mark.heckm...@gmx.de wrote:
 Hi R-experts,

 I want to use the jackknife function from the bootstrap package onto a
 linear model.
 I can't figure out how to do that. The manual says the following:

...

 So I tried:

  DF - as.data.frame(matrix(rnorm(250), ncol=5))
  attach(DF)
  model.lm - formula(V1 ~ V2 + V3 + V4)
  n - 50
  theta - function(x, xdata){ lm(model.lm, data=xdata) }
  results - jackknife(1:n,theta,xdata=DF)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I generate one vector for every row of a data frame?

2008-12-18 Thread Simon Knapp
Your code will always generate the same number of samples from each of
the normals specified on every call, where the number of samples from
each is (roughly) proportional to the weights column. If the weights
column in your data frame represents probabilities of draws coming
from each distribution, then this behaviour is not correct. Further,
it does not guarantee that the sample size is actually n.

This definition will work with arbitrary numbers of rows:

 gmm_data - function(n, data){
rows - sample(1:nrow(data), n, T, dat$weight)
rnorm(n, data$mean[rows], data$sd[rows])
}

and this one enforces a bit more sanity :-)

gmm_data - function(n, data, tol=1e-8){
if(any(data$sd  0)) stop(all of data$sd must be  0)
if(any(data$weight  0)) stop(all of data$weight must be  0)
wgts - if(abs(sum(data$weight) - 1)  tol) {
warning(data$weight does not sum to 1 - rescaling)
data$weight/sum(data$weight)
} else data$weight
rows - sample(1:nrow(data), n, T, wgts)
rnorm(n, data$mean[rows], data$sd[rows])
}

Regards,
Simon Knapp.

On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW)
bill...@u.washington.edu wrote:
 I am trying to generate a set of data points from a Gaussian mixture
 model.  My mixture model is represented by a data frame that looks
 like this:

 gmm
  weight mean  sd
 10.30 1.0
 20.2   -2 0.5
 30.44 0.7
 40.15 0.3

 I have written the following function that generates the appropriate data:

 gmm_data - function(n, gmm) {
c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd),
rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd),
rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd),
rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd))
 }

 However, the fact that my mixture has four components is hard-coded
 into this function.  A better implementation of gmm_data() would
 generate data points for an arbitrary number of mixture components
 (i.e. an arbitrary number of rows in the data frame).

 How do I do this?  I'm sure it's simple, but I can't figure it out.

 Thanks.
 --
 Bill McNeill
 http://staff.washington.edu/billmcn/index.shtml

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How do I generate one vector for every row of a data frame?

2008-12-18 Thread Simon Knapp
... actually, the scaling of the weights was not required as it is
done by sample anyway.

On Fri, Dec 19, 2008 at 5:16 PM, Simon Knapp sleepingw...@gmail.com wrote:
 Your code will always generate the same number of samples from each of
 the normals specified on every call, where the number of samples from
 each is (roughly) proportional to the weights column. If the weights
 column in your data frame represents probabilities of draws coming
 from each distribution, then this behaviour is not correct. Further,
 it does not guarantee that the sample size is actually n.

 This definition will work with arbitrary numbers of rows:

  gmm_data - function(n, data){
rows - sample(1:nrow(data), n, T, dat$weight)
rnorm(n, data$mean[rows], data$sd[rows])
 }

 and this one enforces a bit more sanity :-)

 gmm_data - function(n, data, tol=1e-8){
if(any(data$sd  0)) stop(all of data$sd must be  0)
if(any(data$weight  0)) stop(all of data$weight must be  0)
wgts - if(abs(sum(data$weight) - 1)  tol) {
warning(data$weight does not sum to 1 - rescaling)
data$weight/sum(data$weight)
} else data$weight
rows - sample(1:nrow(data), n, T, wgts)
rnorm(n, data$mean[rows], data$sd[rows])
 }

 Regards,
 Simon Knapp.

 On Fri, Dec 19, 2008 at 4:14 PM, Bill McNeill (UW)
 bill...@u.washington.edu wrote:
 I am trying to generate a set of data points from a Gaussian mixture
 model.  My mixture model is represented by a data frame that looks
 like this:

 gmm
  weight mean  sd
 10.30 1.0
 20.2   -2 0.5
 30.44 0.7
 40.15 0.3

 I have written the following function that generates the appropriate data:

 gmm_data - function(n, gmm) {
c(rnorm(n*gmm[1,]$weight, gmm[1,]$mean, gmm[1,]$sd),
rnorm(n*gmm[2,]$weight, gmm[2,]$mean, gmm[2,]$sd),
rnorm(n*gmm[3,]$weight, gmm[3,]$mean, gmm[3,]$sd),
rnorm(n*gmm[4,]$weight, gmm[4,]$mean, gmm[4,]$sd))
 }

 However, the fact that my mixture has four components is hard-coded
 into this function.  A better implementation of gmm_data() would
 generate data points for an arbitrary number of mixture components
 (i.e. an arbitrary number of rows in the data frame).

 How do I do this?  I'm sure it's simple, but I can't figure it out.

 Thanks.
 --
 Bill McNeill
 http://staff.washington.edu/billmcn/index.shtml


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Fwd: Categorial Response Questions

2008-10-18 Thread Simon Knapp
Not the same data your using but...

# get something like a reasonable
# age dist for dummy data:
data('mw.ages', package='UsingR')
mw.age - mw.ages[2:80,]

# size of dummy dataset
n.obs - 1000

#pr of survival
pr.s - 0.80

# dummy data
dat - data.frame(
pclass=sample(c('1st', '2nd', '3rd'), n.obs, T),
age=sample(1:79, n.obs, T, apply(mw.age, 1, sum)/sum(mw.age)),
sex=sample(c('male', 'female'), n.obs, T, apply(mw.age, 2,
sum)/sum(mw.age)),
survived=sample(0:1, n.obs, T, c(1-pr.s, pr.s))
)
dat$age.group - cut(dat$age, seq(0, 80, 10), right=F)


#-
# get the logits
attach(dat)
logits - by(dat, list(age.group, sex, pclass), function(x){
p - sum(x$survived)/nrow(x)
p/(1-p)
})
detach(dat)
#-
#-
# the class of the object logits is 'by' -
# this function munges it into a data frame
# (note that 'as.data.frame' is generic)

# ARGS
# x   an object of class 'by'
# out.names names you want to use
# in the output data frame.

as.data.frame.by - function(x, out.names=NULL) {
nms - dimnames(x)
no.levs - sapply(nms, length)
res - as.numeric(x)
for(i in 1:length(nms)){
res - data.frame(
if(i==1)
rep(nms[[i]], times=prod(no.levs[(i+1):length(nms)]))
else if(i==length(nms))
rep(nms[[i]], each=prod(no.levs[1:(i-1)]))
else
rep(nms[[i]], each=prod(no.levs[1:(i-1)]),
times=prod(no.levs[(i+1):length(nms)]))
, res)
}
names(res) - out.names
res
}

as.data.frame(logits, c('age.group', 'sex', 'pclass', 'logit'))
#-





2008/10/18 andyer weng [EMAIL PROTECTED]:
 hi all,

 For my question in the first email below, I found I made a mistake on
 my coding in the previous email, the one I was trying to type should
 be

 grouped.titanic.df-data.frame(group.age.group=sort(unique(titanic.df$age.group)),
 + 
 expand.grid(sex=sort(unique(titanic.df$sex)),pclass=sort(unique(titanic.df$pclass))),
 + r=as.vector(tapply(titanic.df$survived,titanic.df$age.group,sum)),
 + n=as.vector(tapply(titanic.df$survived,titanic.df$age.group,length)))

 Error in data.frame(group.age.group = sort(unique(titanic.df$age.group)),  :
  arguments imply differing number of rows: 8, 6


 please advise what I have done wrong? why the error message come up.
 Am I doing the right thing to fix the question i mentioned in the
 first email (the bottom email)?

 Cheers. Andyer


 -- Forwarded message --
 From: andyer weng [EMAIL PROTECTED]
 Date: 2008/10/18
 Subject: Fwd: Categorial Response Questions
 To: r-help@r-project.org


 hi all,

 me again. i try to type the following coding for my question below,
 but it comes up a error messgae. please advise whether the way i was
 trying to do will solve my question stated in the previous email. If
 so , please advise what is wrong with my coding.
 (p.s. all the data are stored in xxx.df)

  grouped.xxx.df-data.frame(group.age.group=sort(unique(xxx.df$age.group)),
 + 
 expand.grid(sex=c(female,male),age.group=c(0-9,10-19,20-29,30-39,40-49,50-59,60-69,70-79)),
 + r=tapply(xxx.df$survived,titanic.df$age.group,sum),
 + n=tapply(xxx.df$survived,titanic.df$age.group,length))

 Error in data.frame(group.age.group = sort(unique(xxx.df$age.group)),  :
  arguments imply differing number of rows: 8, 16
 In addition: Warning messages:
 1: In Ops.factor(left) : + not meaningful for factors
 2: In Ops.factor(left) : + not meaningful for factors


 thanks millions.

 Regards,
 Andyer






 -- Forwarded message --
 From: andyer weng [EMAIL PROTECTED]
 Date: 2008/10/18
 Subject: Fwd: Categorial Response Questions
 To: r-help@r-project.org


 Sorry Guys, i press the wrong button to send out the uncompleted message.

 let me do it again.

 my purpose for below questions  is to assess the effect of class, age
 and sex on the survival.


 I have a data set containing :

 pclass:  A factor giving the class of the passenger: one of 1st, 2nd, 3rd.
 age:  The age of the passenger in years.
 sex:  Passenger's gender: female or male
 age.group:Passengers age group, one of 0‐9 , 10‐19, 20‐29,
 30‐39, 40‐49, 50‐59, 60‐69,70‐79
 survived:Passenger's survival (1=survived, 0=did not survive)

 Ignoring the variable age,
 - I need to group the data into groups corresponding to each
 age‐group/sex/class combination,
 - I need to compute the logits for each combination.
 - Make a data frame containing the logits, and the categorical
 variables. I need to have one line in the data frame for each
 combination of the factor levels.

 Can someone please help with the R code for above???!!!

 Thanks millions!!

 Cheers
 Andyer.

 __
 R-help@r-project.org mailing list
 

Re: [R] R script from Python

2008-10-17 Thread Simon Knapp
'Rterm --help' shows the usage as:

Rterm [options] [ infile] [outfile] [EnvVars]

just in case you didn't understand what the angle brackets meant:

the term [ infile] means read input from 'infile', and
the term [ outfile] means write output to 'outfile'.

Though your code works at the command line, the spacing appeared to
imply that you thought the angle brackets simply 'wrapped' the input
file.

This syntax doesn't work from python as the angle brackets are OS
shell operators but won't be interpreted that way when used to
construct a Popen object.

the following will do what you want:

import subprocess
command  = 'c:\\Program Files\\R\\R-2.7.2\\bin\\Rterm.exe --vanilla -q
-f d:\\test\\run\\geneBank.r'
subprocess.Popen(command, stdout=open('d:\\output.out', 'w')).wait()

Regards,
Simon






On Sat, Oct 18, 2008 at 10:28 AM, Tomislav Puđa [EMAIL PROTECTED] wrote:
 Hi,

 I'm trying to execute R-script from Python. I'm using R 2.7.2, Python 2.5
 and WinXP.
 I don't won't to use Python/R interface because of nature of project.

 Python code :

 import subprocess
 command  = 'c:\\Program Files\\R\\R-2.7.2\\bin\\Rterm.exe --vanilla -q
 d:\\test\\run\\geneBank.r d:\\output.out'
 subprocess.Popen(command).wait()

 After that, I get error messages on Rterm.exe terminal :

 ARGUMENT 'd:\lloydMax.r' __ignored__

 ARGUMENT 'd:\output.txt' __ignored__

 Why ?

[[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Please help me in Converting this from C# to R

2008-09-14 Thread Simon Knapp
# bit hard to provide a simple conversion without definitions of the
class 'Node', the template 'DirectedGraph' and the function 'Writed'!
# I've used the package 'igraph' as a drop in - hope it is still clear.
#
# by the way:
# - your curly braces don't match,
# - not all elements of P are initialised before they are used.

#
# original code (cleaned to make comparison easier).
#
#Random r = new Random();
#DirectedGraphSimpleNode graph = GetGraph();
#decimal B = 0.1m;
#decimal D = 0.05m;
#int nodes = graph.NodesCount;
#decimal[] E = new decimal[nodes];
#decimal[] P = new decimal[nodes];
#
#for (int i = 7; i = 10; ++i) P[i] = (decimal)r.NextDouble();
#
#for (int t = 0; t  100; ++t){
#Writed(P, P);
#
#foreach (SimpleNode n in graph.Nodes) {
#int id = graph.index[n];
#
#decimal product = 1;
#foreach (var item in graph.GetAdjacentNodes(n)){
#int j = graph.index[item];
#product *= (1 - B * P[j]);
#}
#
#E[id] = product;
#}
#
#foreach (SimpleNode n in graph.Nodes){
#int i = graph.index[n];
#P[i] = 1 - ((1 - P[i]) * E[i] + D * (1 - P[i]) * E[i] + 0.5m
* D * P[i] * (1 - E[i]));
#if (P[i]  0) P[i] = 0;
#}
#}
#
#}


#
# drop-in for your method getGraph (produces a 10 'random' node
directed graph). I only assign to temporary so I can use the same
'grph' and 'P' in both implementations.
#
library(igraph)
GetGraph - function() graph.adjacency(matrix(sample(0:1, size=100,
replace=T), nrow=10))
grph.t - GetGraph()
P.t - runif(nodes) # assume you meant to initialise all elements of P

#
# IMPLEMENTATON 1.
# A 'mirror' implementation. Some of the code relies
# on the specifics of package igraph, but I've tried to
# be as similar as possible. Hope it still makes sense!
#
B - 0.1
D - 0.05
grph - grph.t
nodes - vcount(grph)
E - numeric(nodes)
P - P.t

for(t in 0:99){
cat('P:', P, '\n')# is this equivalent to 'Writed(P, P)' ???
graph.Nodes - get.adjlist(grph) # returns a list of vectors,
where each vector is the nodes a node is connected to.
id - 0 # we loop over the vectors and so must index separately
for(n in graph.Nodes){ # n is a vector containing the verticies
the vertex at index id+1 is connected to.
id - id+1
product - 1;
for(item in n){
product - product * (1 - B * P[item+1]); # verticies are
indexed from 0. no operator*= in R.
}
E[id] - product;
}

at - 0
for(i in 1:nodes){
P[i] - 1 - ((1 - P[i]) * E[i] + D * (1 - P[i]) * E[i] + 0.5 *
D * P[i] * (1 - E[i])); # we are accessing nodes in order so the
indexes are also ordered.
if (P[i]  0) P[i] - 0;
}
}

P # print the result

#
# IMPLEMENTATION 2.
# a more 'R-ish' implementation.
#
B - 0.1
D - 0.05
P - P.t
grph - grph.t

for(t in 0:99){
E - sapply(get.adjlist(grph), function(node) prod(1-B*P[node+1]))
P - 1 - ((1 - P) * E + D * (1 - P) * E + 0.5 * D * P * (1 - E))
}

P # print the result

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with misclass function on tree classification

2008-09-14 Thread Simon Knapp
Did you say:

library(tree)

at the top of your script?

On Sun, Sep 14, 2008 at 5:47 PM, Meir Preiszler
[EMAIL PROTECTED] wrote:

 I am working through Tom Minka's lectures on Data Mining and am now on Day 
 32. The following
 is the link: 
 http://alumni.media.mit.edu/~tpminka/courses/36-350.2001/lectures/day32/
 In order to use the functions cited I followed the instructions as follows:

 Installed tree package from CRAN mirror (Ca-1)
 Downloaded and sourced the file tree.r
 Downloaded the function clus1.r

 Having defined a tree tr, when I write misclass(tr,x$test) as shown in the 
 link
 I get an error message that R does not find the function pred1.tree.

  Is this function included in the tree package? If so it was not in my 
 download. Is this a bug?
 Do you know of a fix?

 Thanks for your help
 Meir

  clus1.r  tree.r

 
 Meir Preiszler - Research Engineer
 I t a m a r M e d i c a l Ltd.
 Caesarea, Israel:
 Tel: +(972) 4 617 7000 ext 232
 Fax: +(972) 4 627 5598
 Cell: +(972) 54 699 9630
 Email: [EMAIL PROTECTED]
 Web: www.Itamar-medical.com
 *




 88---8---
  This E-mail is confidential information of Itamar medical Ltd. It may also
  be legally privileged. If you are not the addressee you may not copy, 
 forward,
  disclose or use any part of it. If you have received this message in error,
  please delete it and all copies from your system and notify the sender
  immediately by return E-mail. Internet communications cannot be guaranteed
  to be timely, secure, error or virus-free. The sender does not accept
  liability for any errors or omissions. Before printing this email ,
  kindly think about the environment.   Itamar Medical Ltd. MIS Yan Malgin.
 88---8---


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Join data by minimum distance

2008-09-14 Thread Simon Knapp
 I am wondering if there is a function which will do a join between 2 
 data.frames by minimum distance, as it is done in ArcGIS for example. For 
 people who are not familiar with ArcGIS here it is an explanation:

 Suppose you have a data.frame with x, y, coordinates called track, and a 
 second data frame with different x, y coordinates and some other attributes 
 called classif. The track data.frame has a different number of rows than 
 classif. I want to join the rows from classif to track in such a way that for 
 each row in track I add only the row from classif that has coordinates 
 closest to the coordinates in the track row (and hence minimum distance in 
 between the 2 rows), and also add a new column which will record this minimum 
 distance. Even if the coordinates in the 2 data.frames have same name, the 
 values are not identical between the data.frames, so a merge by column is not 
 what I am after.



#---
# get the distance between two points on the globe.
#
# args:
# lat1 - latitude of first point.
# long1 - longitude of first point.
# lat2 - latitude of first point.
# long2 - longitude of first point.
# radius - average radius of the earth in km
#
# see: http://en.wikipedia.org/wiki/Great_circle_distance
#---
greatCircleDistance - function(lat1, long1, lat2, long2, radius=6372.795){
sf - pi/180
lat1 - lat1*sf
lat2 - lat2*sf
long1 - long1*sf
long2 - long2*sf
lod - abs(long1-long2)
radius * atan2(
sqrt((cos(lat1)*sin(lod))**2 +
(cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))**2),
sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod)
)
}

#---
# Calculate the nearest point using latitude and longitude.
# and attach the other args and nearest distance from the
# other data.frame.
#
# args:
# x as you describe 'track'
# y as you describe 'classif'
# xlongnme name of longitude variable in x
# xlatnme name of latitude location variable in x
# ylongnme name of longitude location variable on y
# ylatnme name of latitude location variable on y
#---
dist.merge - function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
tmp - t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){
dists - apply(y, 1, function(x, y) greatCircleDistance(x[2],
x[1], y[2], y[1]), x)
cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
}
, y[,c(ylongnme, ylatnme)]))
tmp - cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(ylongnme,
ylatnme), names(y))])
row.names(tmp) - NULL
tmp
}

# demo
track - data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90))
classif - data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90),
v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Question on substitute.

2008-03-11 Thread Simon Knapp
I have a data frame of around 4500 rows which contain file name
prefixes and other interesting variables. each prefix corresponds to
multiple files and a I only have 4300 complete sets of files. If any
one of the files is missing for a given prefix, then the function I
use fails. The effect is emulated by the function f and data
dummy.data:

f - function(x) if(runif(1)  0.1) stop('oops') else rnorm(10)
dummy.data - data.frame(cbind(paste('a', 1:100, sep=''),
matrix(rnorm(9000),nrow=100, ncol=9)))

I would like the results as a data.frame and if it weren't for the
errors would do:

res - apply(dummy.data, 1, function(x) try(f(x))) #... though I
wouldn't really need the try would I.
#dimnames(res)[[2]] - as.character(dummy.data[,1])

but the errors make res a list and the second line fails. I can do:

res - NULL
for(i in 1:nrow(dummy.data)) if(class(tmp - try(f(dummy.data[i,])))
!= 'try-error') res - cbind(res, tmp)

which gives me the correct result, but I don't know which column of
res belongs to which row of dummy.data.

I can get the desired result with with:

#--- desired result 
res - NULL
for(i in 1:nrow(dummy.data)) if(class(tmp - try(f(dummy.data[i,])))
!= 'try-error')
eval(parse(text=paste(res - cbind(res,, dummy.data[i,1],
=tmp), sep='')))
#--

but it is sort of ugly and I was hoping I could do something like:

res - NULL
for(i in 1:nrow(dummy.data)) if(class(tmp - try(f(dummy.data[i,])))
!= 'try-error')
res - eval(substitute(cbind(res, a=b), list(a=dummy.data[i,1], b=tmp)))

but the a is not being substituted (not that this is prettier, but I'm
trying to come to grips with
substitute/do.call/calls/expressions/...). I guess that argument names
are a different beast to argument values in expressions.



My question (finally) is: is there a way of achieving 'desired
result', but using 'quote', 'substitute' or any other similar
functions/idioms I've not run into yet? Alternatively, is there a
completely different way?

Any help appreciated,
Simon Knapp.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.