[R] Remove individual rows from a matrix based upon a list

2012-03-20 Thread Grant Gillis
Dear All,

Thanks in advance for any help.  I have a square matrix of measures of
interactions among individuals and would like to calculate values from a
function (colSums, for example) with a single individual (row) excluded in
each instance.  That individual would be returned to the matrix before the
next is removed and the function recalculated.  I can do this by hand,
removing rows based upon IDs; however, I would like to specify the individuals
to be removed from a list (lots of data).

An example matrix:

MyMatrix
        E985047 E985071 E985088 F952477 F952478 J644805 J644807 J644813
E985047    1    0.09    0       0       0       0       0       0.4
E985071    0.09 1       0       0       0       0       0       0.07
E985088    0    0       1       0       0       0       0.14    0
F952477    0    0       0       1       0.38    0       0       0
F952478    0    0       0       0.38    1       0       0       0
J644805    0    0       0       0       0       1       0.07    0
J644807    0    0       0.14    0       0       0.07    1       0
J644813    0.4  0.07    0       0       0       0       0       1

Example list of individuals to be removed:
Example list of individuals to be removed

MyList

E985088
F952477
F952478


If I were to do this by hand it would look like

MyMat1 <- MyMatrix[!rownames(MyMatrix) %in% "E985088", ]
colSums(MyMat1)

MyMat2 <- MyMatrix[!rownames(MyMatrix) %in% "F952477", ]
colSums(MyMat2)

MyMat3 <- MyMatrix[!rownames(MyMatrix) %in% "F952478", ]
colSums(MyMat3)

How might I replace the individual IDs (in quotes) with a list, removing the
row for each ID in turn from the matrix for the calculation and returning
that row to the matrix after each calculation, before the next is removed?

I hope I've been clear!!
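
A minimal sketch of the loop follows. It assumes MyList is a character vector of row names, and it rebuilds MyMatrix from the values shown above (set_pair() is a hypothetical helper used only for that). sapply() visits each ID, drops that one row and returns the colSums, giving one column per removed individual. MyMatrix itself is never modified, so each removed row is automatically back in place for the next ID.

```r
# Rebuild the example matrix from the values shown in the post
ids <- c("E985047", "E985071", "E985088", "F952477",
         "F952478", "J644805", "J644807", "J644813")
MyMatrix <- diag(1, 8)
dimnames(MyMatrix) <- list(ids, ids)

# Hypothetical helper: set a symmetric off-diagonal pair
set_pair <- function(m, a, b, v) {
  m[a, b] <- v
  m[b, a] <- v
  m
}
MyMatrix <- set_pair(MyMatrix, "E985047", "E985071", 0.09)
MyMatrix <- set_pair(MyMatrix, "E985047", "J644813", 0.4)
MyMatrix <- set_pair(MyMatrix, "E985071", "J644813", 0.07)
MyMatrix <- set_pair(MyMatrix, "E985088", "J644807", 0.14)
MyMatrix <- set_pair(MyMatrix, "F952477", "F952478", 0.38)
MyMatrix <- set_pair(MyMatrix, "J644805", "J644807", 0.07)

MyList <- c("E985088", "F952477", "F952478")

# Leave one individual out per call; MyMatrix is never changed, so the
# removed row is "returned" automatically before the next ID
res <- sapply(MyList, function(id)
  colSums(MyMatrix[rownames(MyMatrix) != id, , drop = FALSE]))
res
```

res[, "E985088"] should match colSums(MyMat1) from the manual version. Any other summary function can replace colSums() inside the anonymous function.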

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] calculating time interval distributions

2011-11-16 Thread Grant Gillis
Dear List,

I have data on approximately 100 individuals visiting a central logging
station over 1000 times.  I would like to be able to calculate the
distribution of inter-visit time intervals for all possible pairs, but am stuck
on how to code this.  Single pairs are not a problem, but extending it
has been difficult for me.  So for the toy data below I'd like to calculate,
for each 'a', how long until I see 'b', as well as for each 'b', how long
until I see 'a', and so on for all possible pairs (both triangles of a matrix?).

Thanks for any help, hints, and suggestions.

Grant


toy data:

ind <- c('a', 'b', 'a', 'c', 'b', 'b', 'c', 'a', 'c')

sec <- c(1, 3, 5, 6, 12, 22, 66, 85, 99)

What I am looking for

ab
2
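
A minimal sketch for all ordered pairs follows. It assumes that "how long until I see b" means the time from each visit by 'a' to the next strictly later visit by 'b', with NA when 'b' never returns. That definition is an assumption, chosen so that the first 'ab' interval is the 2 shown above. findInterval() finds the next visit without an explicit loop over times.

```r
ind <- c('a', 'b', 'a', 'c', 'b', 'b', 'c', 'a', 'c')
sec <- c(1, 3, 5, 6, 12, 22, 66, 85, 99)

# For each visit by `from`, time until the next (strictly later) visit by `to`;
# NA when `to` is never seen again. This definition is an assumption.
wait_times <- function(from, to, ind, sec) {
  t_from <- sec[ind == from]
  t_to   <- sort(sec[ind == to])
  nxt <- findInterval(t_from, t_to) + 1   # index of first t_to > t_from
  out <- rep(NA_real_, length(t_from))
  ok  <- nxt <= length(t_to)
  out[ok] <- t_to[nxt[ok]] - t_from[ok]
  out
}

# All ordered pairs, i.e. both triangles of the pair matrix
ids   <- sort(unique(ind))
pairs <- expand.grid(from = ids, to = ids, stringsAsFactors = FALSE)
pairs <- pairs[pairs$from != pairs$to, ]
intervals <- mapply(wait_times, pairs$from, pairs$to,
                    MoreArgs = list(ind = ind, sec = sec), SIMPLIFY = FALSE)
names(intervals) <- paste0(pairs$from, pairs$to)
intervals[["ab"]]
```

hist(intervals[["ab"]]) then shows the distribution for one pair, and hist(unlist(intervals)) shows all pairs pooled. hist() drops the NAs.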



Re: [R] tricky (for me) merging of data...more clarity

2011-03-04 Thread Grant Gillis
For our animals we are comfortable saying that body condition
represents a roughly 30-day period, plus or minus 15 days from
measurement.  However, we have monitored individuals for longer periods, and
on the days outside those windows we do not want any values for body
condition.  There are other data associated with those days.  For some
analyses we will use data from the periods with body condition data and for
others not.

Thanks very much for looking into this!  An R solution would save tons of
time and potential mistakes.

On 2 March 2011 09:40, Tal Galili tal.gal...@gmail.com wrote:

 Question,
 How do you know that the following two rows should have NAs ?

 1 16/02/87 NA NA
 1 17/02/87 NA NA


 Contact
 Details:---
 Contact me: tal.gal...@gmail.com |  972-52-7275845
 Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
 www.r-statistics.com (English)

 --




 On Tue, Mar 1, 2011 at 11:06 AM, Grant Gillis grant.j.gil...@gmail.com wrote:

 1 16/02/87 NA NA
 1 17/02/87 NA NA






Re: [R] tricky (for me) merging of data...more clarity

2011-03-01 Thread Grant Gillis
Hi Again,

Thanks very much for your response.  It seems my example got rearranged
(transposed?) after I posted it.  Hopefully this example will be
clearer.  I have one file (e.g. sheet 1) that has a column for
individuals (ind) and a column for the date (date).  I would like to merge
this with another file (e.g. sheet 2) that has both the 'ind' and date columns
as well as associated body condition measurements (BC1 and BC2).

My problem: the body condition values were measured intermittently
throughout an individual's history, but for our purposes we would like to
treat them as representative of the 15 days before and after measurement;
days outside of this window should have NAs (there are other data associated
with those days).  When I merge these two files, is there a way to write
these body condition values forward and back 15 days?  This would give me
something that looks like sheet 3.

Thank you!


Sheet 1


ind date
1 01/02/87
1 02/02/87
1 03/02/87
1 04/02/87
1 05/02/87
1 06/02/87
1 07/02/87
1 08/02/87
1 09/02/87
1 10/02/87
1 11/02/87
1 12/02/87
1 13/02/87
1 14/02/87
1 15/02/87
1 16/02/87
1 17/02/87
1 18/02/87
1 19/02/87
1 20/02/87
1 21/02/87
1 22/02/87
1 23/02/87
1 24/02/87
1 25/02/87
1 26/02/87
1 27/02/87
1 28/02/87
1 01/03/87
1 02/03/87
1 03/03/87
1 04/03/87
1 05/03/87
1 06/03/87
1 07/03/87
1 08/03/87
1 09/03/87
1 10/03/87
1 11/03/87
1 12/03/87
1 13/03/87
1 14/03/87
1 15/03/87
1 16/03/87
1 17/03/87
1 18/03/87
1 19/03/87
1 20/03/87
1 21/03/87
1 22/03/87
1 23/03/87
1 24/03/87

Sheet 2

ind date BC1 BC2
1 01/02/87 33 3
1 03/03/87 44 3


Sheet 3
ind date BC1 BC2
1 01/02/87 33 3
1 02/02/87 33 3
1 03/02/87 33 3
1 04/02/87 33 3
1 05/02/87 33 3
1 06/02/87 33 3
1 07/02/87 33 3
1 08/02/87 33 3
1 09/02/87 33 3
1 10/02/87 33 3
1 11/02/87 33 3
1 12/02/87 33 3
1 13/02/87 33 3
1 14/02/87 33 3
1 15/02/87 33 3
1 16/02/87 NA NA
1 17/02/87 NA NA
1 18/02/87 44 3
1 19/02/87 44 3
1 20/02/87 44 3
1 21/02/87 44 3
1 22/02/87 44 3
1 23/02/87 44 3
1 24/02/87 44 3
1 25/02/87 44 3
1 26/02/87 44 3
1 27/02/87 44 3
1 28/02/87 44 3
1 01/03/87 44 3
1 02/03/87 44 3
1 03/03/87 44 3
1 04/03/87 44 3
1 05/03/87 44 3
1 06/03/87 44 3
1 07/03/87 44 3
1 08/03/87 44 3
1 09/03/87 44 3
1 10/03/87 44 3
1 11/03/87 44 3
1 12/03/87 44 3
1 13/03/87 44 3
1 14/03/87 44 3
1 15/03/87 44 3
1 16/03/87 44 3
1 17/03/87 44 3
1 18/03/87 44 3
1 19/03/87 44 3
1 20/03/87 44 3
1 21/03/87 NA NA
1 22/03/87 NA NA
1 23/03/87 NA NA
1 24/03/87 NA NA
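
A minimal sketch of the "copy forward and back 15 days" step follows. It assumes the dates are day/month/two-digit-year, as in the sheets. Each monitored day takes BC1/BC2 from the closest measurement of the same individual within 15 days, and is NA otherwise. Because it follows the ±15-day rule literally, boundary rows may not match the hand-made Sheet 3; for example, 16/02/87 is exactly 15 days from both measurements. The names sheet1, sheet2 and half_window are illustrative.

```r
# Sheets 1 and 2 rebuilt from the example; dates are day/month/two-digit year
sheet1 <- data.frame(ind = 1,
                     date = seq(as.Date("1987-02-01"), as.Date("1987-03-24"), by = "day"))
sheet2 <- data.frame(ind = 1,
                     date = as.Date(c("01/02/87", "03/03/87"), format = "%d/%m/%y"),
                     BC1 = c(33, 44), BC2 = c(3, 3))
half_window <- 15  # days either side of a measurement

# For each monitored day, copy BC1/BC2 from the closest measurement of the
# same individual within half_window days (ties go to whichever row comes
# first in sheet2); otherwise leave NA
bc <- vapply(seq_len(nrow(sheet1)), function(i) {
  m <- sheet2[sheet2$ind == sheet1$ind[i], ]
  d <- abs(as.numeric(m$date - sheet1$date[i]))
  if (length(d) == 0 || min(d) > half_window) return(c(BC1 = NA_real_, BC2 = NA_real_))
  unlist(m[which.min(d), c("BC1", "BC2")])
}, FUN.VALUE = c(BC1 = 0, BC2 = 0))

sheet3 <- cbind(sheet1, t(bc))
head(sheet3)
```

Choosing the closest measurement also covers the "fewer than 15 days in either direction" case: when two windows overlap, each day simply goes to the nearer measurement.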



On 27 February 2011 20:49, Tal Galili tal.gal...@gmail.com wrote:

 Hi Grant,
 I don't have a solution, but just to be clearer on your situation:

 One row from sheet 2 looks like this:
 BC1 BC2  1 01/02/87 33 3  1 03/03/87 44 3
 ?

 Are you using only the first 6 columns for the data to be replicated, and
 using the other columns as some sort of indicators on when a sequence ends?




 If so, I would suggest asking the group how you might be able to turn sheet
 2 into one with as many rows as you need (which you would then merge
 with sheet 1).

 And for clarity's sake, consider using ?dput for your objects.
 Looking at data the way you pasted them (also, without column names) is not
 very easy for the reader (which might reduce your chances of getting help).

 Cheers,
 Tal

 On Sun, Feb 27, 2011 at 6:41 PM, Grant Gillis grant.j.gil...@gmail.com wrote:

 BC1 BC2  1 01/02/87 33 3  1 03/03/87 44 3






[R] tricky (for me) merging of data

2011-02-27 Thread Grant Gillis
Dear List,

I am having trouble with a tricky merging task.  I have one data sheet that
has dates (continuous) that radio collared individuals were monitored via
telemetry.  I have a different sheet containing data from instances where
individuals  were recaptured and associated body condition data was recorded
(sheet 2).  I would like to merge the two sheets by individual and date (I
can do this with the merge function) but I would also like to copy the body
condition data ahead and behind 15 days (or less if the individual was
recorded fewer than 15 days in either direction) from the date it was
recorded (this is where I'm stuck).

Thank you very much!
Grant

For example sheet 1 would be merged with sheet 2 to give sheet 3:

Sheet 1


ind date
1 01/02/87
1 02/02/87
1 03/02/87
1 04/02/87
1 05/02/87
1 06/02/87
1 07/02/87
1 08/02/87
1 09/02/87
1 10/02/87
1 11/02/87
1 12/02/87
1 13/02/87
1 14/02/87
1 15/02/87
1 16/02/87
1 17/02/87
1 18/02/87
1 19/02/87
1 20/02/87
1 21/02/87
1 22/02/87
1 23/02/87
1 24/02/87
1 25/02/87
1 26/02/87
1 27/02/87
1 28/02/87
1 01/03/87
1 02/03/87
1 03/03/87
1 04/03/87
1 05/03/87
1 06/03/87
1 07/03/87
1 08/03/87
1 09/03/87
1 10/03/87
1 11/03/87
1 12/03/87
1 13/03/87
1 14/03/87
1 15/03/87
1 16/03/87
1 17/03/87
1 18/03/87
1 19/03/87
1 20/03/87
1 21/03/87
1 22/03/87
1 23/03/87
1 24/03/87
Sheet 2

ind date BC1 BC2
1 01/02/87 33 3
1 03/03/87 44 3

Sheet 3
ind date BC1 BC2
1 01/02/87 33 3
1 02/02/87 33 3
1 03/02/87 33 3
1 04/02/87 33 3
1 05/02/87 33 3
1 06/02/87 33 3
1 07/02/87 33 3
1 08/02/87 33 3
1 09/02/87 33 3
1 10/02/87 33 3
1 11/02/87 33 3
1 12/02/87 33 3
1 13/02/87 33 3
1 14/02/87 33 3
1 15/02/87 33 3
1 16/02/87 NA NA
1 17/02/87 NA NA
1 18/02/87 44 3
1 19/02/87 44 3
1 20/02/87 44 3
1 21/02/87 44 3
1 22/02/87 44 3
1 23/02/87 44 3
1 24/02/87 44 3
1 25/02/87 44 3
1 26/02/87 44 3
1 27/02/87 44 3
1 28/02/87 44 3
1 01/03/87 44 3
1 02/03/87 44 3
1 03/03/87 44 3
1 04/03/87 44 3
1 05/03/87 44 3
1 06/03/87 44 3
1 07/03/87 44 3
1 08/03/87 44 3
1 09/03/87 44 3
1 10/03/87 44 3
1 11/03/87 44 3
1 12/03/87 44 3
1 13/03/87 44 3
1 14/03/87 44 3
1 15/03/87 44 3
1 16/03/87 44 3
1 17/03/87 44 3
1 18/03/87 44 3
1 19/03/87 44 3
1 20/03/87 44 3
1 21/03/87 NA NA
1 22/03/87 NA NA
1 23/03/87 NA NA
1 24/03/87 NA NA



[R] adding copies of rows to a data frame based upon start and end dates

2010-10-28 Thread Grant Gillis
Hello All and thanks in advance for any advice.

I have a data frame with rows corresponding to radio-collared animals (see
sample data below).  There is a start date (DATESTART), an end date
(DATEEND), and the number of days on air (DAYSONAIR).  What I would like to
do is add a column called DATE so that each ID has a row for every
day that the radio collar was on the air, with all other data copied.  For
example, ID 1001 would expand into 48 rows beginning with 4/17/91 and ending
with 6/4/91; all other values would remain the same for each new
row.  Unfortunately, I have not gotten anywhere with my attempts.


Thank you!!




IDGRIDFOODWB1WB2SADRUGFREQDATESTART
DATECOLLARDATEENDDAYSONAIR
100110319999FAI14824/17/91
4/17/916/4/9148.00
100210659671MAC14084/17/91
4/17/916/25/9169.00
100310325662FAI07694/17/91
4/17/916/4/93779.00
100410322655FAC15614/18/91
4/18/915/27/9139.00
93100510654899MAI12884/18/91
4/18/915/27/9139.00
94100610301651MAC15934/18/91
4/18/917/11/9184.00
95100710349669FAI15214/18/91
4/18/9111/2/91198.00
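
A minimal sketch of the expansion follows, using a hypothetical two-row table that keeps only ID, the three dates and DAYSONAIR for the first two animals. The flattened sample above does not split cleanly into the other columns. Every row is repeated once per day, which copies all other columns unchanged, and DATE is then filled with a running date. Counting both the start and the end day gives 49 rows for ID 1001, one more than its DAYSONAIR of 48; drop the "+ 1" if the end date should not get its own row.

```r
# Hypothetical two-row version of the collar table
collars <- data.frame(ID = c(1001, 1002),
                      DATESTART = c("4/17/91", "4/17/91"),
                      DATEEND = c("6/4/91", "6/25/91"),
                      DAYSONAIR = c(48, 69),
                      stringsAsFactors = FALSE)
collars$DATESTART <- as.Date(collars$DATESTART, format = "%m/%d/%y")
collars$DATEEND   <- as.Date(collars$DATEEND,   format = "%m/%d/%y")

# Days on air, counting both the start and the end day
n <- as.integer(collars$DATEEND - collars$DATESTART) + 1

# Repeat each row n times (copying every column), then add the running date
expanded <- collars[rep(seq_len(nrow(collars)), n), ]
expanded$DATE <- expanded$DATESTART + sequence(n) - 1
rownames(expanded) <- NULL
head(expanded)
```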



[R] Variance inflation factor

2010-08-10 Thread Grant Gillis
Hello all and thanks in advance for any advice.

I would like to calculate the variance inflation factor for a linear model
(lm) with 4 explanatory variables.  I would then like to use this to
calculate QAIC.  I have used the function vif() in the car package and I get
values for each variable; however, the equation for QAIC seems to need a
single variance inflation factor for the global model.  Can I calculate this
from the output of this function, and if so, can someone help me
understand how?

Cheers,
G.
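
For orientation only: the variance inflation factor used in QAIC, usually written ĉ, is an overdispersion estimate for the global model. It is a different quantity from the per-predictor collinearity values that car::vif() reports, so combining the vif() output would not give it. In the usual formulation, L̂ is the maximized likelihood and K is the number of estimated parameters, often counting ĉ as one. ĉ is estimated from the global model's Pearson chi-square statistic (the sum of squared Pearson residuals) and its residual degrees of freedom, n - p:

```latex
\mathrm{QAIC} = -\frac{2\,\log \hat{L}}{\hat{c}} + 2K,
\qquad
\hat{c} = \frac{\chi^2_{\mathrm{Pearson}}}{n - p}
```

This is normally applied to count or binomial GLMs. For a Gaussian lm() the residual variance is already estimated, so it is worth checking whether QAIC is needed at all.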



[R] Help with adding points to allEffects plot

2010-03-09 Thread Grant Gillis
Thanks in advance for any help.

I am attempting to add points to a plot made with allEffects() from the
effects package.  When I try to add the points I get the following error
message:

Error in plot.xy(xy.coords(x, y), type = type, ...) :
  plot.new has not been called yet

Strangely, the code I've pasted below has worked for me in the
past; however, figuring out what has changed has proved to be beyond me.

Cheers,

Grant

 library(effects)

 y <- c(1, 3, 2, 4, 5)
 x <- c(1, 2, 3, 4, 5)

 GSMOD <- lm(y ~ x)

 plot(allEffects(GSMOD), ask = FALSE)
 points(y, x)
Error in plot.xy(xy.coords(x, y), type = type, ...) :
  plot.new has not been called yet
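
The effects package draws its plots with lattice, which is built on grid rather than base graphics. That would be consistent with points(), a base-graphics function, reporting that plot.new has not been called. Below is a minimal sketch of adding points to a lattice panel after it has been drawn. A plain xyplot() stands in for the effects plot, which is an assumption.

```r
library(lattice)

y <- c(1, 3, 2, 4, 5)
x <- c(1, 2, 3, 4, 5)

# A grid-based (lattice) plot standing in for plot(allEffects(GSMOD));
# base-graphics points() cannot draw on it
p <- xyplot(y ~ x, type = "r")
print(p)

# Re-enter the first panel in lattice's own coordinate system and draw there
trellis.focus("panel", column = 1, row = 1, highlight = FALSE)
panel.points(x, y, pch = 19)
trellis.unfocus()
```

For a real allEffects() plot, check rather than assume that the panel layout and axis scale (for example, with a transformed response) match the raw data.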



[R] multinom() and multinomial() interpretation

2009-02-23 Thread Grant Gillis
Hello and thanks in advance for any advice.

I am not clear how, in practice, the multinom() function in nnet and the
multinomial() function in VGAM differ in terms of interpretation.  I
understand that they are fit differently.  Are there certain scenarios where
one is more appropriate than the other?  In my case I have a dependent
variable with 4 categories and 1 binary and 4 continuous independent
variables.  I am fitting 3 models.

Cheers
Grant



[R] merging files with different structures

2009-02-17 Thread Grant Gillis
Hello list,

Thanks in advance for any help.

I have many (approx 20) files that I have merged.  For example

d1 <- read.csv("AlleleReport.csv")
d2 <- read.csv("AlleleReport.csv")

m1 <- merge(d1, d2, by = c("IND", intersect(colnames(d1), colnames(d2))),
            all = TRUE)
m2 <- merge(m1, d3, by = c("IND", intersect(colnames(m1), colnames(d3))),
            all = TRUE)



Re: [R] merging files with different structures

2009-02-17 Thread Grant Gillis
 Hello list,

 I am sorry for the previous half post.  I accidentally hit send.  Thanks
 again in advance for any help.

 I have many (approx 20) files that I have merged.  Each data set contains
 rows for individuals and data in 2 - 5 columns (depending upon the data
 set).  The individuals in each data set are not necessarily the same, and
 each individual appears in several of the sheets I am trying to merge (with
 different data in the columns).  I have used the merge function, for example:

 d1 <- read.csv("AlleleReport.csv")
 d2 <- read.csv("AlleleReport.csv")

 m1 <- merge(d1, d2, by = c("IND", intersect(colnames(d1), colnames(d2))),
             all = TRUE)
 m2 <- merge(m1, d3, by = c("IND", intersect(colnames(m1), colnames(d3))),
             all = TRUE)



My problem is that when the data is merged it looks something like


Ind  L1  L1.1  L2  L2.1  L3  L3.1
a    12  13    NA  NA    NA  NA
a    NA  NA    22  43    34  45
b    14  15    45  64    NA  NA
b    NA  NA    NA  NA    99  84

Is there a way that I can merge the rows for each individual?

Cheers


Grant
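
A minimal sketch of collapsing the merged result to one row per individual follows. It assumes each individual has at most one non-missing value per column across its rows; if two rows disagree, this keeps the first silently. The data-frame method of aggregate() is used because it passes NAs through to the summary function. The formula method drops incomplete rows by default.

```r
# The merged example from the post, rebuilt by hand
m <- data.frame(Ind  = c("a", "a", "b", "b"),
                L1   = c(12, NA, 14, NA), L1.1 = c(13, NA, 15, NA),
                L2   = c(NA, 22, 45, NA), L2.1 = c(NA, 43, 64, NA),
                L3   = c(NA, 34, NA, 99), L3.1 = c(NA, 45, NA, 84),
                stringsAsFactors = FALSE)

# First non-missing value in a column (NA if there is none)
first_non_na <- function(v) v[!is.na(v)][1]

# One row per individual, every other column collapsed with first_non_na()
collapsed <- aggregate(m[-1], by = list(Ind = m$Ind), FUN = first_non_na)
collapsed
```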



[R] Help with a permutation test

2008-12-12 Thread Grant Gillis
Hello List and thanks in advance for all of your help,


I am trying to implement a permutation test of a multinomial logistic
regression ('multinom' within the nnet package).  In the end I want to
compare the parameter estimates from my data to the distribution of
randomized parameter estimates.

I have figured out how to permute my dependent variable (MNNUM) x number of
times, apply a multinomial logistic regression to each permutation, and save
the results in a list.  Where I am stuck is figuring out how to take the
mean and SD of the coefficients from my list of regressions.  I know that
the coefficients are stored in the $wts slot of the model.

Below is what I have so far.  I am sure there are nicer ways to do this and
if you feel so inclined please suggest them.



library(nnet)

# this is a function to permute the MNNUM column once
rand <- function(DF) {
  new.DF <- DF
  new.DF$MNNUM <- sample(new.DF$MNNUM)
  new.DF
}

# this function fits the one model I am interested in

modeltree <- function(DF) {
  MLM.plot <- multinom(MN_fact ~ Canpy + mean_dbh + num_beechoak +
                         num_class5 + prop_hard, data = hfdata, trace = FALSE)
  MLM.plot
}

# this replicates the 'rand' function and applies a model to each copy

resamp.funct <- function(DF, funct, n) {
  list <- replicate(n, rand(DF), simplify = FALSE)
  sapply(list, funct, simplify = FALSE)
}


# So if I paste below:

l <- resamp.funct(hfdata, modeltree, 3)

# I get


 l <- resamp.funct(hfdata, modeltree, 3)

 l
[[1]]
Call:
multinom(formula = MN_fact ~ Canpy + mean_dbh + num_beechoak +
num_class5 + prop_hard, data = hfdata, trace = FALSE)

Coefficients:
         (Intercept)       Canpy    mean_dbh num_beechoak num_class5 prop_hard
none     -11.1845028 0.063880939  0.08440340   -0.7050239 -0.0998379  6.894522
sabrinus -10.6848488 0.055157318  0.19276777   -0.6441996  0.1219245  3.325704
volans    -0.2481854 0.004410597 -0.02710102   -0.1061700 -0.1858376  2.495856

Residual Deviance: 163.7211
AIC: 199.7211

[[2]]
Call:
multinom(formula = MN_fact ~ Canpy + mean_dbh + num_beechoak +
num_class5 + prop_hard, data = hfdata, trace = FALSE)

Coefficients:
         (Intercept)       Canpy    mean_dbh num_beechoak num_class5 prop_hard
none     -11.1845028 0.063880939  0.08440340   -0.7050239 -0.0998379  6.894522
sabrinus -10.6848488 0.055157318  0.19276777   -0.6441996  0.1219245  3.325704
volans    -0.2481854 0.004410597 -0.02710102   -0.1061700 -0.1858376  2.495856

Residual Deviance: 163.7211
AIC: 199.7211

[[3]]
Call:
multinom(formula = MN_fact ~ Canpy + mean_dbh + num_beechoak +
num_class5 + prop_hard, data = hfdata, trace = FALSE)

Coefficients:
         (Intercept)       Canpy    mean_dbh num_beechoak num_class5 prop_hard
none     -11.1845028 0.063880939  0.08440340   -0.7050239 -0.0998379  6.894522
sabrinus -10.6848488 0.055157318  0.19276777   -0.6441996  0.1219245  3.325704
volans    -0.2481854 0.004410597 -0.02710102   -0.1061700 -0.1858376  2.495856

Residual Deviance: 163.7211
AIC: 199.7211
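
Two details in the code as posted would make every replicate identical, as in the three printouts above. First, modeltree() fits data = hfdata rather than the permuted DF. Second, rand() shuffles MNNUM while the formula uses MN_fact, so unless one column is rebuilt from the other, the modelled response is never permuted. Below is a minimal sketch that fixes both and uses coef(), which returns the coefficient matrix directly, instead of unpacking $wts. It stacks the permuted coefficients into an array and takes means, SDs and a two-sided permutation p-value. The stand-in data (toy, resp, x1, x2) are hypothetical.

```r
library(nnet)

# Hypothetical stand-in for hfdata: a 4-level response and two predictors
set.seed(1)
toy <- data.frame(resp = factor(sample(rep(c("none", "sabrinus", "volans", "other"), 15))),
                  x1 = rnorm(60), x2 = rnorm(60))

# Fit to whatever data frame is passed in (data = DF, not a fixed object)
fit_once  <- function(DF) multinom(resp ~ x1 + x2, data = DF, trace = FALSE)
# Permute the response column that the formula actually uses
perm_once <- function(DF) { DF$resp <- sample(DF$resp); DF }

fits <- replicate(20, fit_once(perm_once(toy)), simplify = FALSE)

# Stack coef() matrices into an array: outcome x term x replicate
coef_array <- simplify2array(lapply(fits, coef))
coef_mean  <- apply(coef_array, c(1, 2), mean)
coef_sd    <- apply(coef_array, c(1, 2), sd)

# Share of permuted |coefficients| at least as large as the observed ones
obs    <- coef(fit_once(toy))
p_perm <- apply(sweep(abs(coef_array), c(1, 2), abs(obs), ">="), c(1, 2), mean)
round(p_perm, 2)
```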



[R] Estimates of coefficient variances and covariances from a multinomial logistic regression?

2008-11-26 Thread Grant Gillis
Hello and thanks in advance for any help,

I am using the 'multinom' function from the nnet package to fit a
multinomial logistic regression.  I would like to get a matrix of the
estimated coefficient variances and covariances.  Am I missing some
easy way to extract these?

Grant
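
The nnet package supplies a vcov() method for multinom fits, which returns the full variance-covariance matrix of the coefficients. Its diagonal gives the squared standard errors. Below is a minimal sketch on made-up data (toy, resp, x1 and x2 are hypothetical). Hess = TRUE stores the Hessian in the fit.

```r
library(nnet)

# Hypothetical data: a 3-level response and two predictors
set.seed(2)
toy <- data.frame(resp = factor(sample(rep(c("A", "B", "C"), length.out = 80))),
                  x1 = rnorm(80), x2 = rnorm(80))

# Hess = TRUE keeps the Hessian with the fitted model
fit <- multinom(resp ~ x1 + x2, data = toy, Hess = TRUE, trace = FALSE)

V  <- vcov(fit)          # variances and covariances of all coefficients
se <- sqrt(diag(V))      # standard errors
V
```

With 3 outcome levels and 3 terms per equation, V is 6 x 6. Its rows and columns are labelled outcome:term.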



Re: [R] restricted bootstrap

2008-09-04 Thread Grant Gillis
Hello Professor Ripley,

Sorry for not being clear.  I posted after a long day of struggling.  Also,
my toy distance matrix should have been symmetric.

Simply put, I have spatially autocorrelated data collected from many points.
I would like to do a linear regression on these data.  To deal with the
autocorrelation I want to resample a subset of my data with replacement, but I
need to restrict subsets such that no two locations where data were collected
are closer than X m apart (i.e. further apart than the range of autocorrelation
in the data).

Thanks for having a look at this for me.  I will look up the hard-core
spatial point process.

Grant

2008/9/4 Prof Brian Ripley [EMAIL PROTECTED]

 I see nothing here to do with the 'bootstrap', which is sampling with
 replacement.

 Do you know what you mean exactly by 'randomly sample'?  In general the way
 to do this is to sample randomly (uniformly, whatever) and reject samples
 that do not meet your restriction.   For some restrictions there are more
 efficient algorithms, but I don't understand yours.  (What are the 'rows'?
  Do you want to sample rows in space or xy locations?  How come 'dist' is
 not symmetric?)  For some restrictions, an MCMC sampling scheme is needed,
 the hard-core spatial point process being a related example.




 --
 Brian D. Ripley,  [EMAIL PROTECTED]
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford,             Tel:  +44 1865 272861 (self)
 1 South Parks Road,                     +44 1865 272866 (PA)
 Oxford OX1 3TG, UK                Fax:  +44 1865 272595




[R] restricted bootstrap

2008-09-03 Thread Grant Gillis
Hello List,

I am not sure that I have the correct terminology here (restricted
bootstrap) which may be hampering my archive searches.  I have quite a large
spatially autocorrelated data set.  I have xy coordinates and the
corresponding pairwise distance matrix (metres) for each row.  I would like
to randomly sample some number of rows but restricting samples such that the
distance between them is larger than the threshold of autocorrelation.  I
have been unsuccessfully trying to link the 'sample' function to values
in the distance matrix.

My end goal is to randomly sample M thousand rows of data N thousand times
calculating linear regression coefficients for each sample but am stuck on
taking the initial sample. I believe I can figure out the rest.


Example Question

I would like to randomly sample 3 rows, with the restriction that
they are more than 100 m apart.

example data:
main data:

y <- c(1, 2, 9, 5, 6)
x <- c(1, 3, 5, 7, 9)
z <- c(2, 4, 6, 8, 10)
a <- c(3, 9, 6, 4, 4)

maindata <- cbind(y, x, z, a)

     y x  z a
[1,] 1 1  2 3
[2,] 2 3  4 9
[3,] 9 5  6 6
[4,] 5 7  8 4
[5,] 6 9 10 4

distance matrix:
row1 <- c(0, 123, 567, 89)
row2 <- c(98, 0, 345, 543)
row3 <- c(765, 90, 0, 987)
row4 <- c(654, 8, 99, 0)

dist <- rbind(row1, row2, row3, row4)

     [,1] [,2] [,3] [,4]
row1    0  123  567   89
row2   98    0  345  543
row3  765   90    0  987
row4  654    8   99    0

Thanks for all of the help in the past and now

Cheers
Grant
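
Below is a minimal sketch of the simplest scheme Prof. Ripley describes in his reply: draw rows at random and reject any draw that breaks the distance restriction. The 4 x 4 toy matrix above is not symmetric and has fewer rows than maindata, so the sketch builds a symmetric matrix from made-up coordinates (hypothetical values). Points 1 and 2 are deliberately less than 100 m apart. With thousands of rows and a tight spacing rule, rejection can become very slow; his reply mentions MCMC schemes such as the hard-core point process for those cases.

```r
set.seed(42)

# Made-up coordinates (metres); points 1 and 2 are closer than 100 m
xy <- cbind(east  = c(0, 50, 180, 320, 400),
            north = c(0, 60, 40, 160, 10))
D <- as.matrix(dist(xy))   # symmetric pairwise distance matrix

# Rejection sampler: draw k distinct rows uniformly at random and accept the
# draw only if every pairwise distance exceeds min_dist
restricted_sample <- function(D, k, min_dist, max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    idx <- sample.int(nrow(D), k)
    d <- D[idx, idx]
    if (all(d[upper.tri(d)] > min_dist)) return(idx)
  }
  stop("no valid sample found; try a smaller k or min_dist")
}

rows <- restricted_sample(D, k = 3, min_dist = 100)
rows   # row indices to use, e.g. maindata[rows, ]
```

Repeating restricted_sample() N times and fitting lm() to each subset would give the distribution of regression coefficients described in the post.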



[R] problems formating scientific collaboration data

2008-08-27 Thread Grant Gillis
Hello all and thanks in advance for any help or direction.  I have
co-authorship data that looks like:


Paper  Author              Year
1      SmithKK JonesSD     2008
2      WallaceAR DarwinCA  1999
3      HawkingS            2003


I would like:
Paper  Author     Year
1      SmithKK    2008
1      JonesSD    2008
2      WallaceAR  1999
2      DarwinCA   1999
3      HawkingS   2003



Thanks for your patience with what is likely an easy question
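
A minimal sketch of the reshaping follows. It assumes the authors within one Author cell are separated by whitespace; change the split pattern if the real file uses another separator, such as ";". strsplit() splits each cell, lengths() counts the authors per paper, and rep() copies Paper and Year once per author.

```r
papers <- data.frame(Paper = 1:3,
                     Author = c("SmithKK JonesSD", "WallaceAR DarwinCA", "HawkingS"),
                     Year = c(2008, 1999, 2003),
                     stringsAsFactors = FALSE)

authors <- strsplit(papers$Author, "[[:space:]]+")   # one vector per paper
n <- lengths(authors)                                # authors per paper

long <- data.frame(Paper = rep(papers$Paper, n),
                   Author = unlist(authors),
                   Year = rep(papers$Year, n),
                   stringsAsFactors = FALSE)
long
```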


Re: [R] resampling from distributions

2008-04-19 Thread Grant Gillis
I am sorry for the incorrect subject.  My subject autofilled without my
noticing in time.  I suppose a better subject would be "Calculating
proportion of shared occurrences and randomizations".

Grant

2008/4/19 Grant Gillis [EMAIL PROTECTED]:

 Hello All,

 Once again thanks for all of the help to date.  I am climbing my R
 learning curve.  I've got a few more questions that I hope I can get some
 guidance on though.   I am not sure whether the etiquette is to break up
 multiple questions or not but I'll keep them together here for now as it may
 help put the questions in context despite the fact that the post may get a
 little long.


 Question 1:


 My first goal is to calculate the proportion of shared 1) behaviours and
 2) alleles between numerous individuals.  Pasted below (the 'propshared'
 function) is what I have now, and it works very well for calculating the
 proportion of shared behaviours where the data are formatted with each column
 as a behaviour and each row an individual.  Microsatellite genotypes are
 formatted differently; an example is below.  Each row is an individual and
 each column is one allele from a single locus.  In the values below, L1
 and L1.1 each give one copy of an allele for the same locus.  Occasionally
 values from different loci will be the same, although these are not actually
 the same allele.

 I would like the calculation of the proportion of shared alleles to be
 restricted to alleles shared within loci for all individuals (pairs of
 columns L1 and L1.1, L2 and L2.1, ...).  What I have now calculates the
 proportion of shared values for alleles across loci.  A specific example: I
 would like the value *2* for individual *w* at *L1* to be considered the same
 as the value *2* for individual *y* at *L1.1*, but not the same as the value
 *2* for any other individual within any other pair of columns.


 genos <- data.frame(
   L1 = c(2, NA, 1, 3),
   L1 = c(1, NA, 2, 3),
   L2 = c(5, 2, 5, 3),
   L2 = c(3, 4, 2, 4),
   L3 = c(4, 5, 7, 2),
   L3 = c(4, 6, 6, 6))

 rownames(genos) <- c("w", "x", "y", "z")

  genos
   L1 L1.1 L2 L2.1 L3 L3.1
 w  2    1  5    3  4    4
 x NA   NA  2    4  5    6
 y  1    2  5    2  7    6
 z  3    3  3    4  2    6



 propshared <- function(genos){
   x <- sapply(rownames(genos), function(ind1)
          sapply(rownames(genos), function(ind2)
            sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE))) /
        length(genos[1, ])
   is.na(diag(x)) <- TRUE
   x
 }

  propshared(genos)
       w     x     y     z
 w    NA 0.000 0.167 0.167
 x 0.000    NA 0.167 0.333
 y 0.167 0.167    NA 0.333
 z 0.167 0.333 0.333    NA


 The matrix I would like to have would look like this.
         w       x       y       z
 w      NA       0     0.3 0.16667
 x       0      NA 0.16667 0.16667
 y     0.3 0.16667      NA 0.16667
 z 0.16667 0.16667 0.16667      NA
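
Below is a minimal sketch of a within-locus comparison. It rests on one assumed definition: at each locus, count the alleles two individuals have in common, treating each genotype as a multiset (so 4/4 vs 4/6 shares one allele), and divide by the total number of allele columns, as propshared() does. Columns are grouped into loci by stripping the ".1" suffix that data.frame() adds. By this count w and y share 0.5 (both L1 alleles plus one L2 allele), so it may not be the rule behind every value in the desired matrix. Adjust shared_alleles() if a different rule is intended.

```r
genos <- data.frame(L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
                    L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
                    L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6))
rownames(genos) <- c("w", "x", "y", "z")

# Alleles shared by two genotypes at one locus (multiset overlap); NAs ignored
shared_alleles <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  sum(vapply(intersect(a, b),
             function(v) as.numeric(min(sum(a == v), sum(b == v))),
             numeric(1)))
}

# Proportion of shared alleles, comparing values only within each locus
propshared_loci <- function(genos) {
  locus <- sub("\\.[0-9]+$", "", names(genos))   # L1.1 -> L1, etc.
  ids <- rownames(genos)
  out <- matrix(NA_real_, length(ids), length(ids), dimnames = list(ids, ids))
  for (i in ids) for (j in ids) if (i != j) {
    s <- 0
    for (L in unique(locus)) {
      cols <- which(locus == L)
      s <- s + shared_alleles(unlist(genos[i, cols]), unlist(genos[j, cols]))
    }
    out[i, j] <- s / ncol(genos)
  }
  out
}

res <- propshared_loci(genos)
round(res, 3)
```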


 Question 2:  Thanks if you have made it this far.  Next I would
 like to calculate a randomized value of the mean proportion of shared
 alleles.  To do this I thought I would randomize the original data (genos
 above, say 1000 times), recalculate the proportion of shared alleles at each
 step, and then take the mean (my attempt below).  When I do this I get the
 same mean proportion of shared alleles (or behaviours) as the original for
 every randomization.  I assume that this is due to some property of
 permuting this type of data that I do not know.  Does anyone have a
 recommendation as to how I might get a value of the proportion of shared
 alleles if alleles were distributed (again within loci) at random?


 randomize <- function(genos){
   x <- apply(genos, 2, sample)
   rownames(x) <- rownames(genos)
   x
 }


 allele.permute <- function(genos, n){

   list <- replicate(n, randomize(genos), simplify = FALSE)
   sapply(list, propshared, simplify = FALSE)
 }


 I hope this is clear.  I appreciate all insights and input
 Thanks

 Grant
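
The mean never changes because of how propshared() counts matches. For any one column, the number of matching pairs depends only on how often each value occurs in that column. Shuffling within the column leaves those counts untouched, so the total, and hence the mean over all pairs, is identical for every permutation; individual pairs still change. Below is a minimal sketch that shows this and tries one alternative: pool both allele columns of a locus before shuffling, which changes the per-column counts and lets the mean vary. Whether that is the right null model is a biological assumption. randomize_loci() and mean_shared() are illustrative names.

```r
genos <- data.frame(L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
                    L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
                    L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6))
rownames(genos) <- c("w", "x", "y", "z")

# The post's propshared(), with the assignment arrows restored
propshared <- function(genos) {
  x <- sapply(rownames(genos), function(ind1)
         sapply(rownames(genos), function(ind2)
           sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE))) / length(genos[1, ])
  is.na(diag(x)) <- TRUE
  x
}

# The post's randomize(): shuffle each column separately
randomize <- function(genos) {
  x <- apply(genos, 2, sample)
  rownames(x) <- rownames(genos)
  x
}

# Alternative: pool both allele columns of a locus, then shuffle them together
randomize_loci <- function(genos) {
  locus <- sub("\\.[0-9]+$", "", colnames(genos))
  out <- as.matrix(genos)
  for (L in unique(locus)) {
    cols <- which(locus == L)
    out[, cols] <- sample(out[, cols])
  }
  out
}

mean_shared <- function(g) mean(propshared(g), na.rm = TRUE)

set.seed(3)
observed  <- mean_shared(genos)
by_column <- replicate(200, mean_shared(randomize(genos)))       # always equals observed
by_locus  <- replicate(200, mean_shared(randomize_loci(genos)))  # can vary
summary(by_locus)
```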



[R] permutation/randomization

2008-04-09 Thread Grant Gillis
Hello,

I have what I suspect might be an easy problem but I am new to R and
stumped.  I have a data set that looks something like this

b <- c(2,3,4,5,6,7,8,9)
x <- c(2,3,4,5,6,7,8,9)
y <- c(9,8,7,6,5,4,3,2)
z <- c(9,8,7,6,1,2,3,4)
data <- cbind(x,y,z)
row.names(data) <- c('a','b','c','d','e','f','g','h')

which gives:

 x y z
a 2 9 9
b 3 8 8
c 4 7 7
d 5 6 6
e 6 5 1
f 7 4 2
g 8 3 3
h 9 2 4

I would like to randomize data within columns.  The closest I have been able
to come permutes the rows, but keeps each row's values together along with
its row name (example below).  For some context, eventually I would like to use
this to generate many data sets and perform calculations on these random
data sets (I think I know how to do this but we'll see).  So ideally I would
like the row names to remain the same (in order a through h) and the data in
each column to be randomized independently of the other columns.  Just
shuffle the data deck, I guess.


 data[permute(1:length(data[,1])),]
  x y z
b 3 8 8
c 4 7 7
h 9 2 4
e 6 5 1
f 7 4 2
a 2 9 9
g 8 3 3
d 5 6 6


Thanks in advance for the help and also for the good advice earlier this
week.

Cheers
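
Below is a minimal sketch of shuffling each column independently while keeping the row names a through h in order. apply() with sample() permutes every column separately. apply() does not reliably keep the original row names on its result, so they are set back explicitly. The replicate() call at the end shows one way to build many randomized data sets; 100 is an arbitrary choice.

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9)
y <- c(9, 8, 7, 6, 5, 4, 3, 2)
z <- c(9, 8, 7, 6, 1, 2, 3, 4)
data <- cbind(x, y, z)
rownames(data) <- c('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h')

set.seed(1)
# Permute each column on its own, then restore the original row names
shuffled <- apply(data, 2, sample)
rownames(shuffled) <- rownames(data)
shuffled

# Many independent shuffles, e.g. for a randomization test
many <- replicate(100, {
  s <- apply(data, 2, sample)
  rownames(s) <- rownames(data)
  s
}, simplify = FALSE)
```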



[R] row by row similarity

2008-04-06 Thread Grant Gillis
Hello all and thanks in advance for any advice.
I am very new to R and have searched for my question but have not come up with
anything quite like what I would like to do.

My problem is:

I have a data set for individuals (rows) and values for behaviours
(columns).  I would like to know the proportion of shared behaviours for all
possible pairs of individuals, i.e. the number of shared behaviours divided by
the total number of behaviours.  There are zeros in the data that I would like
treated as meaning the behaviour does not exist.


example data format:

ind  B1  B2  B3  B4  B5  B6
w    2   1   5   3   4   4
x    1   2   3   4   5   6
y    1   3   5   2   7   6
z    2   3   2   4   2   6


Desired output:

w  x   0
w  y   0.17
w  z   0
x   y   0.3
x   z   0.3
etc.


Thanks

Grant
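
Below is a minimal sketch of the pairwise proportion. It assumes "shared" means both individuals have the same non-zero value for a behaviour, and it divides by the total number of behaviours. Zeros are turned into NA, so an absent behaviour is never counted as shared. The example data contain no zeros, so that step changes nothing here. combn() lists every unordered pair once. By this rule w and z share B1 (both score 2), so that pair comes out as 0.17 rather than the 0 listed above.

```r
beh <- rbind(w = c(2, 1, 5, 3, 4, 4),
             x = c(1, 2, 3, 4, 5, 6),
             y = c(1, 3, 5, 2, 7, 6),
             z = c(2, 3, 2, 4, 2, 6))
colnames(beh) <- paste0("B", 1:6)

# Zeros mean "behaviour absent": make them NA so they never count as shared
beh[beh == 0] <- NA

# Every unordered pair of individuals, one column per pair
pairs <- combn(rownames(beh), 2)
prop <- apply(pairs, 2, function(p)
  sum(beh[p[1], ] == beh[p[2], ], na.rm = TRUE) / ncol(beh))

result <- data.frame(ind1 = pairs[1, ], ind2 = pairs[2, ], prop = prop,
                     stringsAsFactors = FALSE)
result
```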
