[R] Remove individual rows from a matrix based upon a list
Dear All,

Thanks in advance for any help. I have a square matrix of measures of interactions among individuals and would like to calculate values from a function (colSums, for example) with a single individual (row) excluded in each instance. That individual would be returned to the matrix before the next is removed and the function recalculated. I can do this by hand, removing rows based upon ids; however, I would like to specify the individuals to be removed from a list (lots of data).

An example matrix:

MyMatrix
        E985047 E985071 E985088 F952477 F952478 J644805 J644807 J644813
E985047    1.00    0.09    0.00    0.00    0.00    0.00    0.00    0.40
E985071    0.09    1.00    0.00    0.00    0.00    0.00    0.00    0.07
E985088    0.00    0.00    1.00    0.00    0.00    0.00    0.14    0.00
F952477    0.00    0.00    0.00    1.00    0.38    0.00    0.00    0.00
F952478    0.00    0.00    0.00    0.38    1.00    0.00    0.00    0.00
J644805    0.00    0.00    0.00    0.00    0.00    1.00    0.07    0.00
J644807    0.00    0.00    0.14    0.00    0.00    0.07    1.00    0.00
J644813    0.40    0.07    0.00    0.00    0.00    0.00    0.00    1.00

Example list of individuals to be removed:

MyList
E985088
F952477
F952478

If I were to do this by hand it would look like:

MyMat1 <- MyMatrix[!rownames(MyMatrix) %in% "E985088", ]
colSums(MyMat1)
MyMat2 <- MyMatrix[!rownames(MyMatrix) %in% "F952477", ]
colSums(MyMat2)
MyMat3 <- MyMatrix[!rownames(MyMatrix) %in% "F952478", ]
colSums(MyMat3)

How might I replace the individual ids (in quotes) with a list, remove the rows corresponding to that list from the matrix one at a time for the calculation, and return each row after its calculation before the next? I hope I've been clear!

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
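A sketch of one approach (assuming MyList is a character vector of row names): loop over the ids with sapply, dropping one row at a time. Because the subsetting never modifies MyMatrix itself, each row is automatically "returned" before the next iteration.

```r
# Small stand-in for the matrix above (assumption: same row/column naming scheme)
MyMatrix <- diag(3)
rownames(MyMatrix) <- colnames(MyMatrix) <- c("E985088", "F952477", "F952478")
MyList <- c("E985088", "F952477", "F952478")

# For each id, drop that single row and compute colSums on what remains
res <- sapply(MyList, function(id)
  colSums(MyMatrix[rownames(MyMatrix) != id, , drop = FALSE]))
res  # one column of column sums per excluded individual
```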
[R] calculating time interval distributions
Dear List,

I have data on approximately 100 individuals visiting a central logging station over 1000 times. I would like to calculate the distribution of inter-visit time intervals for all possible pairs, and am stuck on how to code this. Single pairs are not a problem, but extending it has been difficult for me. So for the toy data below I'd like to calculate, for each 'a', how long until I see 'b', as well as, for each 'b', how long until I see 'a', and so on for all possible ordered pairs (both triangles of a matrix?). Thanks for any help, hints, and suggestions.

Grant

toy data:

ind <- c('a', 'b', 'a', 'c', 'b', 'b', 'c', 'a', 'c')
sec <- c(1, 3, 5, 6, 12, 22, 66, 85, 99)

What I am looking for:

ab 2
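One sketch along these lines: for each visit by one individual, scan forward to the next visit by the other. The helper name `intervals` is my own, not from the original post.

```r
ind <- c('a', 'b', 'a', 'c', 'b', 'b', 'c', 'a', 'c')
sec <- c(1, 3, 5, 6, 12, 22, 66, 85, 99)

# Time from each visit by `from` until the next visit by `to` (NA if none follows)
intervals <- function(from, to, ind, sec) {
  t.from <- sec[ind == from]
  t.to   <- sec[ind == to]
  sapply(t.from, function(t) {
    later <- t.to[t.to > t]
    if (length(later)) min(later) - t else NA
  })
}

intervals("a", "b", ind, sec)  # first element is 2: 'a' at sec 1, next 'b' at sec 3

# All ordered pairs (both triangles of the pair matrix)
pairs <- expand.grid(from = unique(ind), to = unique(ind), stringsAsFactors = FALSE)
pairs <- pairs[pairs$from != pairs$to, ]
result <- lapply(seq_len(nrow(pairs)),
                 function(i) intervals(pairs$from[i], pairs$to[i], ind, sec))
names(result) <- paste0(pairs$from, pairs$to)
```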
Re: [R] tricky (for me) merging of data...more clarity
For our animals we are comfortable with saying that body condition represents a roughly 30-day period, plus and minus 15 days from measurement. However, we have monitored individuals for longer periods, and during those periods we do not wish there to be any values for body condition. There are other data associated with those days. For some analyses we will use data from the periods with body condition data, and for others not. Thanks very much for looking into this! An R solution would save tons of time and potential mistakes.

On 2 March 2011 09:40, Tal Galili tal.gal...@gmail.com wrote:

Question: how do you know that the following two rows should have NAs?

1 16/02/87 NA NA
1 17/02/87 NA NA

Contact Details:
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)

On Tue, Mar 1, 2011 at 11:06 AM, Grant Gillis grant.j.gil...@gmail.com wrote:

1 16/02/87 NA NA
1 17/02/87 NA NA
Re: [R] tricky (for me) merging of data...more clarity
Hi Again,

Thanks very much for your response. It seems my example got rearranged (transposed?) after I posted it; hopefully this example will be clearer. I have one file (e.g. sheet 1) that has a column for individuals (ind) and a column for the date (date). I would like to merge this with another file (e.g. sheet 2) that has both the 'ind' and 'date' columns as well as associated body condition measurements (BC1 and BC2).

My problem: the body condition values were measured intermittently throughout an individual's history, but for our purposes we would like to treat them as representative of 15 days before and after measurement, and days outside of this window should have NAs (there are other data associated with those days). When I merge these two files, is there a way to write these body condition values forward and back 15 days? This would give me something that looks like sheet 3. Thank you!

Sheet 1
ind date
1 01/02/87
1 02/02/87
1 03/02/87
1 04/02/87
1 05/02/87
1 06/02/87
1 07/02/87
1 08/02/87
1 09/02/87
1 10/02/87
1 11/02/87
1 12/02/87
1 13/02/87
1 14/02/87
1 15/02/87
1 16/02/87
1 17/02/87
1 18/02/87
1 19/02/87
1 20/02/87
1 21/02/87
1 22/02/87
1 23/02/87
1 24/02/87
1 25/02/87
1 26/02/87
1 27/02/87
1 28/02/87
1 01/03/87
1 02/03/87
1 03/03/87
1 04/03/87
1 05/03/87
1 06/03/87
1 07/03/87
1 08/03/87
1 09/03/87
1 10/03/87
1 11/03/87
1 12/03/87
1 13/03/87
1 14/03/87
1 15/03/87
1 16/03/87
1 17/03/87
1 18/03/87
1 19/03/87
1 20/03/87
1 21/03/87
1 22/03/87
1 23/03/87
1 24/03/87

Sheet 2
ind date BC1 BC2
1 01/02/87 33 3
1 03/03/87 44 3

Sheet 3
ind date BC1 BC2
1 01/02/87 33 3
1 02/02/87 33 3
1 03/02/87 33 3
1 04/02/87 33 3
1 05/02/87 33 3
1 06/02/87 33 3
1 07/02/87 33 3
1 08/02/87 33 3
1 09/02/87 33 3
1 10/02/87 33 3
1 11/02/87 33 3
1 12/02/87 33 3
1 13/02/87 33 3
1 14/02/87 33 3
1 15/02/87 33 3
1 16/02/87 NA NA
1 17/02/87 NA NA
1 18/02/87 44 3
1 19/02/87 44 3
1 20/02/87 44 3
1 21/02/87 44 3
1 22/02/87 44 3
1 23/02/87 44 3
1 24/02/87 44 3
1 25/02/87 44 3
1 26/02/87 44 3
1 27/02/87 44 3
1 28/02/87 44 3
1 01/03/87 44 3
1 02/03/87 44 3
1 03/03/87 44 3
1 04/03/87 44 3
1 05/03/87 44 3
1 06/03/87 44 3
1 07/03/87 44 3
1 08/03/87 44 3
1 09/03/87 44 3
1 10/03/87 44 3
1 11/03/87 44 3
1 12/03/87 44 3
1 13/03/87 44 3
1 14/03/87 44 3
1 15/03/87 44 3
1 16/03/87 44 3
1 17/03/87 44 3
1 18/03/87 44 3
1 19/03/87 44 3
1 20/03/87 44 3
1 21/03/87 NA NA
1 22/03/87 NA NA
1 23/03/87 NA NA
1 24/03/87 NA NA

On 27 February 2011 20:49, Tal Galili tal.gal...@gmail.com wrote:

Hi Grant,
I don't have a solution, but just to be clearer on your situation: one row from sheet 2 looks like this?

BC1 BC2
1 01/02/87 33 3
1 03/03/87 44 3

Are you using only the first 6 columns for the data to be replicated, and using the other columns as some sort of indicator of when a sequence ends? If so, I would suggest asking the group how you might be able to turn sheet 2 so that it will have as many rows as you need (which you will then merge with sheet 1). And for clarity's sake, consider using ?dput for your objects. Looking at data the way you pasted them (also, without column names) is not very easy for the reader (which might reduce your chances of getting help).

Cheers,
Tal

On Sun, Feb 27, 2011 at 6:41 PM, Grant Gillis grant.j.gil...@gmail.com wrote:

BC1 BC2
1 01/02/87 33 3
1 03/03/87 44 3
[R] tricky (for me) merging of data
Dear List,

I am having trouble with a tricky merging task. I have one data sheet with the dates (continuous) on which radio-collared individuals were monitored via telemetry (sheet 1). I have a different sheet containing data from instances where individuals were recaptured and associated body condition data were recorded (sheet 2). I would like to merge the two sheets by individual and date (I can do this with the merge function), but I would also like to copy the body condition data ahead and behind 15 days (or less, if the individual was monitored fewer than 15 days in either direction) from the date it was recorded (this is where I'm stuck). Thank you very much!

Grant

For example, sheet 1 would be merged with sheet 2 to give sheet 3:

Sheet 1
ind date
1 01/02/87
1 02/02/87
1 03/02/87
1 04/02/87
1 05/02/87
1 06/02/87
1 07/02/87
1 08/02/87
1 09/02/87
1 10/02/87
1 11/02/87
1 12/02/87
1 13/02/87
1 14/02/87
1 15/02/87
1 16/02/87
1 17/02/87
1 18/02/87
1 19/02/87
1 20/02/87
1 21/02/87
1 22/02/87
1 23/02/87
1 24/02/87
1 25/02/87
1 26/02/87
1 27/02/87
1 28/02/87
1 01/03/87
1 02/03/87
1 03/03/87
1 04/03/87
1 05/03/87
1 06/03/87
1 07/03/87
1 08/03/87
1 09/03/87
1 10/03/87
1 11/03/87
1 12/03/87
1 13/03/87
1 14/03/87
1 15/03/87
1 16/03/87
1 17/03/87
1 18/03/87
1 19/03/87
1 20/03/87
1 21/03/87
1 22/03/87
1 23/03/87
1 24/03/87

Sheet 2
ind date BC1 BC2
1 01/02/87 33 3
1 03/03/87 44 3

Sheet 3
ind date BC1 BC2
1 01/02/87 33 3
1 02/02/87 33 3
1 03/02/87 33 3
1 04/02/87 33 3
1 05/02/87 33 3
1 06/02/87 33 3
1 07/02/87 33 3
1 08/02/87 33 3
1 09/02/87 33 3
1 10/02/87 33 3
1 11/02/87 33 3
1 12/02/87 33 3
1 13/02/87 33 3
1 14/02/87 33 3
1 15/02/87 33 3
1 16/02/87 NA NA
1 17/02/87 NA NA
1 18/02/87 44 3
1 19/02/87 44 3
1 20/02/87 44 3
1 21/02/87 44 3
1 22/02/87 44 3
1 23/02/87 44 3
1 24/02/87 44 3
1 25/02/87 44 3
1 26/02/87 44 3
1 27/02/87 44 3
1 28/02/87 44 3
1 01/03/87 44 3
1 02/03/87 44 3
1 03/03/87 44 3
1 04/03/87 44 3
1 05/03/87 44 3
1 06/03/87 44 3
1 07/03/87 44 3
1 08/03/87 44 3
1 09/03/87 44 3
1 10/03/87 44 3
1 11/03/87 44 3
1 12/03/87 44 3
1 13/03/87 44 3
1 14/03/87 44 3
1 15/03/87 44 3
1 16/03/87 44 3
1 17/03/87 44 3
1 18/03/87 44 3
1 19/03/87 44 3
1 20/03/87 44 3
1 21/03/87 NA NA
1 22/03/87 NA NA
1 23/03/87 NA NA
1 24/03/87 NA NA
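One way to sketch this (hypothetical toy frames standing in for the sheets, with dates converted to Date class): merge every monitored day with the nearest measurement for that individual, keeping BC values only when that measurement is within 15 days. Note that, applied literally, a ±15-day rule covers 16/02/87 from the 01/02/87 measurement, so the exact window may need adjusting to reproduce the example sheet 3 above.

```r
sheet1 <- data.frame(ind = 1,
                     date = seq(as.Date("1987-02-01"), as.Date("1987-03-24"),
                                by = "day"))
sheet2 <- data.frame(ind = c(1, 1),
                     date = as.Date(c("1987-02-01", "1987-03-03")),
                     BC1 = c(33, 44), BC2 = c(3, 3))

# For each monitored day, find that individual's nearest measurement date and
# keep its BC values only if the gap is at most 15 days, otherwise NA
bc <- t(sapply(seq_len(nrow(sheet1)), function(i) {
  m <- sheet2[sheet2$ind == sheet1$ind[i], ]
  gap <- abs(as.numeric(sheet1$date[i] - m$date))
  j <- which.min(gap)
  if (gap[j] <= 15) unlist(m[j, c("BC1", "BC2")]) else c(BC1 = NA, BC2 = NA)
}))
sheet3 <- cbind(sheet1, bc)
head(sheet3)
```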
[R] adding copies of rows to a data frame based upon start and end dates
Hello All, and thanks in advance for any advice.

I have a data frame with rows corresponding to radio-collared animals (see sample data below). There is a start date (DATESTART), an end date (DATEEND), and the number of days on air (DAYSONAIR). What I would like to do is add a column called DATE so that each ID has a row for every day that the radio collar was on the air, copying all other data. For example, ID 1001 would expand into rows beginning with 4/17/91 and ending with 6/4/91; all other values would remain the same for each new row. Unfortunately I have not gotten anywhere with my attempts. Thank you!!

ID GRID FOOD WB1 WB2 S A DRUG FREQ DATESTART DATECOLLAR DATEEND DAYSONAIR
100110319999FAI14824/17/91 4/17/916/4/9148.00
100210659671MAC14084/17/91 4/17/916/25/9169.00
100310325662FAI07694/17/91 4/17/916/4/93779.00
100410322655FAC15614/18/91 4/18/915/27/9139.00
93100510654899MAI12884/18/91 4/18/915/27/9139.00
94100610301651MAC15934/18/91 4/18/917/11/9184.00
95100710349669FAI15214/18/91 4/18/9111/2/91198.00
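A sketch of the expansion (on a hypothetical two-row stand-in for the data above, since the pasted columns ran together; dates parsed as Date class): build the per-animal day sequence, then repeat each row once per day.

```r
df <- data.frame(ID = c(1001, 1004),
                 DATESTART = as.Date(c("1991-04-17", "1991-04-18")),
                 DATEEND   = as.Date(c("1991-06-04", "1991-05-27")))

# One list element per animal: the sequence of days its collar was on the air
days <- lapply(seq_len(nrow(df)),
               function(i) seq(df$DATESTART[i], df$DATEEND[i], by = "day"))

# Repeat each original row once per day, then attach the day as a DATE column
expanded <- df[rep(seq_len(nrow(df)), lengths(days)), ]
expanded$DATE <- do.call(c, days)
head(expanded)
```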
[R] Variance inflation factor
Hello all, and thanks in advance for any advice. I would like to calculate the variance inflation factor for a linear model (lm) with 4 explanatory variables. I would then like to use this to calculate QAIC. I have used the function vif() in the car package, and I get values for each variable; however, the equation for QAIC seems to need a single variance inflation factor for the global model. Can I calculate this based upon the output from this function, and if so, can someone help me understand how?

Cheers,
G.
[R] Help with adding points to allEffects plot
Thanks in advance for any help. I am attempting to add points to a plot produced by the allEffects command in the effects package. When I try to add the points I get the following error message:

Error in plot.xy(xy.coords(x, y), type = type, ...) :
  plot.new has not been called yet

Strangely, the code I've pasted below has worked for me in the past; figuring out what has changed has proved to be beyond me.

Cheers,
Grant

y <- c(1,3,2,4,5)
x <- c(1,2,3,4,5)
GSMOD <- lm(y ~ x)
plot(allEffects(GSMOD), ask = FALSE)
points(y, x)
Error in plot.xy(xy.coords(x, y), type = type, ...) :
  plot.new has not been called yet
[R] multinom() and multinomial() interpretation
Hello, and thanks in advance for any advice. I am not clear how, in practice, the multinom() function in nnet and the multinomial() function in VGAM differ in terms of interpretation. I understand that they are fit differently. Are there certain scenarios where one is more appropriate than the other? In my case I have a dependent variable with 4 categories, and 1 binary and 4 continuous independent variables. I am fitting 3 models.

Cheers,
Grant
[R] merging files with different structures
Hello list,

Thanks in advance for any help. I have many (approx 20) files that I have merged. For example:

d1 <- read.csv("AlleleReport.csv")
d2 <- read.csv("AlleleReport.csv")
m1 <- merge(d1, d2, by = c("IND", intersect(colnames(d1), colnames(d2))), all = TRUE)
m2 <- merge(m1, d3, by = c("IND", intersect(colnames(m1), colnames(d3))), all = TRUE)
Re: [R] merging files with different structures
Hello list,

I am sorry for the previous half post; I accidentally hit send. Thanks again in advance for any help. I have many (approx 20) files that I have merged. Each data set contains rows for individuals and data in 2-5 columns (depending upon the data set). The individuals in each data set are not necessarily the same, and individuals are duplicated (with different data in the columns) across the sheets I am trying to merge. I have used the merge function. For example:

d1 <- read.csv("AlleleReport.csv")
d2 <- read.csv("AlleleReport.csv")
m1 <- merge(d1, d2, by = c("IND", intersect(colnames(d1), colnames(d2))), all = TRUE)
m2 <- merge(m1, d3, by = c("IND", intersect(colnames(m1), colnames(d3))), all = TRUE)

My problem is that when the data are merged the result looks something like:

Ind L1 L1.1 L2 L2.1 L3 L3.1
a   12   13 NA   NA NA   NA
a   NA   NA 22   43 34   45
b   14   15 45   64 NA   NA
b   NA   NA NA   NA 99   84

Is there a way that I can merge the rows for each individual?

Cheers,
Grant
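One sketch for collapsing the duplicated rows (assuming, as in the example, that for each individual at most one of the duplicate rows carries a non-NA value per column): aggregate by individual, taking the first non-NA value in each column.

```r
# Toy reconstruction of the merged result shown above
merged <- data.frame(Ind  = c("a", "a", "b", "b"),
                     L1   = c(12, NA, 14, NA), L1.1 = c(13, NA, 15, NA),
                     L2   = c(NA, 22, 45, NA), L2.1 = c(NA, 43, 64, NA),
                     L3   = c(NA, 34, NA, 99), L3.1 = c(NA, 45, NA, 84))

# For each individual and column, keep the single non-NA value (NA if none)
collapse <- function(x) if (all(is.na(x))) NA else x[!is.na(x)][1]
aggregate(merged[-1], by = list(Ind = merged$Ind), FUN = collapse)
```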
[R] Help with a permutation test
Hello List, and thanks in advance for all of your help.

I am trying to implement a permutation test of a multinomial logistic regression ('multinom' within the nnet package). In the end I want to compare the parameter estimates from my data to the distribution of randomized parameter estimates. I have figured out how to permute my dependent variable (MNNUM) x number of times, apply the multinomial logistic regression to each permutation, and save the results in a list. Where I am stuck is figuring out how to take the mean and SD of the coefficients from my list of regressions. I know that the coefficients are stored in the $wts slot of the model. Below is what I have so far. I am sure there are nicer ways to do this, and if you feel so inclined please suggest them.

# this is a function to permute the MNNUM column once
rand <- function(DF){
  new.DF <- DF
  new.DF$MNNUM <- sample(new.DF$MNNUM)
  new.DF
}

# this function fits one model I am interested in
# (note: the permuted copy DF is never used -- the model is fit to hfdata
# itself, which is why the three fits below come out identical)
modeltree <- function(DF){
  MLM.plot <- multinom(MN_fact ~ Canpy + mean_dbh + num_beechoak + num_class5 + prop_hard,
                       data = hfdata, trace = FALSE)
  MLM.plot
}

# this replicates the 'rand' function and applies a model
resamp.funct <- function(DF, funct, n){
  list <- replicate(n, rand(DF), simplify = FALSE)
  sapply(list, funct, simplify = FALSE)
}

# So if I run:
l <- resamp.funct(hfdata, modeltree, 3)
l
[[1]]
Call:
multinom(formula = MN_fact ~ Canpy + mean_dbh + num_beechoak +
    num_class5 + prop_hard, data = hfdata, trace = FALSE)

Coefficients:
         (Intercept)       Canpy    mean_dbh num_beechoak num_class5 prop_hard
none     -11.1845028 0.063880939  0.08440340   -0.7050239 -0.0998379  6.894522
sabrinus -10.6848488 0.055157318  0.19276777   -0.6441996  0.1219245  3.325704
volans    -0.2481854 0.004410597 -0.02710102   -0.1061700 -0.1858376  2.495856

Residual Deviance: 163.7211
AIC: 199.7211

[[2]] and [[3]] print exactly the same call, coefficients, deviance, and AIC as [[1]].
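For the extraction step, a sketch on hypothetical toy data standing in for hfdata (and permuting the modelled response inside each refit, so the replicates actually differ): coef() is the documented extractor for multinom fits, rather than reaching into the raw $wts vector, and the resulting matrices can be stacked into an array and summarised cell by cell.

```r
library(nnet)  # assumption: nnet is installed
set.seed(1)
toy <- data.frame(MN_fact = factor(sample(c("none", "sabrinus", "volans"), 60,
                                          replace = TRUE)),
                  Canpy = rnorm(60))

# Refit on a freshly permuted response each time
l <- replicate(3, {
  toy$MN_fact <- sample(toy$MN_fact)
  multinom(MN_fact ~ Canpy, data = toy, trace = FALSE)
}, simplify = FALSE)

# coef() gives each fit's coefficient matrix; stack and summarise
coefs <- simplify2array(lapply(l, coef))  # outcomes x terms x replicates
apply(coefs, c(1, 2), mean)               # mean of each coefficient
apply(coefs, c(1, 2), sd)                 # SD of each coefficient
```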
[R] Estimates of coefficient variances and covariances from a multinomial logistic regression?
Hello, and thanks in advance for any help. I am using the 'multinom' function from the nnet package to fit a multinomial logistic regression. I would like to get a matrix of the estimated coefficient variances and covariances. Am I missing some easy way to extract these?

Grant
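If memory serves, nnet supplies a vcov() method for multinom fits, so the usual extractor may be all that is needed (a sketch on hypothetical toy data):

```r
library(nnet)  # assumption: nnet is installed
set.seed(2)
toy <- data.frame(y = factor(sample(letters[1:3], 60, replace = TRUE)),
                  x = rnorm(60))
fit <- multinom(y ~ x, data = toy, trace = FALSE)

V <- vcov(fit)   # estimated variance-covariance matrix of all coefficients
dim(V)           # square: one row/column per coefficient
sqrt(diag(V))    # standard errors of the coefficients
```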
Re: [R] restricted bootstrap
Hello Professor Ripley,

Sorry for not being clear; I posted after a long day of struggling. Also, my toy distance matrix should have been symmetrical. Simply put: I have spatially autocorrelated data collected from many points, and I would like to run a linear regression on these data. To deal with the autocorrelation I want to resample a subset of my data with replacement, but I need to restrict subsets such that no two locations where data were collected are closer than X m apart (further apart than the autocorrelation in the data). Thanks for having a look at this for me. I will look up the hard-core spatial point process.

Grant

2008/9/4 Prof Brian Ripley [EMAIL PROTECTED]

I see nothing here to do with the 'bootstrap', which is sampling with replacement. Do you know what you mean exactly by 'randomly sample'? In general the way to do this is to sample randomly (uniformly, whatever) and reject samples that do not meet your restriction. For some restrictions there are more efficient algorithms, but I don't understand yours. (What are the 'rows'? Do you want to sample rows in space or xy locations? How come 'dist' is not symmetric?) For some restrictions, an MCMC sampling scheme is needed, the hard-core spatial point process being a related example.

On Wed, 3 Sep 2008, Grant Gillis wrote:

Hello List, I am not sure that I have the correct terminology here (restricted bootstrap), which may be hampering my archive searches. I have quite a large spatially autocorrelated data set. I have xy coordinates and the corresponding pairwise distance matrix (metres) for each row. I would like to randomly sample some number of rows while restricting samples such that the distance between them is larger than the threshold of autocorrelation. I have been unsuccessfully trying to link the 'sample' function to values in the distance matrix. My end goal is to randomly sample M thousand rows of data N thousand times, calculating linear regression coefficients for each sample, but I am stuck on taking the initial sample. I believe I can figure out the rest.

Example question: I would like to randomly sample 3 rows with the restriction that they are greater than 100 m apart.

example data:

main data:
y <- c(1, 2, 9, 5, 6)
x <- c(1, 3, 5, 7, 9)
z <- c(2, 4, 6, 8, 10)
a <- c(3, 9, 6, 4, 4)
maindata <- cbind(y, x, z, a)

     y x  z a
[1,] 1 1  2 3
[2,] 2 3  4 9
[3,] 9 5  6 6
[4,] 5 7  8 4
[5,] 6 9 10 4

distance matrix:
row1 <- c(0, 123, 567, 89)
row2 <- c(98, 0, 345, 543)
row3 <- c(765, 90, 0, 987)
row4 <- c(654, 8, 99, 0)
dist <- rbind(row1, row2, row3, row4)

     [,1] [,2] [,3] [,4]
row1    0  123  567   89
row2   98    0  345  543
row3  765   90    0  987
row4  654    8   99    0

Thanks for all of the help, in the past and now.

Cheers,
Grant

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595
[R] restricted bootstrap
Hello List,

I am not sure that I have the correct terminology here (restricted bootstrap), which may be hampering my archive searches. I have quite a large spatially autocorrelated data set. I have xy coordinates and the corresponding pairwise distance matrix (metres) for each row. I would like to randomly sample some number of rows while restricting samples such that the distance between them is larger than the threshold of autocorrelation. I have been unsuccessfully trying to link the 'sample' function to values in the distance matrix. My end goal is to randomly sample M thousand rows of data N thousand times, calculating linear regression coefficients for each sample, but I am stuck on taking the initial sample. I believe I can figure out the rest.

Example question: I would like to randomly sample 3 rows with the restriction that they are greater than 100 m apart.

example data:

main data:
y <- c(1, 2, 9, 5, 6)
x <- c(1, 3, 5, 7, 9)
z <- c(2, 4, 6, 8, 10)
a <- c(3, 9, 6, 4, 4)
maindata <- cbind(y, x, z, a)

     y x  z a
[1,] 1 1  2 3
[2,] 2 3  4 9
[3,] 9 5  6 6
[4,] 5 7  8 4
[5,] 6 9 10 4

distance matrix:
row1 <- c(0, 123, 567, 89)
row2 <- c(98, 0, 345, 543)
row3 <- c(765, 90, 0, 987)
row4 <- c(654, 8, 99, 0)
dist <- rbind(row1, row2, row3, row4)

     [,1] [,2] [,3] [,4]
row1    0  123  567   89
row2   98    0  345  543
row3  765   90    0  987
row4  654    8   99    0

Thanks for all of the help, in the past and now.

Cheers,
Grant
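Along the lines of the sample-and-reject suggestion, here is a sketch on toy coordinates (the 100 m threshold from the example; `sample_spaced` is a name of my own): grow a sample one row at a time, rejecting any candidate closer than the threshold to a row already accepted. Within one subset no row can repeat (its distance to itself is 0), but drawing many subsets this way resamples rows with replacement across subsets.

```r
set.seed(3)
n <- 200
xy <- cbind(runif(n, 0, 1000), runif(n, 0, 1000))  # toy locations (metres)
D <- as.matrix(dist(xy))                           # pairwise distance matrix

# Accept candidates only if they are > min.dist from every row already kept
sample_spaced <- function(D, size, min.dist = 100, max.tries = 10000) {
  keep <- integer(0)
  for (t in seq_len(max.tries)) {
    cand <- sample(nrow(D), 1)
    if (all(D[cand, keep] > min.dist)) keep <- c(keep, cand)
    if (length(keep) == size) return(keep)
  }
  stop("could not find a sample satisfying the distance restriction")
}

rows <- sample_spaced(D, size = 3)
sub <- D[rows, rows]
all(sub[lower.tri(sub)] > 100)  # TRUE by construction
```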
[R] problems formating scientific collaboration data
Hello all, and thanks in advance for any help or direction. I have co-authorship data that looks like:

Paper Author             Year
1     SmithKK JonesSD    2008
2     WallaceAR DarwinCA 1999
3     HawkingS           2003

I would like:

Paper Author    Year
1     SmithKK   2008
1     JonesSD   2008
2     WallaceAR 1999
2     DarwinCA  1999
3     HawkingS  2003

Thanks for your patience with what is likely an easy question.
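One sketch (assuming the authors on each paper are separated by whitespace in a single column): split the author strings, then repeat Paper and Year once per author.

```r
# Toy version of the co-authorship data
papers <- data.frame(Paper = 1:3,
                     Author = c("SmithKK JonesSD", "WallaceAR DarwinCA", "HawkingS"),
                     Year = c(2008, 1999, 2003),
                     stringsAsFactors = FALSE)

# Split each author string into its individual authors
authors <- strsplit(papers$Author, " +")

# Repeat the per-paper columns once per author and stack into long format
long <- data.frame(Paper  = rep(papers$Paper, lengths(authors)),
                   Author = unlist(authors),
                   Year   = rep(papers$Year, lengths(authors)))
long
```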
Re: [R] resampling from distributions
I am sorry for the incorrect subject; it autofilled without my noticing in time. I suppose a better subject would be "Calculating proportion of shared occurrences and randomizations".

Grant

2008/4/19 Grant Gillis [EMAIL PROTECTED]:

Hello All,

Once again, thanks for all of the help to date; I am climbing my R learning curve. I've got a few more questions that I hope I can get some guidance on. I am not sure whether the etiquette is to break up multiple questions or not, but I'll keep them together here for now, as it may help put the questions in context, despite the fact that the post may get a little long.

Question 1: My first goal is to calculate the proportion of shared 1) behaviours and 2) alleles between numerous individuals. Pasted below (the 'propshared' function) is what I have now; it works very well for calculating the proportion of shared behaviours, where the data are formatted with each column as a behaviour and each row an individual. Microsatellite genotypes are formatted differently. An example is below. Each row is an individual and each column is one allele from a single locus. In the values below, L1 and L1.1 each give one copy of an allele for the same locus. Occasionally values from different loci will have the same value, although these are not actually the same allele. I would like the calculation of the proportion of shared values to be restricted to shared alleles within loci for all individuals (pairs of columns L1 and L1.1, L2 and L2.1, ...). What I have now calculates the proportion of shared values across loci. A specific example: I would like the value 2 for individual w at L1 to be considered the same as the value 2 for individual y at L1.1, but not the same as the value 2 for any other individual within any other pair of columns.

genos <- data.frame(
  L1 = c(2, NA, 1, 3),
  L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),
  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),
  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

genos
  L1 L1.1 L2 L2.1 L3 L3.1
w  2    1  5    3  4    4
x NA   NA  2    4  5    6
y  1    2  5    2  7    6
z  3    3  3    4  2    6

propshared <- function(genos){
  sapply(rownames(genos), function(ind1)
    sapply(rownames(genos), function(ind2)
      sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE)) /
        length(genos[1, ])) -> x
  is.na(diag(x)) <- TRUE
  x
}

propshared(genos)
      w     x     y     z
w    NA 0.000 0.167 0.167
x 0.000    NA 0.167 0.333
y 0.167 0.167    NA 0.333
z 0.167 0.333 0.333    NA

The matrix I would like to have would look like this:

        w       x       y       z
w      NA 0       0.3     0.16667
x 0            NA 0.16667 0.16667
y 0.3     0.16667      NA 0.16667
z 0.16667 0.16667 0.16667      NA

Question 2: Thanks if you have made it this far. Next I would like to calculate a randomized value of the mean proportion of shared alleles. To do this I thought I would randomize the original data (genos above, say 1000 times), recalculate the proportion of shared alleles at each step, and then take the mean (my attempt below). When I do this I get the same mean proportion of shared alleles (or behaviours) as the original for every randomization. I assume that this is due to some property of permuting this type of data that I do not know. Does anyone have a recommendation as to how I might get a value of the proportion of shared alleles if alleles were distributed (again, within loci) at random?

randomize <- function(genos){
  x <- apply(genos, 2, sample)
  rownames(x) <- rownames(genos)
  x
}

allele.permute <- function(genos, n){
  list <- replicate(n, randomize(genos), simplify = FALSE)
  sapply(list, propshared, simplify = FALSE)
}

I hope this is clear. I appreciate all insights and input.

Thanks,
Grant
[R] permutation/randomization
Hello, I have what I suspect might be an easy problem, but I am new to R and stumped. I have a data set that looks something like this:

b <- c(2,3,4,5,6,7,8,9)
x <- c(2,3,4,5,6,7,8,9)
y <- c(9,8,7,6,5,4,3,2)
z <- c(9,8,7,6,1,2,3,4)
data <- cbind(x, y, z)
row.names(data) <- c('a','b','c','d','e','f','g','h')

which gives:

  x y z
a 2 9 9
b 3 8 8
c 4 7 7
d 5 6 6
e 6 5 1
f 7 4 2
g 8 3 3
h 9 2 4

I would like to randomize the data within columns. The closest I have been able to come permutes the rows but keeps each row together, along with its row name (example below). For some context: eventually I would like to use this to generate many data sets and perform calculations on them (I think I know how to do this, but we'll see). So ideally the row names would remain the same (in order a through h) while the column data randomize within columns, independently of the other columns. Just shuffle the data deck, I guess.

data[permute(1:length(data[,1])),]
  x y z
b 3 8 8
c 4 7 7
h 9 2 4
e 6 5 1
f 7 4 2
a 2 9 9
g 8 3 3
d 5 6 6

Thanks in advance for the help, and also for the good advice earlier this week.

Cheers
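A sketch of the within-column shuffle: apply sample() to each column independently, which leaves row names in place because only the values move (apply drops the row names of the result, so they are restored afterwards).

```r
x <- c(2,3,4,5,6,7,8,9)
y <- c(9,8,7,6,5,4,3,2)
z <- c(9,8,7,6,1,2,3,4)
dat <- cbind(x, y, z)
rownames(dat) <- letters[1:8]

# Shuffle each column independently of the others
shuffled <- apply(dat, 2, sample)
rownames(shuffled) <- rownames(dat)  # restore row names a through h
shuffled
```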
[R] row by row similarity
Hello all, and thanks in advance for any advice. I am very new to R and have searched my question, but have not come up with anything quite like what I would like to do. My problem: I have a data set with individuals as rows and values for behaviours as columns. I would like to know the proportion of shared behaviours for all possible pairs of individuals: the sum of shared behaviours divided by the total. There are zeros in the data, which I would like treated as "the behaviour does not exist".

example data format:

ind B1 B2 B3 B4 B5 B6
w    2  1  5  3  4  4
x    1  2  3  4  5  6
y    1  3  5  2  7  6
z    2  3  2  4  2  6

Desired output:

w x 0
w y 0.17
w z 0
x y 0.3
x z 0.3
etc.

Thanks,
Grant
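One sketch using combn() to enumerate the pairs and compare rows position by position. Note: zeros are treated here like any other value; if 0 means "behaviour absent" and should never count as shared, those positions would need to be excluded first.

```r
beh <- rbind(w = c(2, 1, 5, 3, 4, 4),
             x = c(1, 2, 3, 4, 5, 6),
             y = c(1, 3, 5, 2, 7, 6),
             z = c(2, 3, 2, 4, 2, 6))

# All unordered pairs of individuals, one pair per column
pairs <- combn(rownames(beh), 2)

# Proportion of positions at which the two individuals match
prop <- apply(pairs, 2, function(p) mean(beh[p[1], ] == beh[p[2], ]))
data.frame(ind1 = pairs[1, ], ind2 = pairs[2, ], prop = round(prop, 2))
```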